Async Wandering Part 13 — Reentrant recursive async lock

This is the thirteenth part of the Async Wandering series. For your convenience you can find other parts in the table of contents in Part 1 – Why creating Form from WinForms in unit tests breaks async?

Today I’ll discuss some misconception regarding the reentrant recursive async lock described in this article.

It shows the following code:

with the following output:

and concludes that the reentrant recursive async lock is impossible. Sounds plausible but the code is simply incorrect.

The code tries to get the lock in a recursive manner but mixes two contradictory requirements. It effectively wants the lock to be recursive and to be non-recursive at the same time.

Why? Let’s say that the lock is indeed possible and it works okay. With all the sleeping around, we may get the following order of events:

What can happen next? If main goes to the NonThreadSafe then we don’t want the child to take the lock. But we wanted the lock to be reentrant recursive so that’s against the rules. The child should be allowed to take the lock.

But let’s say that what we meant is that only one execution context can get the lock and that execution context is allowed to take it recursively. In that case, we want the main to finish the NonThreadSafe and then wait for the child to complete in line 18. But this line is still inside the critical section so the child shouldn’t be allowed to take the lock as it’s in a different execution context. But since we wait for the child to complete — we just get a deadlock.

The code is incorrect, that’s all. Let’s make it a little better:

The difference is in line 14 — we take the lock in the main context again.

Can we now implement such a lock? The answer is — yes. Here it goes:

That’s a super simple implementation but it shows the idea which shouldn’t be surprising when we understand how a regular lock works.

The regular locking mechanism is tied to a thread. Since the thread ID cannot change, we don’t need to keep an extensive track of it. Just holding the ID is enough. However, that’s not enough with async contexts. We need to hold the context, but also understand how it changes. If we had a way to keep a strong reference to the ExecutionContext instance, we could do it in a much more robust way. However, we don’t have such an option, so we need to track it manually.

The idea is: every time we take a lock, we create a new hash of the execution trace. We represent it with just a random letter (told you it’s a super simple implementation). So now, when someone tries to take the lock, we check the hash and use it to determine whether the reentrancy is allowed.

We start with a field used for global synchronization (line 2). Next goes the “lock path” representing current hash of the owner (line 3). Line 4 holds the hash value of the execution context.

We initialize the lock in the constructor.

Next, LockAsync does the job. It locks all the competing threads, randomizes new character, and the checks if we can take the lock. If we’re allowed, we just store the new path and return. Otherwise, we create a new waiter, and add it to the queue (lines 25-26). We then schedule a continuation to try taking the lock again (line 31) and return. Notice, that LockAsync is not marked as async — that would change the execution context and the contents of contextPath variable! Yet another tricky part of async code.

Releasing the lock is similar. We lock all the threads and remove last letter of the path. We then wake up all the waiters.

Obviously, this is not a production-ready code.

Full snippet at dotnetfiddle:


Okay, but what if we go with the original snippet (with no additional lock taking on the main task)? Is it possible to implement such a lock with “only one execution context allowed, reentrant and recursive” semantic? It’s still not smart (as the code is prone to a data corruption) but look at this:

It’s almost the same as before. The idea is that when setting a new value for the local hash, we check if it should be allowed to run, or if something changed it in the middle. We check for it in line 13 and detect that the new value is “entering the context to execute” because it’s longer than the previous value of the field. We then stop the thread (line 14) and wait for someone unlocking the lock (line 61). Again, this is prone to the state corruption as the main task may be in the middle of the critical section and then the child may take ownership of the lock.

Generally, it is possible to implement a reentrant recursive async lock, but you need to be careful and understand that the reasoning is much harder. Especially, that every continuation may or may not be executed on a different thread (because of finishing synchronously).