JVM Inside Out Part 4 — Locks and out of band exceptions

This is the fourth part of the JVM Inside Out series. For your convenience you can find other parts in the table of contents in Part 1 — Getting object address

Typical locking pattern in Java (and other languages, even outside them JVM ecosystem) looks like this:

Simple enough, nothing should break here. However, there is a catch.

Our code is optimized a lot. Compiler (javac) does that, JIT does that, even CPU does that. It tries to preserve semantic of our application but if we don’t obey the rules (i.e. we don’t use barriers when accessing variables modified in other threads) we may get unexpected results.

try block in JVM is implemented using metadata. There is a piece of information saying that try is between instructions X and Y. If we don’t get to those lines then the try is not respected (and finally is not called). Under the hood it is very „basic” approach — operating system mechanisms are used (SEH, SJLJ, signals etc) to catch interrupt (whether hardware or software) and ultimately to compare addresses. Details may differ but general concept is similar across platforms.

Now, what happens if JIT decides to compile the code like this:

We finish taking lock and we end up in instruction 2 but we are not in try block yet. Now, if some out of band exception appears we never release the lock. Out of band exception like ThreadDeath or OutOfMemory.

Typically we would like to kill JVM when any of these out of band situations happen. But nothing stops us from catching them and stop the thread from being killed.

Let’s take this code:

Output is:

and the application hangs forever.

So what happened? We emulated the nop instruction inserted just before the try block and exception thrown right in that place. We can see that background thread handles the exception and continues execution but the lock is never released so the main thread is blocked forever.

Now let’s see what happens if we try taking the lock in the try block (warning: this code is not correct! it is just to show the idea):

Output:

Application finishes successfully. Why is this code wrong? It’s because we try to release the lock in finally but we don’t know if we locked it. If someone else locked it then we may release it incorrectly or get exception. We may also break it in case of recursive situation.

Now the question is: is this just a theory or did it actually happen? I don’t know of any example in JVM world but this happened in .NET and was fixed in .NET 4.0. On the other hand I am not aware of any guarantee that this will not happen in JVM.

How to solve it? Avoid Thread.stop() as stopping threads is bad. But remember that it doesn’t solve the „problem” — what if you have distributed lock (whether it is OS lock across processes or something across machines)? You have exactly the same issue and saying „avoid Process.kill()” or „avoid getting your machine broken” is not an answer. This problem can always appear so think about it whenever you take the lock. And as a rule of thumb, track the owner and always take the lock with timeout.