This is the fourth part of the JVM Inside Out series. For your convenience, you can find the other parts in the table of contents in Part 1: Getting object address.
A typical locking pattern in Java (and in other languages, even outside the JVM ecosystem) looks like this:
    lock.lock();
    try{
        ...
    }finally{
        lock.unlock();
    }
Simple enough, nothing should break here. However, there is a catch.
Our code gets optimized a lot. The compiler (javac) does it, the JIT does it, even the CPU does it. Each of them tries to preserve the semantics of our application, but if we don’t obey the rules (for example, we don’t use barriers when accessing variables modified in other threads), we may get unexpected results.
A try block in the JVM is implemented using metadata. There is a piece of information saying that the try covers the instructions between X and Y. If an exception is thrown while we are outside that range, the try is not respected (and finally is not called). Under the hood it is a very “basic” approach: operating system mechanisms (SEH, SJLJ, signals etc.) are used to catch the interrupt (whether hardware or software), and ultimately addresses are compared. Details may differ but the general concept is similar across platforms.
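The range-based rule can be observed from plain Java, without any JIT involvement. Below is a minimal sketch (the class and method names are made up for illustration): an exception thrown before control enters the protected range skips the finally block, while one thrown inside the range runs it. For a compiled class, javap -c prints the corresponding ranges as an exception table.

```java
import java.util.ArrayList;
import java.util.List;

public class TryRangeDemo {
    // Records which blocks actually ran.
    static final List<String> events = new ArrayList<>();

    static void run(boolean failBeforeTry) {
        if (failBeforeTry) {
            // Thrown outside the protected range: the finally below never runs.
            throw new IllegalStateException("before try");
        }
        try {
            events.add("try");
            throw new IllegalStateException("inside try");
        } finally {
            events.add("finally");
        }
    }

    static List<String> observe(boolean failBeforeTry) {
        events.clear();
        try {
            run(failBeforeTry);
        } catch (IllegalStateException e) {
            events.add("caught: " + e.getMessage());
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(observe(true));   // [caught: before try]
        System.out.println(observe(false));  // [try, finally, caught: inside try]
    }
}
```

Here the exception before the try is thrown explicitly; the scenario discussed in this post is the same gap, only hit by an exception the code did not ask for.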
Now, what happens if JIT decides to compile the code like this:
    1: call lock.lock();
    2: nop
    3: anything from try
We finish taking the lock and end up at instruction 2, but we are not in the try block yet. Now, if some out-of-band exception appears, we never release the lock. By out-of-band exception I mean something like ThreadDeath or OutOfMemoryError.
Typically we would like to kill the JVM when any of these out-of-band situations happens. But nothing stops us from catching them and preventing the thread from being killed.
Let’s take this code:
    import java.sql.Date;
    import java.util.concurrent.locks.ReentrantLock;

    public class Play{
        public static void main(String[] args) throws InterruptedException {
            final ReentrantLock lock = new ReentrantLock();
            Thread t = new Thread(){
                @Override
                public void run(){
                    try {
                        lock.lock();
                        // This emulates the nop instruction (an infinite loop which isn't clearly infinite, so the compiler accepts the code)
                        while (new Date(2019, 9, 19).getTime() > 0) {}
                        try{
                            System.out.println("Try: Never gonna get here");
                        }finally{
                            System.out.println("Finally: Never gonna get here");
                            lock.unlock();
                        }
                    }catch(Throwable e){
                        System.out.println(e);
                    }
                    System.out.println("We caught the exception and can 'safely' carry on");
                }
            };
            t.start();
            Thread.sleep(1000);
            // Deprecated; throws UnsupportedOperationException on JDK 20 and later, so run this on an older JDK
            t.stop();
            System.out.println("Checking deadlock");
            lock.lock();
            System.out.println("Done, no deadlock");
            lock.unlock();
        }
    }
Output is:
    Checking deadlock
    java.lang.ThreadDeath
    We caught the exception and can 'safely' carry on
and the application hangs forever.
So what happened? We emulated the nop instruction inserted just before the try block and an exception thrown right in that place. We can see that the background thread handles the exception and continues execution, but the lock is never released, so the main thread is blocked forever.
Now let’s see what happens if we take the lock inside the try block (warning: this code is not correct! It is just to show the idea):
    import java.sql.Date;
    import java.util.concurrent.locks.ReentrantLock;

    public class Play{
        public static void main(String[] args) throws InterruptedException {
            final ReentrantLock lock = new ReentrantLock();
            Thread t = new Thread(){
                @Override
                public void run(){
                    try {
                        try{
                            lock.lock();
                            // This emulates the nop instruction (an infinite loop which isn't clearly infinite, so the compiler accepts the code)
                            while (new Date(2019, 9, 19).getTime() > 0) {}
                            System.out.println("Try: Never gonna get here");
                        }finally{
                            System.out.println("Finally: Never gonna get here");
                            lock.unlock();
                        }
                    }catch(Throwable e){
                        System.out.println(e);
                    }
                    System.out.println("We caught the exception and can 'safely' carry on");
                }
            };
            t.start();
            Thread.sleep(1000);
            // Deprecated; throws UnsupportedOperationException on JDK 20 and later, so run this on an older JDK
            t.stop();
            System.out.println("Checking deadlock");
            lock.lock();
            System.out.println("Done, no deadlock");
            lock.unlock();
        }
    }
Output:
    Checking deadlock
    Finally: Never gonna get here
    Done, no deadlock
    java.lang.ThreadDeath
    We caught the exception and can 'safely' carry on
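The application finishes successfully. Why is this code wrong? Because we try to release the lock in finally without knowing whether we actually locked it. If someone else holds it, we may release it incorrectly or get an exception (ReentrantLock throws IllegalMonitorStateException when a thread that does not hold the lock calls unlock()). In the recursive (reentrant) case we may also release a hold that was taken by an outer caller.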
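Both failure modes are easy to reproduce. The sketch below uses hypothetical class and method names and is an illustration rather than anything from the original experiment: the first method calls unlock() on a lock the thread never took, the second simulates an inner acquisition that fails before lock() is reached while an outer caller already holds the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnlockWithoutOwnership {
    // Returns the simple name of the exception thrown by unlock(), or "none".
    static String unlockWithoutHolding() {
        ReentrantLock lock = new ReentrantLock();
        try {
            lock.unlock(); // the current thread never acquired the lock
            return "none";
        } catch (IllegalMonitorStateException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Returns the hold count left after a failed inner acquisition.
    static int holdCountAfterFailedInnerAcquire() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // outer acquisition, hold count is 1
        try {
            try {
                // Simulated failure before the inner lock.lock() would run.
                throw new IllegalStateException("failed before inner lock()");
            } finally {
                lock.unlock(); // releases the OUTER hold by mistake
            }
        } catch (IllegalStateException e) {
            // Swallowed, as in the example above.
        }
        return lock.getHoldCount();
    }

    public static void main(String[] args) {
        System.out.println(unlockWithoutHolding());             // IllegalMonitorStateException
        System.out.println(holdCountAfterFailedInnerAcquire()); // 0
    }
}
```

In the second case no exception is reported by unlock() at all; the outer caller simply continues without the protection it thinks it has.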
Now the question is: is this just theory or did it actually happen? I don’t know of any example in the JVM world, but this happened in .NET and was fixed in .NET 4.0. On the other hand, I am not aware of any guarantee that this will not happen in the JVM.
How to solve it? Avoid Thread.stop(), as stopping threads is bad. But remember that this doesn’t solve the “problem”: what if you have a distributed lock (whether it is an OS lock across processes or something across machines)? You have exactly the same issue, and saying “avoid Process.kill()” or “avoid getting your machine broken” is not an answer. This problem can always appear, so think about it whenever you take a lock. As a rule of thumb, track the owner and always take the lock with a timeout.
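For an in-process ReentrantLock, the timeout part of that rule of thumb might look like the sketch below. The helper names (runWithLock, acquireAfterLeak) and the 100 ms timeout are assumptions chosen for illustration. A thread that takes the lock and terminates without releasing it stands in for the leaked lock from the earlier example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLockDemo {
    // Runs the action under the lock; returns false if the lock was not obtained in time.
    static boolean runWithLock(ReentrantLock lock, long timeoutMillis, Runnable action) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // the caller decides: retry, report, or abort
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Simulates the leaked lock: a thread acquires it and exits without releasing.
    static boolean acquireAfterLeak() {
        ReentrantLock lock = new ReentrantLock();
        Thread leaker = new Thread(lock::lock);
        leaker.start();
        try {
            leaker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return runWithLock(lock, 100, () -> System.out.println("not reached"));
    }

    public static void main(String[] args) {
        System.out.println(acquireAfterLeak());                              // false
        System.out.println(runWithLock(new ReentrantLock(), 100, () -> {})); // true
    }
}
```

The timeout does not close the gap between acquiring the lock and entering the try block. It only turns a permanent hang into a failure the caller can observe and react to.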