Spark and NullPointerException in UTF8String.contains

Recently I was debugging a NullPointerException in Spark. The stack trace indicated this: After some digging, I found out that the following query causes the problem: If I commented out the line marked with the comment, the NPE was no longer there. Also, when I replaced either df2("ref") or df1("ref") with lit("ref"), it did not crash …

Random notes from crashing and hanging EMR Spark job

It sometimes happens that your EMR job crashes or hangs indefinitely with no meaningful log. You can try to capture a memory dump, but it is not very useful when your cluster machines have hundreds of gigabytes of memory each. Below are “fixes” that worked for me. If it just crashes with a lost slave or lost task, …