Running Anaconda with DGL and mxnet on CUDA GPU in Spark running in EMR

Today I’m going to share my configuration for running a custom Anaconda Python distribution with DGL (Deep Graph Library) and the mxnet library, with GPU support via CUDA, on Spark hosted in EMR. Actually, I have a Redshift configuration as well, with support for gensim, tensorflow, keras, theano, pygpu, and cloudpickle. You can also install more libraries if …

Connecting to Redshift from Spark running in EMR

Today I’ll share my configuration for Spark running in EMR to connect to a Redshift cluster. First, I assume the cluster is accessible (so configure the virtual subnet, allowed IPs, and all other networking before running this). I’m using Zeppelin, so I’ll show two interpreters configured for the connection, but the same thing should work with standalone …

Spark and NullPointerException in UTF8String.contains

Recently I was debugging a NullPointerException in Spark. The stacktrace was indicating this: After some digging I found out that the following query causes the problem: If I commented out the line with the comment, the NPE was no longer there. Also, when I replaced either df2("ref") or df1("ref") with lit("ref"), it was not crashing …