Tag Archives

Archive of posts published in the tag: Spark

Machine Learning Part 1 — Linear regression in MXNet

This is the first part of the Machine Learning series. For your convenience you can find other parts using the links below (or by guessing the address): Part 1 — Linear regression in MXNet Part 2 — Linear regression in SQL Part 3…

Random notes from crashing and hanging EMR Spark job

It sometimes happens that your EMR job crashes or hangs indefinitely with no meaningful log. You can try to capture memory dump but it is not very useful when your cluster machines have hundreds gigabytes of memory each. Below are “fixes” which worked…

Investigating AWS SDK conflicts in EMR

When you deploy your package to Amazon Elastic Map Reduce (EMR), you can access the AWS SDK provided by the platform. This gets tricky if you compile your code against different version of SDK because then you may get very cryptic bugs in…

Dynamically loading JAR file in Zeppelin

Imagine that you need to load JAR file dynamically in Zeppelin working on your EMR cluster. One easy way is to deploy the file to the instance and load it from there, however, what can you do if you have almost no access…