Tag Archives

Archive of posts published in the tag: Spark

Machine Learning Part 1 — Linear regression in MXNet

In this series I assume you know the basics of machine learning. I will provide some source code for different use cases but no extensive explanation. Let’s go. Today we will take a look at linear regression in MXNet. We will predict sepal…

Random notes from crashing and hanging EMR Spark job

It sometimes happens that your EMR job crashes or hangs indefinitely with no meaningful log. You can try to capture a memory dump, but it is not very useful when your cluster machines have hundreds of gigabytes of memory each. Below are “fixes” which worked…

Investigating AWS SDK conflicts in EMR

When you deploy your package to Amazon Elastic MapReduce (EMR), you can access the AWS SDK provided by the platform. This gets tricky if you compile your code against a different version of the SDK because then you may get very cryptic bugs in…

Dynamically loading JAR file in Zeppelin

Imagine that you need to load a JAR file dynamically in Zeppelin running on your EMR cluster. One easy way is to deploy the file to the instance and load it from there. However, what can you do if you have almost no access…