Investigating AWS SDK conflicts in EMR

When you deploy your package to Amazon Elastic MapReduce (EMR), you can use the AWS SDK provided by the platform. This gets tricky if you compile your code against a different version of the SDK, because you may then get very cryptic errors at runtime, such as ClassNotFoundException or NoSuchMethodError. You should always check the EMR changelog to see whether the SDK version has changed.

But what should you do if you run into this problem? How do you debug it?

First, make sure that your jars are on the classpath. You can add them with the spark.driver.extraClassPath parameter (and spark.executor.extraClassPath for executors), and ask Spark to give user-added jars precedence over its own with spark.driver.userClassPathFirst.
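To confirm from inside the job which SDK jars the JVM actually received, you can print the matching classpath entries. This is only a minimal sketch: the class name ClasspathCheck and the "aws-java-sdk" name fragment are assumptions made for illustration, and jars loaded through a separate classloader (for example when userClassPathFirst is enabled) may not appear in java.class.path at all.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class ClasspathCheck {

    // Returns the classpath entries whose path contains the given fragment.
    static List<String> entriesContaining(String classpath, String fragment) {
        List<String> matches = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.contains(fragment)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "aws-java-sdk" is an assumed jar name fragment; adjust it to your artifacts.
        String fragment = args.length > 0 ? args[0] : "aws-java-sdk";
        String classpath = System.getProperty("java.class.path");
        for (String entry : entriesContaining(classpath, fragment)) {
            System.out.println(entry);
        }
    }
}
```

If two different SDK versions show up in the output, the order of the entries usually decides which one wins.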

Next, verify that your classes are actually loaded. Add the -verbose:class JVM option when submitting the job (for example through spark.driver.extraJavaOptions), then go through the logs and check whether the class you want to use is loaded and which jar it comes from.
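If the verbose log is too noisy, the same question can be asked for a single class from code. The sketch below is an illustration rather than a definitive recipe: ClassOrigin is a made-up helper name, and the default class name is just an example from the AWS SDK that you would replace with the class that fails for you.

```java
import java.security.CodeSource;

final class ClassOrigin {

    // Returns the location (usually a jar URL) the class was loaded from.
    // Classes loaded by the bootstrap loader have no code source.
    static String locate(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? "<bootstrap or unknown>" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Example class name only; pass the class you are investigating as an argument.
        String name = args.length > 0 ? args[0] : "com.amazonaws.services.s3.AmazonS3Client";
        Class<?> cls = Class.forName(name);
        System.out.println(name + " loaded from " + locate(cls));
    }
}
```

Calling locate from the driver and from an executor task can give different answers, so it may be worth checking both.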

By now you should probably see why a method is not found. Typically it is either missing (because the SDK version changed) or it has a slightly different signature. If you still have no clue what's going on, make sure that you are using the right classloader.
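Both checks can be done with plain reflection: list the signatures the loaded class really exposes, and print the classloader chain to see who loaded it. This is a rough sketch under assumptions; SignatureDump is a hypothetical helper, and the class and method names used in main are placeholders to swap for the ones from your stack trace.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class SignatureDump {

    // Lists every public method with the given name, including parameter and return types.
    static List<String> signatures(Class<?> cls, String methodName) {
        List<String> found = new ArrayList<>();
        for (Method method : cls.getMethods()) {
            if (method.getName().equals(methodName)) {
                found.add(method.toString());
            }
        }
        return found;
    }

    // Walks from the given loader up through its parents; null means the bootstrap loader.
    static List<String> loaderChain(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader current = loader; current != null; current = current.getParent()) {
            chain.add(current.getClass().getName());
        }
        chain.add("<bootstrap>");
        return chain;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Placeholder names; replace them with the class and method from your error.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        String methodName = args.length > 1 ? args[1] : "isEmpty";
        Class<?> cls = Class.forName(className);
        signatures(cls, methodName).forEach(System.out::println);
        System.out.println("class loader chain:   " + loaderChain(cls.getClassLoader()));
        System.out.println("context loader chain: "
                + loaderChain(Thread.currentThread().getContextClassLoader()));
    }
}
```

If the two chains differ, the class may have been resolved by a different loader than the one your code uses, which is one way to end up with the platform's SDK instead of your own.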