Data encryption in s3 in Spark in EMR with multiple encryption schemes

Spark supports multiple encryption schemes. You can use client side encryption, server side encryption, etc. What wasn’t working for me for a long time is reading encrypted data and writing as a plain text. Before reading I was configuring encryption and this was working fine. However, writing as plain text didn’t work (data was encrypted), even though I was disabling encryption.

I was told that this is because encryption settings are cached and my changes are not honored. However, what works for me now is using different access protocols to read and write s3 files.

So, for configuration do this:

or in Python do this:

Now, when you read or write file using s3 prefix, it uses encryption with KMS key. However, if you read or write using s3a, it doesn’t encrypt. You can use s3n prefix to configure yet another encryption scheme. If you want to do more, you need to dig into protocol handlers.