Recently I was debugging a memory leak in Java 11 code running on AWS Lambda. Here are two tricks you may find useful if you need to do something similar.
Taking a memory dump
There is no direct way to take a heap dump from outside the Lambda function. Instead, you can run the following code from inside the function:
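```java
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.management.MBeanServer;

public static void dumpHeap() throws IOException, InterruptedException {
    // Proxy to the HotSpot diagnostic MXBean of the running JVM
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    HotSpotDiagnosticMXBean mxBean = ManagementFactory.newPlatformMXBeanProxy(
            server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);
    // dumpHeap() refuses to overwrite a file, and /tmp survives warm starts
    Files.deleteIfExists(Paths.get("/tmp/dump.hprof"));
    // true = dump only live (reachable) objects; the name must end in .hprof
    mxBean.dumpHeap("/tmp/dump.hprof", true);
    AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
            .withRegion(System.getenv("AWS_REGION")) // Lambda sets AWS_REGION to the function's Region
            .withCredentials(new DefaultAWSCredentialsProviderChain())
            .build();
    TransferManager tm = TransferManagerBuilder.standard()
            .withS3Client(s3Client)
            .build();
    Upload upload = tm.upload("bucket", "dumps/dump.hprof", new File("/tmp/dump.hprof"));
    upload.waitForCompletion();
    tm.shutdownNow(); // also shuts down s3Client, releasing its threads
    System.gc(); // optional
}
```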
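We first obtain a proxy to the HotSpot diagnostic MXBean, which exposes JVM management operations such as heap dumps. Then we dump the heap to a file in the `/tmp` directory, the only writable location in a Lambda function. Mind that the file name must have the `.hprof` extension. Finally, we upload the file to S3. I use `TransferManager` here because the file is big, and a plain `PutObject` call hangs.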
I also call `System.gc()` at the end. This is optional.
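To try the dump step on your machine before deploying, here is a minimal, JDK-only sketch without the S3 upload. `LocalHeapDump` and its `dumpHeap(dir, fileName)` helper are hypothetical names, and the sketch uses `ManagementFactory.getPlatformMXBean`, an in-process shortcut for the MXBean proxy lookup. It assumes a HotSpot-based JDK 11 or later, where `dumpHeap` rejects file names without the `.hprof` extension and refuses to overwrite an existing file.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class LocalHeapDump {
    // Writes a heap dump of the current JVM to dir/fileName and returns its path.
    static Path dumpHeap(Path dir, String fileName) throws IOException {
        // In-process shortcut for the HotSpotDiagnostic MXBean proxy
        HotSpotDiagnosticMXBean mxBean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Path target = dir.resolve(fileName);
        // dumpHeap() fails with an IOException if the file already exists
        Files.deleteIfExists(target);
        // true = dump only live (reachable) objects; names without .hprof are rejected
        mxBean.dumpHeap(target.toString(), true);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path dump = dumpHeap(dir, "dump.hprof");
        System.out.println(dump + ": " + Files.size(dump) + " bytes");
        try {
            dumpHeap(dir, "dump.bin");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running `main` should print the dump path and size, followed by the `IllegalArgumentException` message for the `dump.bin` name. Deleting the old file first matters in Lambda too, because `/tmp` is kept between invocations on a warm container.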
Step through debugging
There is no way to attach a debugger to the deployed function directly. However, you can always modify your code so that you can step through it and exit early. Let’s say that your code looks like this:
```java
operation1();
operation2();
operation3();
```
You can now add a string field to the input of your Lambda function and restructure your code like this:
```java
// input is the string field added to the Lambda input, e.g. "operation1,operation2"
if (input.contains("operation1")) {
    operation1();
    if (input.contains("operation2")) {
        operation2();
        if (input.contains("operation3")) {
            operation3();
        }
    }
}
```
This way you can control the code from the outside by passing a string like “operation1,operation2” as the input. If you don’t know which operation is causing the leak, invoke your Lambda function 1000 times with “operation1” only. If it crashes with an OutOfMemoryError, you have found your culprit. Otherwise, invoke it another 1000 times with “operation1,operation2” and see whether it crashes, and so on.
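The nested `if` blocks get deeper with every operation. As a minimal sketch of the same idea with a flat list of named steps, assuming each operation can be wrapped in a `Runnable`, consider the following; `StepRunner` and `runSteps` are hypothetical names. It splits the input on commas and compares whole names, so "operation1" does not accidentally match "operation10" the way `String.contains` would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StepRunner {
    // Runs the steps in insertion order and stops at the first step whose name is
    // missing from the comma-separated input, like the nested ifs. Returns the names run.
    static List<String> runSteps(String input, Map<String, Runnable> steps) {
        // Exact names, so "operation1" does not match "operation10"
        Set<String> requested = new HashSet<>(Arrays.asList(input.trim().split("\\s*,\\s*")));
        List<String> ran = new ArrayList<>();
        for (Map.Entry<String, Runnable> step : steps.entrySet()) {
            if (!requested.contains(step.getKey())) {
                break; // exit early: the remaining steps are skipped too
            }
            step.getValue().run();
            ran.add(step.getKey());
        }
        return ran;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the steps in the order they were added
        Map<String, Runnable> steps = new LinkedHashMap<>();
        steps.put("operation1", () -> System.out.println("running operation1"));
        steps.put("operation2", () -> System.out.println("running operation2"));
        steps.put("operation3", () -> System.out.println("running operation3"));
        System.out.println("ran: " + runSteps("operation1,operation2", steps));
    }
}
```

Inside the handler you would register the real methods, for example `steps.put("operation1", () -> operation1());`, and pass the string field from the input as `input`.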
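The search described above can be sketched as a small driver. This is a hypothetical `LeakHunt.findCulprit` helper, under the assumption that you supply an `invokeCrashed` callback that invokes the function once with the given input (for example through the AWS SDK Lambda client) and returns `true` if that invocation failed with an `OutOfMemoryError`. The `main` method only uses a fake callback that simulates a leak in `operation2`, so it runs locally without AWS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class LeakHunt {
    // Invokes the function `invocations` times per prefix ("operation1", then
    // "operation1,operation2", ...) and returns the operation whose addition first
    // caused a crash, or null if no prefix crashed. invokeCrashed is a hypothetical
    // hook: one invocation with the given input, true if it ran out of memory.
    static String findCulprit(List<String> operations, int invocations,
                              Predicate<String> invokeCrashed) {
        List<String> prefix = new ArrayList<>();
        for (String operation : operations) {
            prefix.add(operation);
            String input = String.join(",", prefix);
            for (int i = 0; i < invocations; i++) {
                if (invokeCrashed.test(input)) {
                    return operation; // the newly added operation is the suspect
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Fake hook for a local dry run: operation2 "leaks" 1 MB per call and the
        // simulated function crashes once more than 512 MB are retained.
        long[] retainedMb = {0};
        Predicate<String> fakeInvoke = input -> {
            if (input.contains("operation2")) {
                retainedMb[0]++;
            }
            return retainedMb[0] > 512;
        };
        List<String> operations = List.of("operation1", "operation2", "operation3");
        System.out.println(findCulprit(operations, 1000, fakeInvoke));
    }
}
```

With the fake callback, `main` prints `operation2`: the first batch of 1000 calls with “operation1” never crashes, and the simulated heap fills up on the 513th call with “operation1,operation2”.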