Chatterbox Part 11 — Scraping memory dump in Chrome with Chrome Debugging Protocol

This is the eleventh part of the Chatterbox series. For your convenience, you can find the other parts in the table of contents in Part 1 – Origins.

Today we are going to capture a memory dump with the Chrome Debugging Protocol and Puppeteer, and then scrape values out of it. Some reading before moving on might be helpful.

Let’s start Node.js with the Puppeteer and heapsnapshot-parser packages available.

Go to example.com (or any other page) and execute this snippet in the console window:

We create an object with different properties. We now want to capture the memory dump, find the object and examine its content.
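As an illustration, an object of that kind might look like the following. Every property name and value here is made up for this sketch, and the string value assumes that the text we search for later is the literal `Some Text`.

```javascript
// Hypothetical sample object: all names and values are invented for illustration.
// In a browser console, globalThis is the window object.
globalThis.chatterboxSample = {
  text: 'Some Text', // string
  integer: 42,       // small integer
  double: 3.14,      // floating point number
  flag: true,        // boolean
  list: [1, 2, 3],   // array
};
```

Attaching the object to the global scope keeps it reachable, so it is still alive when the dump is taken.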

First, we create the Chrome Debugging Protocol session:
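A minimal sketch of that step, assuming `page` is a Puppeteer page that has already navigated to the target site. The helper name `openCdpSession` is hypothetical; it only wraps the two Puppeteer entry points for creating a session.

```javascript
// Open a Chrome Debugging Protocol session for a Puppeteer page.
// Assumption: `page` is a Puppeteer Page object.
async function openCdpSession(page) {
  // Newer Puppeteer versions expose createCDPSession directly on the page.
  if (typeof page.createCDPSession === 'function') {
    return page.createCDPSession();
  }
  // Older versions create the session through the page's target.
  return page.target().createCDPSession();
}
```

With Puppeteer this would be called as `const session = await openCdpSession(page);`.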

Now we need to take the dump. It’s delivered as a series of chunks, so we need to join them manually on our end:

The important thing here is captureNumericValue: without it, the dump will not contain numeric values (integers, doubles).
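The chunk handling can be sketched as follows. The function name `takeHeapSnapshot` is hypothetical, and `session` is assumed to offer `on`, `off` and `send` the way a Puppeteer CDP session does. The sketch also assumes that all chunks have been delivered by the time the command's promise resolves.

```javascript
// Collect a heap snapshot over an already open CDP session.
// Assumption: `session` has on/off/send like a Puppeteer CDPSession.
async function takeHeapSnapshot(session) {
  const chunks = [];
  const onChunk = (event) => chunks.push(event.chunk);

  // The snapshot arrives as HeapProfiler.addHeapSnapshotChunk events.
  session.on('HeapProfiler.addHeapSnapshotChunk', onChunk);
  try {
    await session.send('HeapProfiler.takeHeapSnapshot', {
      reportProgress: false,
      captureNumericValue: true, // keep integers and doubles in the dump
    });
  } finally {
    // Always unsubscribe, even when the command fails.
    session.off('HeapProfiler.addHeapSnapshotChunk', onChunk);
  }

  // The chunks are string fragments of a single JSON document.
  return chunks.join('');
}
```

The returned string is the whole dump and can be handed to a parser or written to a `.heapsnapshot` file.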

We parse the dump once it is complete:
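heapsnapshot-parser takes care of this step for us. To give a feel for what such parsing involves, here is a dependency-free sketch that decodes the flat `nodes` array by hand. It is not the library's code: the function name `decodeNodes` is hypothetical, the field names follow the `.heapsnapshot` JSON layout as we understand it, and the miniature dump is hand-made rather than real V8 output.

```javascript
// Decode the flat `nodes` array of a .heapsnapshot document into records.
// Each node occupies node_fields.length consecutive integers.
function decodeNodes(dump) {
  const { node_fields: fields, node_types: types } = dump.snapshot.meta;
  const typeNames = types[0]; // the first entry enumerates the node type names
  const nodes = [];
  for (let i = 0; i < dump.nodes.length; i += fields.length) {
    const raw = {};
    fields.forEach((field, offset) => { raw[field] = dump.nodes[i + offset]; });
    nodes.push({
      id: raw.id,
      type: typeNames[raw.type],
      name: dump.strings[raw.name], // names are indexes into `strings`
      selfSize: raw.self_size,
      edgeCount: raw.edge_count,
    });
  }
  return nodes;
}

// Hand-made miniature dump, heavily simplified compared to real output.
const sampleDump = {
  snapshot: { meta: {
    node_fields: ['type', 'name', 'id', 'self_size', 'edge_count'],
    node_types: [['object', 'string'], 'string', 'number', 'number', 'number'],
  } },
  nodes: [0, 0, 1, 16, 1, 1, 1, 3, 24, 0],
  strings: ['Object', 'Some Text'],
};

const decodedNodes = decodeNodes(sampleDump);
console.log(decodedNodes);
```

For a real dump the input would be `JSON.parse` applied to the joined chunks.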

What we have here is the pure dump of the objects graph. Now, we need to recreate JS objects from it:

We end up with the objectsById collection, which holds all the objects. Notice that we extract the string value from name and store a couple of helper synthetic fields.
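The reconstruction can be sketched roughly like this. The input shape (`id`, `type`, `name`, `edges` with `name` and `toId`) is a hypothetical normalized form of the parsed nodes, and apart from `__syntheticValue` the helper fields (`__id`, `__type`, `__parents`) are names chosen for this sketch.

```javascript
// Rebuild linked JS objects from parsed nodes.
// Assumed (hypothetical) input shape: { id, type, name, edges: [{ name, toId }] }.
function buildObjectsById(nodes) {
  const objectsById = new Map();

  // First pass: one plain object per node, with synthetic helper fields.
  for (const node of nodes) {
    const obj = { __id: node.id, __type: node.type, __parents: [] };
    if (node.type === 'string') {
      obj.__syntheticValue = node.name; // a string node keeps its text in `name`
    }
    objectsById.set(node.id, obj);
  }

  // Second pass: turn edges into properties and record back-references.
  for (const node of nodes) {
    const parent = objectsById.get(node.id);
    for (const edge of node.edges) {
      const child = objectsById.get(edge.toId);
      if (!child) continue; // edge points to a node we did not keep
      parent[edge.name] = child;
      child.__parents.push(parent);
    }
  }
  return objectsById;
}

const rebuilt = buildObjectsById([
  { id: 1, type: 'object', name: 'Object', edges: [{ name: 'text', toId: 3 }] },
  { id: 3, type: 'string', name: 'Some Text', edges: [] },
]);
console.log(rebuilt.get(1).text.__syntheticValue); // prints "Some Text"
```

Two passes are used so that an edge can point to a node that appears later in the list.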

Now, we want to traverse them and find the string. We provide a helper function:

This helper walks up the object hierarchy to a given height. We now want to find the text Some Text here and, since we know it’s a direct child of the object we’re after, we only need to go one parent up:
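A sketch of such a helper and of the lookup, assuming each rebuilt object carries a `__parents` array and strings expose `__syntheticValue`. The names `ancestorsUpTo` and `findHolders` are hypothetical, and the searched literal is assumed to be `Some Text`.

```javascript
// Collect the ancestors of an object, walking parent links up to `height` levels.
// Assumption: every object carries a hypothetical `__parents` array.
function ancestorsUpTo(obj, height) {
  const found = new Set();
  let level = [obj];
  for (let depth = 0; depth < height; depth++) {
    const next = [];
    for (const current of level) {
      for (const parent of current.__parents) {
        if (!found.has(parent)) { // guard against cycles and shared parents
          found.add(parent);
          next.push(parent);
        }
      }
    }
    level = next;
  }
  return [...found];
}

// Find every string object holding `text`, then go `height` levels up.
function findHolders(objects, text, height = 1) {
  return objects
    .filter((o) => o.__syntheticValue === text)
    .flatMap((o) => ancestorsUpTo(o, height));
}

// Hand-made graph: holder -> text string.
const textNode = { __syntheticValue: 'Some Text', __parents: [] };
const holder = { text: textNode, __parents: [] };
textNode.__parents.push(holder);

console.log(findHolders([textNode, holder], 'Some Text').length); // prints 1
```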

Obviously, this one string may be held by multiple objects so we need to understand the structure of the parents to find the right one. We can now traverse this in any way, for instance like this:
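One possible way to tell the candidates apart is to check which property names each parent exposes. This is only a sketch: `hasShape` is a hypothetical helper, and the property names are invented for the example.

```javascript
// Keep only candidates that expose all of the expected property names.
function hasShape(candidate, expectedKeys) {
  return expectedKeys.every((key) =>
    Object.prototype.hasOwnProperty.call(candidate, key));
}

const candidates = [
  { text: {}, flag: {}, list: {} }, // shaped like the object we are after
  { text: {} },                     // an unrelated holder of the same string
];

const matching = candidates.filter((c) => hasShape(c, ['text', 'flag', 'list']));
console.log(matching.length); // prints 1
```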

Since this is a simple memory dump, we don’t need to do that. We just know that the first string is the one we need. Now we can dump the values:

Okay, we can see a lot here. We see the maps that V8 stores to handle object internals, as well as properties, parents, and so on. The most important part is:

So we can see that strings are extracted and stored in __syntheticValue. Booleans are stored in a property named < dummy>, whereas arrays have an additional property called elements. Apart from that, we can get all values from the dump.

It should now be straightforward to analyze memory dumps automatically. The parsing logic shown here is deliberately simple and can be adjusted to our needs.