This is the eleventh part of the Chatterbox series. For your convenience, you can find the other parts in the table of contents in Part 1 – Origins.
Today we are going to capture a memory dump with the Chrome Debugging Protocol and Puppeteer, and then scrape data out of it. Some reading before moving on might be helpful.
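Let’s start Node with Puppeteer and heapsnapshot-parser. The snippets below are meant to be entered one at a time in the Node REPL, so that each promise has resolved before the next line uses its result: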
    const puppeteer = require('puppeteer');
    const parser = require('heapsnapshot-parser');

    puppeteer.launch({headless: false, devtools: true, userDataDir: "SomeProfileDirectory", ignoreDefaultArgs: ["--disable-extensions"], args: ["--enable-remote-extensions", "--disable-web-security"]}).then(br => b = br);
    b.newPage().then(pa => p = pa);
Go to example.com (or any other page) and execute this snippet in the console window:
    window.someObject = {
        someText: "Some Text here",
        someNumber: 12345678,
        someArray: ["Array element 1", "Array element 2"],
        someBoolean: true
    };
We create an object with different properties. We now want to capture the memory dump, find the object and examine its content.
First, we create the Chrome Debugging Protocol session:
    p.target().createCDPSession().then(cd => c = cd);
Now we need to take the dump. It’s delivered as a series of chunks so we need to join it manually on our end:
    var d = [];
    c.on('HeapProfiler.addHeapSnapshotChunk', (data) => {
        d.push(data);
    });

    c.send('HeapProfiler.takeHeapSnapshot', { reportProgress: false, treatGlobalObjectsAsRoots: false, captureNumericValue: true }).then(() => console.log("Done"));
The important thing here is captureNumericValue: without it, the dump will not contain the numbers (integers, doubles).
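The chunk collection above can also be wrapped in a single function. This is only a sketch: takeHeapSnapshot and FakeSession are hypothetical names, and it assumes (as the snippet above does) that all chunks have arrived by the time the send call resolves. FakeSession is a stand-in so the example runs without a browser; with Puppeteer you would pass the CDP session instead.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: collects all chunks of one heap snapshot into a string.
// `session` is anything exposing on/off/send, for example a Puppeteer CDP session.
async function takeHeapSnapshot(session) {
    const chunks = [];
    const onChunk = (data) => chunks.push(data.chunk);
    session.on('HeapProfiler.addHeapSnapshotChunk', onChunk);
    try {
        // Assumption: every chunk event fires before this promise resolves.
        await session.send('HeapProfiler.takeHeapSnapshot', {
            reportProgress: false,
            captureNumericValue: true
        });
    } finally {
        // Detach the listener so a second snapshot starts from a clean state.
        session.off('HeapProfiler.addHeapSnapshotChunk', onChunk);
    }
    return chunks.join('');
}

// Stand-in session (not part of Puppeteer) that emits two fake chunks.
class FakeSession extends EventEmitter {
    async send(method, params) {
        this.lastCall = { method, params };
        this.emit('HeapProfiler.addHeapSnapshotChunk', { chunk: '{"nodes":' });
        this.emit('HeapProfiler.addHeapSnapshotChunk', { chunk: '[]}' });
    }
}
```

In the REPL session from this post, the call would look like takeHeapSnapshot(c).then(text => snapshotFile = text).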
We parse the dump after it’s done:
    var snapshotFile = d.map(d => d.chunk).join("");
    var snapshot = parser.parse(snapshotFile);
What we have here is the raw dump of the object graph. Now, we need to recreate JS objects from it:
    var objectsById = {};

    // One plain object per heap node; strings keep their text in __syntheticValue
    snapshot.nodes.forEach(node => {
        objectsById[node.id] = {};
        if(node.type === "string"){
            objectsById[node.id].__syntheticValue = node.name;
        }
        objectsById[node.id].__syntheticId = node.id;
        objectsById[node.id].__syntheticParents = [];
    });

    // Each edge becomes a property on its source and a back-reference on its target
    snapshot.edges.forEach(edge => {
        objectsById[edge.fromNode.id][edge.name_or_index] = objectsById[edge.toNode.id];
        objectsById[edge.toNode.id].__syntheticParents.push({
            target: objectsById[edge.fromNode.id],
            edgeName: edge.name_or_index
        });
    });
We end up with the objectsById collection, which holds all the objects. Notice that we extract the string value from name and store a couple of helper synthetic fields.
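To see what this reconstruction produces, here is a sketch that runs the same logic on a tiny hand-made graph. The nodes and edges below are invented for illustration and only carry the fields the code reads (id, type, name, fromNode, toNode, name_or_index); a real parsed snapshot has many more. buildObjectsById is a hypothetical name for the logic described above.

```javascript
// Same reconstruction as in the post, wrapped in a function for reuse.
function buildObjectsById(snapshot) {
    const objectsById = {};
    for (const node of snapshot.nodes) {
        const o = { __syntheticId: node.id, __syntheticParents: [] };
        if (node.type === 'string') o.__syntheticValue = node.name;
        objectsById[node.id] = o;
    }
    for (const edge of snapshot.edges) {
        const from = objectsById[edge.fromNode.id];
        const to = objectsById[edge.toNode.id];
        from[edge.name_or_index] = to;
        to.__syntheticParents.push({ target: from, edgeName: edge.name_or_index });
    }
    return objectsById;
}

// Invented toy data: Window --someObject--> Object --someText--> string
const nodes = [
    { id: 1, type: 'object', name: 'Window' },
    { id: 3, type: 'object', name: 'Object' },
    { id: 5, type: 'string', name: 'Some Text here' }
];
const edges = [
    { fromNode: nodes[0], toNode: nodes[1], name_or_index: 'someObject' },
    { fromNode: nodes[1], toNode: nodes[2], name_or_index: 'someText' }
];

const toy = buildObjectsById({ nodes, edges });
console.log(toy[1].someObject.someText.__syntheticValue); // Some Text here
console.log(toy[5].__syntheticParents[0].edgeName);       // someText
```

One thing to keep in mind with this design: edge names and the synthetic fields share one namespace, so an edge that happened to be named __syntheticId would overwrite the helper field.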
Now, we want to traverse them and find the string. We provide a helper function:
    var parentPathsAtHeight = (o, height, path) => {
        if(height == 0) return [{path: path.join(","), target: o}];
        return o.__syntheticParents.flatMap(p => parentPathsAtHeight(p.target, height-1, path.concat(p.edgeName)));
    }
This function walks up the object hierarchy to a given height and returns every path it finds, together with the object each path ends at. We now want to find the text Some Text here, and since we know it’s a direct child of the object we’re after, we just need to go one parent up:
    var oneLineText = "Some Text here";
    var matchingStrings = Object.values(objectsById).filter(o => o.__syntheticValue == oneLineText).map(s => {return {
        o: s,
        parents: parentPathsAtHeight(s, 1, [])
    }});
Obviously, this one string may be held by multiple objects so we need to understand the structure of the parents to find the right one. We can now traverse this in any way, for instance like this:
    var wantedObject = matchingStrings[0].parents.filter(p => p.path.startsWith("whateverProperty")).filter(p => p.path.indexOf("someProperty,someOtherProperty") >= 0)[0].target
Since this is a simple memory dump, we don’t need to do that. We just know that the first matching string is the one we need. Now, we can dump values:
    matchingStrings[0].parents[1].target
    {
      __syntheticId: 5777,
      __syntheticParents: [
        { target: [Object], edgeName: '85 / DevTools console' },
        { target: [Object], edgeName: '86 / DevTools console' },
        { target: [Object], edgeName: 'someObject' },
        { target: [Object], edgeName: 'value' }
      ],
      someText: {
        __syntheticValue: 'Some Text here',
        __syntheticId: 15581,
        __syntheticParents: [ [Object], [Object], [Object] ],
        map: {
          __syntheticId: 91,
          __syntheticParents: [Array],
          dependent_code: [Object],
          map: [Object]
        }
      },
      someNumber: {
        __syntheticId: 49703,
        __syntheticParents: [ [Object] ],
        value: {
          __syntheticValue: '12345678',
          __syntheticId: 49705,
          __syntheticParents: [Array]
        }
      },
      someArray: {
        __syntheticId: 49707,
        __syntheticParents: [ [Object] ],
        '<dummy>': {
          __syntheticValue: 'Array element 1',
          __syntheticId: 14887,
          __syntheticParents: [Array],
          map: [Object]
        },
        '': {
          __syntheticValue: 'Array element 2',
          __syntheticId: 15879,
          __syntheticParents: [Array],
          map: [Object]
        },
        elements: {
          '0': [Object],
          '1': [Object],
          __syntheticId: 49711,
          __syntheticParents: [Array],
          map: [Object]
        },
        map: {
          __syntheticId: 34723,
          __syntheticParents: [Array],
          transitions: [Object],
          descriptors: [Object],
          prototype: [Object],
          back_pointer: [Object],
          dependent_code: [Object],
          map: [Object],
          '<dummy>': [Object]
        }
      },
      someBoolean: {
        __syntheticId: 71,
        __syntheticParents: [
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object], [Object],
          [Object], [Object], [Object]
        ],
        map: {
          __syntheticId: 267,
          __syntheticParents: [Array],
          dependent_code: [Object],
          map: [Object]
        },
        '<dummy>': {
          __syntheticValue: 'true',
          __syntheticId: 1101,
          __syntheticParents: [Array],
          map: [Object]
        },
        '': {
          __syntheticValue: 'boolean',
          __syntheticId: 619,
          __syntheticParents: [Array],
          map: [Object]
        }
      },
      map: {
        __syntheticId: 49709,
        __syntheticParents: [ [Object], [Object] ],
        descriptors: {
          '0': [Object],
          '3': [Object],
          '6': [Object],
          '9': [Object],
          __syntheticId: 49697,
          __syntheticParents: [Array],
          enum_cache: [Object],
          map: [Object]
        },
        prototype: {
          __syntheticId: 30425,
          __syntheticParents: [Array],
          constructor: [Object],
          __defineGetter__: [Object],
          __defineSetter__: [Object],
          hasOwnProperty: [Object],
          __lookupGetter__: [Object],
          __lookupSetter__: [Object],
          isPrototypeOf: [Object],
          propertyIsEnumerable: [Object],
          toString: [Object],
          valueOf: [Object],
          'get __proto__': [Object],
          'set __proto__': [Object],
          toLocaleString: [Object],
          properties: [Object],
          map: [Object]
        },
        back_pointer: {
          __syntheticId: 56833,
          __syntheticParents: [Array],
          transition: [Circular],
          descriptors: [Object],
          prototype: [Object],
          back_pointer: [Object],
          dependent_code: [Object],
          map: [Object],
          '<dummy>': [Object]
        },
        dependent_code: { __syntheticId: 315, __syntheticParents: [Array], map: [Object] },
        map: {
          __syntheticId: 77,
          __syntheticParents: [Array],
          dependent_code: [Object],
          map: [Circular]
        },
        '<dummy>': { __syntheticId: 1429, __syntheticParents: [Array] }
      }
    }
Okay, we can see a lot here. We do see maps stored by V8 to handle object internals, we see properties, parents etc. The most important thing is:
    > matchingStrings[0].parents[1].target.someText.__syntheticValue
    'Some Text here'
    > matchingStrings[0].parents[1].target.someNumber.value.__syntheticValue
    '12345678'
    > matchingStrings[0].parents[1].target.someBoolean["<dummy>"].__syntheticValue
    'true'
    > matchingStrings[0].parents[1].target.someArray.elements["0"].__syntheticValue
    'Array element 1'
    > matchingStrings[0].parents[1].target.someArray.elements["1"].__syntheticValue
    'Array element 2'
So we can see that strings are extracted and stored in __syntheticValue. Numbers sit one level deeper, under a property named value. Booleans are stored in a property named <dummy>, whereas arrays have an additional property called elements. Apart from that, we can get all values from the dump.
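These observations can be folded into a small decoder. The sketch below is an assumption-heavy illustration: decodeValue is a hypothetical helper, it only knows the layout seen in this particular dump (which may differ between V8 versions), and the handling of false is a guess by analogy with true. The dumped object is a hand-written imitation of the relevant parts of the output above.

```javascript
// Hypothetical decoder for the layout observed in this dump.
function decodeValue(o) {
    if (o === undefined) return undefined;
    // Strings: the text is stored directly on the object.
    if ('__syntheticValue' in o) return o.__syntheticValue;
    // Arrays: items hang off `elements` under numeric keys.
    if (o.elements) {
        return Object.keys(o.elements)
            .filter(k => /^\d+$/.test(k))
            .sort((a, b) => Number(a) - Number(b))
            .map(k => decodeValue(o.elements[k]));
    }
    // Numbers: the digits are stored as text under `value`.
    if (o.value && '__syntheticValue' in o.value) {
        return Number(o.value.__syntheticValue);
    }
    // Booleans: '<dummy>' holds 'true' and '' holds the type name.
    const marker = o[''];
    if (o['<dummy>'] && marker && marker.__syntheticValue === 'boolean') {
        return o['<dummy>'].__syntheticValue === 'true';
    }
    return undefined; // layout not recognized
}

// Hand-written imitation of the interesting parts of the dump.
const dumped = {
    someText: { __syntheticValue: 'Some Text here' },
    someNumber: { value: { __syntheticValue: '12345678' } },
    someArray: {
        '<dummy>': { __syntheticValue: 'Array element 1' },
        '': { __syntheticValue: 'Array element 2' },
        elements: {
            '0': { __syntheticValue: 'Array element 1' },
            '1': { __syntheticValue: 'Array element 2' },
            __syntheticId: 49711,
            __syntheticParents: []
        }
    },
    someBoolean: {
        '<dummy>': { __syntheticValue: 'true' },
        '': { __syntheticValue: 'boolean' }
    }
};

console.log(decodeValue(dumped.someNumber));  // 12345678
console.log(decodeValue(dumped.someBoolean)); // true
console.log(decodeValue(dumped.someArray));   // [ 'Array element 1', 'Array element 2' ]
```

The array check runs before the boolean check on purpose, because the array in the dump also carries <dummy> and empty-named properties.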
It should now be straightforward to analyze memory dumps automatically. The parsing logic shown here is deliberately simple and can be adjusted to our needs.
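One possible adjustment: picking the holder by a fixed index such as parents[1] only works for this particular dump. A sketch of a more robust selection is to keep only those holders that are themselves referenced through a known edge name, here the global variable name. holdersReferencedAs is a hypothetical helper, and the objects below are invented for illustration.

```javascript
// Hypothetical helper: among the objects that hold the string, keep those
// that are referenced by someone through an edge with the given name.
function holdersReferencedAs(stringNode, edgeName) {
    return stringNode.__syntheticParents
        .map(p => p.target)
        .filter(holder =>
            holder.__syntheticParents.some(pp => pp.edgeName === edgeName));
}

// Invented graph: the same string is held by two objects,
// but only one of them is reachable as `someObject`.
const globalObject = { __syntheticParents: [] };
const wanted = { __syntheticParents: [{ target: globalObject, edgeName: 'someObject' }] };
const unrelated = { __syntheticParents: [{ target: globalObject, edgeName: 'cache' }] };
const sharedText = {
    __syntheticValue: 'Some Text here',
    __syntheticParents: [
        { target: unrelated, edgeName: 'label' },
        { target: wanted, edgeName: 'someText' }
    ]
};

const found = holdersReferencedAs(sharedText, 'someObject');
console.log(found.length);        // 1
console.log(found[0] === wanted); // true
```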