Chatterbox Part 11 — Scraping memory dump in Chrome with Chrome Debugging Protocol

This is the eleventh part of the Chatterbox series. For your convenience, you can find the other parts in the table of contents in Part 1 – Origins.

Today we are going to capture a memory dump with the Chrome Debugging Protocol and Puppeteer, and then scrape data out of it. Some reading before moving on might be helpful.

Let’s start a Node REPL and load Puppeteer and heapsnapshot-parser:

const puppeteer = require('puppeteer');
const parser = require('heapsnapshot-parser');

puppeteer.launch({headless: false, devtools: true, userDataDir: "SomeProfileDirectory", ignoreDefaultArgs: ["--disable-extensions"], args: ["--enable-remote-extensions", "--disable-web-security"]}).then(br => b = br);
b.newPage().then(pa => p = pa);

Go to example.com (or any other page) and execute this snippet in the console window:

window.someObject = {
	someText: "Some Text here",
	someNumber: 12345678,
	someArray: ["Array element 1", "Array element 2"],
	someBoolean: true
};

We created an object with several kinds of properties. Now we want to capture a memory dump, find the object in it and examine its content.

First, we create the Chrome Debugging Protocol session:

p.target().createCDPSession().then(cd => c = cd);

Now we need to take the dump. It is delivered as a series of chunks, so we need to join them on our end:

var d = [];
c.on('HeapProfiler.addHeapSnapshotChunk', (data) => {
	d.push(data);
});

c.send('HeapProfiler.takeHeapSnapshot', { reportProgress: false, treatGlobalObjectsAsRoots: false, captureNumericValue: true }).then(() => console.log("Done"));

The important option here is captureNumericValue: without it, the dump will not contain numeric values (integers, doubles).
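The chunk handling above can be wrapped in a small helper. The following is a minimal sketch, not part of the original session: createChunkCollector is a hypothetical name, and fakeSession is a plain Node EventEmitter standing in for the CDP session c so that the snippet runs without Chrome.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: gathers 'HeapProfiler.addHeapSnapshotChunk' events
// and joins their chunk fields into one string.
function createChunkCollector(session) {
  const chunks = [];
  const onChunk = (event) => chunks.push(event.chunk);
  session.on('HeapProfiler.addHeapSnapshotChunk', onChunk);
  return {
    // Stop listening and return everything received so far as one string.
    finish() {
      session.off('HeapProfiler.addHeapSnapshotChunk', onChunk);
      return chunks.join('');
    }
  };
}

// Stand-in for the CDP session (assumption: only on/off/emit are needed here).
const fakeSession = new EventEmitter();
const collector = createChunkCollector(fakeSession);

// Simulate a snapshot arriving in two pieces.
fakeSession.emit('HeapProfiler.addHeapSnapshotChunk', { chunk: '{"nodes":' });
fakeSession.emit('HeapProfiler.addHeapSnapshotChunk', { chunk: '[1,2,3]}' });

const joined = collector.finish();
console.log(JSON.parse(joined).nodes.length); // 3
```

With the real session, the collector would be created before sending HeapProfiler.takeHeapSnapshot, and finish() called once that promise resolves, assuming (as the snippet above does) that all chunks have arrived by then.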

We parse the dump after it’s done:

var snapshotFile = d.map(d => d.chunk).join("");
var snapshot = parser.parse(snapshotFile);
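For orientation, the joined text is JSON in which nodes and edges are flat integer arrays described by snapshot.meta, and names are indexes into a shared strings table. The sketch below decodes a tiny, hand-written snapshot into node and edge objects with the same field names used in the rest of this post. It is only an illustration: decodeSnapshot is a hypothetical function, the metadata is heavily trimmed compared to what Chrome emits, and it is not a description of how heapsnapshot-parser is implemented.

```javascript
// Hypothetical, trimmed-down snapshot: two nodes and one property edge.
const raw = {
  snapshot: { meta: {
    node_fields: ['type', 'name', 'id', 'self_size', 'edge_count'],
    node_types: [['object', 'string'], 'string', 'number', 'number', 'number'],
    edge_fields: ['type', 'name_or_index', 'to_node'],
    edge_types: [['property', 'element'], 'string_or_number', 'node']
  } },
  strings: ['Object', 'Some Text here', 'someText'],
  nodes: [0, 0, 1, 16, 1,   1, 1, 3, 24, 0],
  edges: [0, 2, 5]
};

function decodeSnapshot(raw) {
  const { node_fields, node_types, edge_fields, edge_types } = raw.snapshot.meta;
  const nSize = node_fields.length;
  const eSize = edge_fields.length;
  const nodeTypeNames = node_types[node_fields.indexOf('type')];
  const edgeTypeNames = edge_types[edge_fields.indexOf('type')];
  const nField = name => node_fields.indexOf(name);
  const eField = name => edge_fields.indexOf(name);

  // Every nSize integers describe one node.
  const nodes = [];
  for (let i = 0; i < raw.nodes.length; i += nSize) {
    nodes.push({
      type: nodeTypeNames[raw.nodes[i + nField('type')]],
      name: raw.strings[raw.nodes[i + nField('name')]],
      id: raw.nodes[i + nField('id')],
      edgeCount: raw.nodes[i + nField('edge_count')]
    });
  }

  // Edges are stored in node order; each node owns its next edgeCount edges.
  const edges = [];
  let offset = 0;
  for (const fromNode of nodes) {
    for (let k = 0; k < fromNode.edgeCount; k++) {
      const type = edgeTypeNames[raw.edges[offset + eField('type')]];
      const rawName = raw.edges[offset + eField('name_or_index')];
      // to_node is an offset into the flat nodes array, not a node id.
      const toNode = nodes[raw.edges[offset + eField('to_node')] / nSize];
      // Element and hidden edges carry a numeric index, the rest a string index.
      const byIndex = type === 'element' || type === 'hidden';
      edges.push({ type, fromNode, toNode, name_or_index: byIndex ? rawName : raw.strings[rawName] });
      offset += eSize;
    }
  }
  return { nodes, edges };
}

const decoded = decodeSnapshot(raw);
console.log(decoded.edges[0].name_or_index, '->', decoded.edges[0].toNode.name);
```

In practice the parser library does this work for us; the sketch is only meant to show why a string node's name field ends up holding the string value.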

What we have here is a raw dump of the object graph. Now, we need to recreate JS objects from it:

var objectsById = {};

// One placeholder object per heap node, with a few synthetic helper fields.
snapshot.nodes.forEach(node => {
	objectsById[node.id] = {};
	
	// For string nodes, the node name is the string value itself.
	if(node.type === "string"){
		objectsById[node.id].__syntheticValue = node.name;
	}
	
	objectsById[node.id].__syntheticId = node.id;
	objectsById[node.id].__syntheticParents = [];
});

// Each edge becomes a property on its source object and a back-link on its target.
snapshot.edges.forEach(edge => {
	objectsById[edge.fromNode.id][edge.name_or_index] = objectsById[edge.toNode.id];
	objectsById[edge.toNode.id].__syntheticParents.push({
		target: objectsById[edge.fromNode.id],
		edgeName: edge.name_or_index
	});
});

We end up with the objectsById collection, which holds all the objects. Notice that for string nodes we take the value from name, and that we store a couple of synthetic helper fields (__syntheticId and __syntheticParents).
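To make the resulting shape concrete, here is the same reconstruction run against a tiny hand-made snapshot. This is a sketch under assumptions: buildGraph is a hypothetical wrapper around the logic above, and the three nodes and two edges are invented, using only the fields the code relies on (id, type, name, fromNode, toNode, name_or_index).

```javascript
// Hypothetical miniature snapshot in the shape the reconstruction expects.
const nodes = [
  { id: 1, type: 'object', name: 'Window' },
  { id: 3, type: 'object', name: 'Object' },
  { id: 5, type: 'string', name: 'Some Text here' }
];
const edges = [
  { fromNode: nodes[0], toNode: nodes[1], name_or_index: 'someObject' },
  { fromNode: nodes[1], toNode: nodes[2], name_or_index: 'someText' }
];

function buildGraph(snapshot) {
  const byId = {};
  // First pass: one placeholder per node.
  for (const node of snapshot.nodes) {
    const o = { __syntheticId: node.id, __syntheticParents: [] };
    if (node.type === 'string') o.__syntheticValue = node.name;
    byId[node.id] = o;
  }
  // Second pass: wire children as properties and record back-links.
  for (const edge of snapshot.edges) {
    const from = byId[edge.fromNode.id];
    const to = byId[edge.toNode.id];
    from[edge.name_or_index] = to;
    to.__syntheticParents.push({ target: from, edgeName: edge.name_or_index });
  }
  return byId;
}

const graph = buildGraph({ nodes, edges });
console.log(graph[1].someObject.someText.__syntheticValue); // Some Text here
console.log(graph[5].__syntheticParents[0].edgeName);       // someText
```

One consequence of storing edges as plain properties is that two edges with the same name leaving one node overwrite each other, and an edge that happened to be named like one of the synthetic fields would clobber it.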

Now, we want to traverse them and find the string. We provide a helper function:

var parentPathsAtHeight = (o, height, path) => {
	if(height == 0) return [{path: path.join(","), target: o}];
	return o.__syntheticParents.flatMap(p => parentPathsAtHeight(p.target, height-1, path.concat(p.edgeName)));
}

This helper walks up the object graph from o through the parent links for exactly height levels and returns every ancestor reached, together with the comma-joined edge names that lead to it. We now want to find the text Some Text here, and since we know it’s a direct child of the object we’re after, we only need to go one parent up:

var oneLineText = "Some Text here";
var matchingStrings = Object.values(objectsById).filter(o => o.__syntheticValue == oneLineText).map(s => {return {
	o: s,
	parents: parentPathsAtHeight(s, 1, [])
}});
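To see what the helper returns, the sketch below runs it on a small invented graph in which the same string is held by two different objects. The helper is repeated so that the snippet runs on its own; the objects text, holder, other and root are hypothetical and are built by hand rather than taken from a real dump.

```javascript
// Same helper as above, repeated so the sketch is self-contained.
const parentPathsAtHeight = (o, height, path) => {
  if (height === 0) return [{ path: path.join(','), target: o }];
  return o.__syntheticParents.flatMap(p =>
    parentPathsAtHeight(p.target, height - 1, path.concat(p.edgeName)));
};

// Hypothetical graph: root -> holder -> text, and root -> other -> text.
const text = { __syntheticValue: 'Some Text here', __syntheticParents: [] };
const holder = { __syntheticParents: [], someText: text };
const other = { __syntheticParents: [], label: text };
const root = { __syntheticParents: [], someObject: holder, cache: other };
text.__syntheticParents.push(
  { target: holder, edgeName: 'someText' },
  { target: other, edgeName: 'label' });
holder.__syntheticParents.push({ target: root, edgeName: 'someObject' });
other.__syntheticParents.push({ target: root, edgeName: 'cache' });

const oneUp = parentPathsAtHeight(text, 1, []).map(p => p.path);
const twoUp = parentPathsAtHeight(text, 2, []).map(p => p.path);
console.log(oneUp); // [ 'someText', 'label' ]
console.log(twoUp); // [ 'someText,someObject', 'label,cache' ]

// Pick the parent that holds the string under the 'someText' property.
const wanted = parentPathsAtHeight(text, 1, []).find(p => p.path === 'someText').target;
console.log(wanted === holder); // true
```

Note that the path lists edge names starting from the edge nearest to the string and moving outwards, so a property chain someObject.someText shows up as someText,someObject.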

Obviously, this one string may be held by multiple objects so we need to understand the structure of the parents to find the right one. We can now traverse this in any way, for instance like this:

var wantedObject = matchingStrings[0].parents.filter(p => p.path.startsWith("whateverProperty")).filter(p => p.path.indexOf("someProperty,someOtherProperty") >= 0)[0].target

Since this is a simple memory dump, we don’t need to do that. We just know that the first matching string is the one we need, and in this run its second parent is our object. Now we can dump the values:

matchingStrings[0].parents[1].target
{
  __syntheticId: 5777,
  __syntheticParents: [
    { target: [Object], edgeName: '85 / DevTools console' },
    { target: [Object], edgeName: '86 / DevTools console' },
    { target: [Object], edgeName: 'someObject' },
    { target: [Object], edgeName: 'value' }
  ],
  someText: {
    __syntheticValue: 'Some Text here',
    __syntheticId: 15581,
    __syntheticParents: [ [Object], [Object], [Object] ],
    map: {
      __syntheticId: 91,
      __syntheticParents: [Array],
      dependent_code: [Object],
      map: [Object]
    }
  },
  someNumber: {
    __syntheticId: 49703,
    __syntheticParents: [ [Object] ],
    value: {
      __syntheticValue: '12345678',
      __syntheticId: 49705,
      __syntheticParents: [Array]
    }
  },
  someArray: {
    __syntheticId: 49707,
    __syntheticParents: [ [Object] ],
    '<dummy>': {
      __syntheticValue: 'Array element 1',
      __syntheticId: 14887,
      __syntheticParents: [Array],
      map: [Object]
    },
    '': {
      __syntheticValue: 'Array element 2',
      __syntheticId: 15879,
      __syntheticParents: [Array],
      map: [Object]
    },
    elements: {
      '0': [Object],
      '1': [Object],
      __syntheticId: 49711,
      __syntheticParents: [Array],
      map: [Object]
    },
    map: {
      __syntheticId: 34723,
      __syntheticParents: [Array],
      transitions: [Object],
      descriptors: [Object],
      prototype: [Object],
      back_pointer: [Object],
      dependent_code: [Object],
      map: [Object],
      '<dummy>': [Object]
    }
  },
  someBoolean: {
    __syntheticId: 71,
    __syntheticParents: [
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object], [Object],
      [Object], [Object], [Object]
    ],
    map: {
      __syntheticId: 267,
      __syntheticParents: [Array],
      dependent_code: [Object],
      map: [Object]
    },
    '<dummy>': {
      __syntheticValue: 'true',
      __syntheticId: 1101,
      __syntheticParents: [Array],
      map: [Object]
    },
    '': {
      __syntheticValue: 'boolean',
      __syntheticId: 619,
      __syntheticParents: [Array],
      map: [Object]
    }
  },
  map: {
    __syntheticId: 49709,
    __syntheticParents: [ [Object], [Object] ],
    descriptors: {
      '0': [Object],
      '3': [Object],
      '6': [Object],
      '9': [Object],
      __syntheticId: 49697,
      __syntheticParents: [Array],
      enum_cache: [Object],
      map: [Object]
    },
    prototype: {
      __syntheticId: 30425,
      __syntheticParents: [Array],
      constructor: [Object],
      __defineGetter__: [Object],
      __defineSetter__: [Object],
      hasOwnProperty: [Object],
      __lookupGetter__: [Object],
      __lookupSetter__: [Object],
      isPrototypeOf: [Object],
      propertyIsEnumerable: [Object],
      toString: [Object],
      valueOf: [Object],
      'get __proto__': [Object],
      'set __proto__': [Object],
      toLocaleString: [Object],
      properties: [Object],
      map: [Object]
    },
    back_pointer: {
      __syntheticId: 56833,
      __syntheticParents: [Array],
      transition: [Circular],
      descriptors: [Object],
      prototype: [Object],
      back_pointer: [Object],
      dependent_code: [Object],
      map: [Object],
      '<dummy>': [Object]
    },
    dependent_code: { __syntheticId: 315, __syntheticParents: [Array], map: [Object] },
    map: {
      __syntheticId: 77,
      __syntheticParents: [Array],
      dependent_code: [Object],
      map: [Circular]
    },
    '<dummy>': { __syntheticId: 1429, __syntheticParents: [Array] }
  }
}

Okay, we can see a lot here. We do see maps stored by V8 to handle object internals, we see properties, parents etc. The most important thing is:

> matchingStrings[0].parents[1].target.someText.__syntheticValue
'Some Text here'
> matchingStrings[0].parents[1].target.someNumber.value.__syntheticValue
'12345678'
> matchingStrings[0].parents[1].target.someBoolean["<dummy>"].__syntheticValue
'true'
> matchingStrings[0].parents[1].target.someArray.elements["0"].__syntheticValue
'Array element 1'
> matchingStrings[0].parents[1].target.someArray.elements["1"].__syntheticValue
'Array element 2'

So we can see that strings are extracted and stored in __syntheticValue. Numbers sit one level deeper, under a value property. Booleans are reached through a property named <dummy>, whereas arrays have an additional property called elements that holds the items by index. Apart from that, we can get all values from the dump.
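These observations can be folded into one small reader. The following is a rough sketch only: readValue is a hypothetical helper, the layout it assumes (value for numbers, <dummy> for booleans, elements for arrays) is taken from the dump shown above and may differ between V8 versions, and the mock object is hand-built to mirror that layout.

```javascript
// Hypothetical reader for the reconstructed objects; layout assumed from the dump above.
function readValue(o) {
  // Strings carry their value directly.
  if (o.__syntheticValue !== undefined) return o.__syntheticValue;
  // Numbers were seen as a string stored under a 'value' child.
  if (o.value && o.value.__syntheticValue !== undefined) {
    return Number(o.value.__syntheticValue);
  }
  // Arrays were seen with an 'elements' child keyed by index.
  if (o.elements) {
    return Object.keys(o.elements)
      .filter(k => /^\d+$/.test(k))
      .sort((a, b) => a - b)
      .map(k => readValue(o.elements[k]));
  }
  // Booleans were seen with the text 'true' or 'false' under '<dummy>'.
  const dummy = o['<dummy>'];
  if (dummy && (dummy.__syntheticValue === 'true' || dummy.__syntheticValue === 'false')) {
    return dummy.__syntheticValue === 'true';
  }
  return undefined; // layout not recognised by this sketch
}

// Hand-built mock mirroring the shape of the dumped someObject.
const str = v => ({ __syntheticValue: v, __syntheticParents: [] });
const mock = {
  someText: str('Some Text here'),
  someNumber: { value: str('12345678') },
  someBoolean: { '<dummy>': str('true'), '': str('boolean') },
  someArray: { elements: { '0': str('Array element 1'), '1': str('Array element 2') } }
};

const plain = {};
for (const key of Object.keys(mock)) plain[key] = readValue(mock[key]);
console.log(plain);
```

The order of the checks matters: arrays in the dump also carry a <dummy> property, so elements is tested before the boolean case. An ordinary object that happens to have a property called value would be misread as a number, so a real tool would need stricter checks.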

It should now be straightforward to analyze memory dumps automatically. The parsing logic shown here is deliberately simple and can be adjusted to our needs.