Custom memory allocation in C# Part 2 — List copying objects

This is the second part of the Custom memory allocation series. For your convenience you can find other parts in the table of contents in Part 1 — Allocating object on a stack

Hi! Last time we saw how to allocate managed object on a stack. Today we are going to implement custom list which stores objects instead of references to them. This post is inspired by Unsafe Part 3: Benchmarking a java UnsafeArrayList

Introduction

First, we need to examine how the array is stored in memory. Let’s consider a one-dimensional, ordinary (not jagged) array of integers. It is a reference so it has pointer to method table. We can lock on it, so it has sync block. However, it has additional field between method table pointer and its content — one integer representing the size of the array.

So if we store an element in array[0] it is physically placed 8 bytes after the beginning of the array (to be precise: the address pointed by reference).

Our idea is simple: instead of storing references to objects we will store objects themselves in the array of integers. We will need to copy objects by hand and then we will need to recover the reference using pointer arithmetic. Let’s go.

Adding object to the list

Let’s start with declaring class and fields:

We declare generic class and store size of the element (including sync block), dummy reference called _target (we will use it to get references to stored objects), integer array for objects, and current index in the array. Next, let’s implement addition:

We first create typed reference to added item. Next, we calculate the address of the element and move back four bytes to include sync block. Finally, we iterate over all object bytes and copy them one by one to our array.

Let’s now implement getter:

Looks pretty easy. We get two typed references: one pointing to storage array, another one pointing to dummy reference. Next, we modify dummy reference and store there an address of the element to retrieve. We perform simple pointer arithmetic:
(*(int*) *(int*) (&arrayReference) + index * _elementSize*sizeof (int) + 3*sizeof (int))
First, we retrieve the pointer to pointer to pointer to data. Next, we dereference just enough times to get the address of the array. Next, we move forward 3*sizeof(int): we need to skip two integers of the array (pointer to method table and size of the array), and we need to omit sync block of the object.

And this is it.

Testing program

Let’s now evaluate our solution. We start with two helper functions:

We declare a class with sixteen fields. Next, we create a method which will create objects, and finally a method performing few additions.
Next, let’s compare three different containers: ordinary array, built-in list, and our custom list.

The code testing the array goes as follows:

We create an array, create and store few objects, check insertion time, restart the stopwatch, check calculation time, and print results. Almost the same goes for the built-in list:

We create a list with predefined size in order to avoid reallocation. And now the code for our list:

As you can see, all these codes look almost the same. The only difference is API for performing operations on collections.

Let’s now invoke the test methods:

We test each collection three times. Each time we create 20 millions objects, perform test, and run GC. I performed the test on the same configuration as in part 1, you can see the results below:
Unsafe list benchmark
As we can see, insertion time is almost the same for ordinary array and for built-in list. However, inserting to unsafe list is much faster. When it comes to calculation time, unsafe list is a bit slower, probably because of additional time to extract the reference to the object.

You can test the code with this gist.

Summary

You can now compare achieved results with the blog post describing the java implementation. Our results are a bit different, but still interesting. You can compare different architectures as well.

Bonus chatter: what needs to be changed in order to get this code working on x64?