This is the third part of the Custom memory allocation series. For your convenience you can find other parts in the table of contents in Part 1 — Allocating object on a stack

Hi! We already know how to allocate objects on a stack, we also evaluated the performance of list storing objects instead of references. In this part we are going to do two things. First, we will implement basic memory allocator which will be able to allocate any object from scratch (no copying this time). Second, we will hijack new operator in order to be able to decide where objects should be allocated. Finally, we will be able to instrument CLR where it should store almost all objects. Let’s go!

Memory management

We already know that every reference object has pointer to method table. In first part we allocated object on a stack, however, we had to have already initialized object in order to copy its method table pointer. But in order to implement custom memory allocator, we would like to be able to allocate object without using additional information. We can extract this value using the following line:
typeof(T).TypeHandle.Value
We also need to know the object size. In part two we defined the object size by hand in code, however, we can extract this value from method table. It is stored 4 bytes after the beginning of the method table, so we are able to find this value using method table pointer.

Having all this details we are able to implement basic memory allocator:

First, in lines 4-6 we declare variables: integer array where we will store our objects, offset within that array, and reference to dummy object. Next, we create array big enough to be stored in GC generation 2. We will utilize this to verify that our allocator is working pretty well.

In line 14 we start method for allocating object of any type. We first obtain handle to method table, we next allocate object, finally we return it from this function. Remember that the dummy reference is modified to point to the newly allocated object so there is no null reference problem. Also, this reference will be modified every time we allocate something.

Finally, in RawAllocate method we allocate new object. We accept one parameter: a pointer to method table. We also return one value: pointer to allocated memory. We start with calculating object size (by reading it from the method table), in line 28 we store method table pointer for newly created object, and finally (line 37) we modify the reference so now it points to brand new object. Next, we update offset variable and return result.

You can verify this code by running this sample:

Ordinary object should be in generation 0 (since it is very small and brand new), however, custom object should be in generation 2.

Hacking new

So far so good. Now we are able to allocate any object using our custom logic. We don’t need to have already allocated object, we can do one from scratch and don’t bother with other things. However, we would like to be able to allocate objects using our allocator using only new operator. In other words, we would like to modify CLR in a way that it will store elements in a place we specify.

First, let’s try do achieve that by hand. We will use WinDBG to examine CLR internals and patch them on the fly. Let’s run previous application (do not try to use F5 and debugging from VS since it will greatly modify execution environment, just run the application), attach WinDBG to it and see some internal structures.

First, let’s try to find code of our function executing allocation logic:

As we can see, we first load SOS extension. Next, we find all Program in all modules, we dump our module, and finally we get to method descriptions by going through method tables.

We can see, that in line Program.cs@132 we allocate new object. Internally it is realized by call 015830c8 (JitHelp: CORINFO_HELP_NEWSFAST). WinDBG helps us here and says that this is JitHelp method, which is used to allocate memory.

Now we can decompile the code and see what it does:

As we can see, this is small method stub. It does not have ordinary code setting stack frame, so we suspect that it is dynamically generated in runtime. You can verify it by rerunning the application, finding this code, and looking for it in modules. As far as we can see, it is not stored in any particular DLL, so we can suspect that it is written dynamically to the memory. You can try to find this code in .NET Core or Rotor sources (“open source” version of .NET Framework 2.0).

OK, so let’s see, what this function really does. Here are only my assumptions, so don’t consider them “rock solid”. However, they seem quite reasonable.
First, it loads [ecx+4] into eax. This is size of object to allocate.
Next, it retrieves variable from fs segment. This is probably metadata of recently allocated block.
In next line it adds size of the object and a value retrieved from metadata. In the next two lines it compares this value to some other value from metadata. This is probably check if object to allocate fits in memory block. If it doesn’t then we jump to function which will allocate another chunk of memory and do some magic.
If the object fits, we store size. Next, we perform arithmetic on pointers to get address to our object. Finally, we return the pointer to the object.

Now we can find our allocate function:

We can see, that RawAllocate is jitted and stored in 02cb07d0. We can now patch CLR function and jump to our code:

And we are done. Now all alocation mechanism will go through our function!

Patching code from code

We already know how to patch code using WinDBG. However, we would like to do the same in C#.

Since extracting address of CLR method for allocation might be difficult, we will extract it from code. We will use a helper function:

We define a method which only creates new object. We also add attribute which will disable optimizations for this function.

Now we can find address with the following code:

We hard-code two offsets, one for DEBUG build, one for normal build.

Finally, we patch the code by hand:

We get handle to our method, compile it with RuntimeHelpers. Next, we get address of jitted code (which is stored eight bytes after the beginning of the metadata). We get address of CLR function and calculate code for jump.

Jump instruction has five bytes: one for opcode, and four for address. We need to remember that the address is relative.

And we are done. We can now test our code with this simple program:

And the result is:

Generic allocator result.

You can find all code here: https://gist.github.com/afish/33046b6c90833c922d6d

Summary

We were able to patch CLR in order to implement custom allocation mechanism. Of course, this is just proof of concept — it is definitely not production-ready. In order to make it more reliable, we should patch other functions (yes, there are different functions for allocating arrays etc), we also would need to make it working on x64. However, this is just proof of concept.

Bonus chatter: I said that you should not run it from VS debugger. Why? What would you need to change in order to make it work?
Bonus chatter 2: What changes are required to run it on x64?
Bonus chatter 3: What about memory pinning? Will GC move our objects or not?