This is the first part of the .NET Inside Out series where I play with CLR internals. For your convenience you can find other parts using the links below (or by guessing the address):
Part 1 — Virtual and non-virtual calls in C#
Part 2 — Handling and rethrowing exceptions
Part 3 — How to override sealed function
Part 4 — How to override sealed function revisited
Part 5 — Capture thread creation to handle exceptions
Part 6 — Proxy handling casting
Part 7 — Generating Func from a bunch of bytes
Part 8 — Handling Stack Overflow Exception in C# with VEH
Part 9 — Generating Func from a bunch of bytes in C# revisited

Today we are going to dive into the function invocation mechanism in .NET. We will use C# to prepare a few applications, then examine the IL generated for them, and finally look at the jitted machine code. Let’s go!

Theory

In C# we have multiple types of functions. We have static functions, which we call using the class name. We have virtual functions, which we can override using inheritance. We also have instance functions which are not virtual. The syntax for invoking all these functions in C# is the same: we simply put parentheses after the full name and we are done. However, in IL there are different instructions for calling different kinds of methods. Let’s see the difference.

Static functions

It is usually said that static functions (i.e., class functions) are called using the call opcode. This is possible because the target function is fully determined during compilation, so the IL can reference it directly and the JIT can later emit a call to a fixed address.

Instance functions

Non-virtual instance functions work almost the same as static functions; however, they need one more thing: an instance of the class on which we call the method. This is why the same opcode is usually not used: we need to verify whether the reference is null or not, and if it is null we need to throw the appropriate exception. To call non-virtual instance functions the C# compiler therefore usually emits the callvirt opcode, which performs that check. This opcode is also capable of calling virtual instance functions.
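A small sketch can make the null check visible from the C# side. The names Greeter and Hello are made up for this illustration; the method never touches this, yet calling it through a null reference still ends in an exception.

```csharp
using System;

class Greeter
{
    // Non-virtual instance method that never uses `this`.
    public string Hello() { return "hello"; }
}

static class Program
{
    // Calls Hello through a null reference and reports what happened.
    public static string CallOnNull()
    {
        Greeter greeter = null;
        try
        {
            return greeter.Hello();
        }
        catch (NullReferenceException ex)
        {
            return ex.GetType().Name;
        }
    }

    static void Main()
    {
        Console.WriteLine(CallOnNull()); // NullReferenceException
    }
}
```

The exception shows up even though the body of Hello would run fine without an instance, which is exactly the check described above.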

Virtual functions

In order to call virtual functions .NET uses a dispatch table. This is the most common way of implementing this mechanism; it is also used by most C++ compilers. Basically, every type contains its own table of functions with pointers to the actual implementations. Imagine that we have a function called Foo in a base class, and we override this function in a derived class. Both types will contain this method in their tables; however, the pointers to the implementation will differ. With that, the CLR is able to invoke the function based on the actual type of the object, because it simply looks up the dispatch table and calls the function found there. However, this is slower than calling a static function because the CLR needs to extract the function address from the dispatch table first.
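The Foo scenario can be sketched in a few lines. The class names Base and Derived and the helper Call are assumptions chosen for the illustration.

```csharp
using System;

class Base
{
    public virtual string Foo() { return "Base.Foo"; }
}

class Derived : Base
{
    public override string Foo() { return "Derived.Foo"; }
}

static class Program
{
    // The static type is Base; the runtime type decides which Foo runs.
    public static string Call(Base instance) { return instance.Foo(); }

    static void Main()
    {
        Console.WriteLine(Call(new Base()));    // Base.Foo
        Console.WriteLine(Call(new Derived())); // Derived.Foo
    }
}
```

The call site in Call is identical for both objects; only the table entry reached through the object differs.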

Calling instance functions other way

In theory it is possible to call a virtual function using the call opcode, that is, without checking for null. If the method does not use this, everything should work fine.
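C# itself does not let us pick the opcode, but one way to try this idea is to emit the IL by hand with DynamicMethod. This is only a sketch: Greeter, Hello, Build and Try are hypothetical names, and the behaviour relies on the dynamic method running without IL verification.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

class Greeter
{
    // Virtual, but never touches `this`, so a null receiver is harmless.
    public virtual string Hello() { return "hello"; }
}

static class Program
{
    // Emits: ldarg.0; <opcode> Greeter::Hello; ret
    public static Func<Greeter, string> Build(OpCode opcode)
    {
        MethodInfo hello = typeof(Greeter).GetMethod("Hello");
        var method = new DynamicMethod(
            "Invoke", typeof(string), new[] { typeof(Greeter) }, typeof(Greeter), true);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(opcode, hello);
        il.Emit(OpCodes.Ret);
        return (Func<Greeter, string>)method.CreateDelegate(typeof(Func<Greeter, string>));
    }

    // Invokes the generated code with a null receiver.
    public static string Try(OpCode opcode)
    {
        try { return Build(opcode)(null); }
        catch (NullReferenceException) { return "NullReferenceException"; }
    }

    static void Main()
    {
        Console.WriteLine(Try(OpCodes.Call));
        Console.WriteLine(Try(OpCodes.Callvirt));
    }
}
```

The call variant is expected to return the string, while the callvirt variant should end in a NullReferenceException because of the lookup through the null reference.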

Practice

Let’s see some examples. I will use .NET 4.5.2 on Windows 10 x64. I will compile the code as Release with Any CPU and debug it using WinDbg x86. Let’s begin.

Static functions

Let’s start with the following code:
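A minimal sketch of such a program could look like this. The names Helper and Answer are assumptions made for the illustration; only the shape (a static method in another class, marked as non-inlinable) follows the description.

```csharp
using System;
using System.Runtime.CompilerServices;

static class Helper
{
    // NoInlining keeps the call visible in the jitted code.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Answer() { return 42; }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(Helper.Answer());
    }
}
```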

Nothing fancy here. We simply call a static method from another class. We also add an attribute which disables inlining. Let’s disassemble the code using ILSpy:

We can see that we indeed call the method using the call opcode. Let’s now execute the app and see the generated machine code:

We load the executable and we are ready to run it. Let’s start it and let it run till the end.

We can see that our process is about to terminate. Let’s load all symbols and SOS.

We have symbols loaded. Let’s find machine code for Main function. We can do it for instance by finding assemblies:

We have our assembly. Let’s dump its method tables:

We can see that we have two interesting method tables. Let’s dump the one for Program class:

We can see that the Main method is already jitted (because it was executed). Let’s dump its machine code:

We can see that we are calling the method directly, using a call instruction with a hardcoded address.

Non-virtual instance function

Let’s modify the code in the following way:
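A sketch of the modified program, assuming the hypothetical names Foo and Bar:

```csharp
using System;
using System.Runtime.CompilerServices;

class Foo
{
    // Non-virtual instance method, kept out of line for easier inspection.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int Bar() { return 1; }
}

static class Program
{
    static void Main()
    {
        // The method is called directly on the freshly created object.
        Console.WriteLine(new Foo().Bar());
    }
}
```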

We changed the method to a non-virtual instance method. In our Main we create an object of the class and directly call the method on it. Let’s see the IL:

The IL still uses the call opcode here. Let’s examine the machine code:

We can see a few interesting things. First, we start by calling the constructor for the object. Next, we store the this reference in the ecx register. Finally, we call the method directly using a hardcoded address.

This might look a bit strange since in theory we should see the callvirt opcode. Let’s modify the code a bit:
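A sketch of this variant, again with the hypothetical names Foo and Bar; the only difference is the local variable between creation and call.

```csharp
using System;
using System.Runtime.CompilerServices;

class Foo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int Bar() { return 1; }
}

static class Program
{
    static void Main()
    {
        // The instance goes through a local before the call.
        var foo = new Foo();
        Console.WriteLine(foo.Bar());
    }
}
```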

We simply store the instance in a variable. Let’s see the IL:

And now we can see that we are indeed using the callvirt instruction. Interesting! Let’s see the machine code:

As we can see, the machine code is exactly the same. There is no null check, so in this situation both call and callvirt instructions were jitted to the same code. Let’s modify the program a little more:
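A sketch of this step, with the same hypothetical Foo and Bar plus a helper named Call. Marking the helper as non-inlinable is an assumption, made so that it gets its own jitted body.

```csharp
using System;
using System.Runtime.CompilerServices;

class Foo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int Bar() { return 1; }
}

static class Program
{
    // The JIT cannot know here whether foo is null, so the check must stay.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Call(Foo foo) { return foo.Bar(); }

    static void Main()
    {
        Console.WriteLine(Call(new Foo()));
    }
}
```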

We do almost the same; however, we pass the object to another function and only there call the instance function. The IL is as follows:

And the machine code:

We create an object, put it in a register and call the method. Let’s move on:

And here we have what we wanted to see. First, we prepare a stack frame by storing the ebp register. Next, we compare the ecx register with the value it points to, which performs the null check. Then, we call the method directly using a hardcoded address. Finally, we can see the cleanup and the exit instruction.

How does null check work?

You might ask what is going on. I said that there is a null check; however, there is neither a branch instruction nor a null handler. Let’s look at the instruction:

Here we compare a register with a value read from memory. The cmp instruction sets the CPU flags so that we can later perform conditional jumps based on them. However, in our listing we simply ignore the comparison result, so how does it work?
First, let’s assume that we passed a valid reference. We compare ecx (which holds a valid address) with dword ptr [ecx]. The latter dereferences the pointer and, since it is valid, reads some value. We then perform the comparison and the flags are stored in the CPU.
However, imagine that ecx is a null reference (which means that it is equal to zero). If we try to dereference it, we try to read from address zero. Since this address lies in the null-pointer memory partition, which is never accessible, the MMU blocks the read and the CPU raises a fault. The CLR handles it and converts it to a NullReferenceException.
So it looks like we can safely ignore the CPU flags after the comparison, because when the reference is null the CPU itself notifies us about the problem. Clever: we can perform a null check using a single CPU instruction.

Virtual function

Let us now call a virtual function. Let’s use this code:
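A sketch of such a program, assuming a hypothetical class Foo that overrides ToString:

```csharp
using System;

class Foo
{
    // Overrides the virtual method declared in System.Object.
    public override string ToString() { return "Foo"; }
}

static class Program
{
    static void Main()
    {
        var foo = new Foo();
        Console.WriteLine(foo.ToString());
    }
}
```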

We will use the ToString method, since it is virtual in the System.Object class. The IL for this code:

We use the callvirt instruction. Please also notice that the IL refers to the method from System.Object and not to the one from our class. Right now we expect to see an invocation through the dispatch table. Let’s check it:

And we can indeed verify that a dispatch table is used. These three lines do that:

We first dereference the pointer to the object and store the result in the eax register. Since a .NET reference points to a pointer to the type descriptor, we end up with a pointer to the type descriptor in eax. Next, we read the value stored 40 bytes after the beginning of the type descriptor and store it in the eax register. This is the address of the method descriptor of the ToString implementation in our custom class. Finally, we call the method using the register value. So we can see that the call indeed uses a dynamically loaded address instead of a hardcoded one.
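The three steps can be modelled in plain C#. This is a toy model and not the CLR’s real layout: TypeDescriptor, Instance, the slot index and the delegate array are all assumptions made for the illustration.

```csharp
using System;

// Toy stand-in for a type descriptor: just an array of method slots.
sealed class TypeDescriptor
{
    public readonly Func<Instance, string>[] Slots;
    public TypeDescriptor(params Func<Instance, string>[] slots) { Slots = slots; }
}

// Toy object: its first field points at the type descriptor.
sealed class Instance
{
    public readonly TypeDescriptor Type;
    public Instance(TypeDescriptor type) { Type = type; }
}

static class Program
{
    const int ToStringSlot = 0; // hypothetical slot index

    public static readonly TypeDescriptor BaseType = new TypeDescriptor(self => "Base");
    public static readonly TypeDescriptor DerivedType = new TypeDescriptor(self => "Derived");

    public static string VirtualCall(Instance obj)
    {
        TypeDescriptor type = obj.Type;                           // 1. object -> type descriptor
        Func<Instance, string> target = type.Slots[ToStringSlot]; // 2. descriptor -> slot
        return target(obj);                                       // 3. indirect call
    }

    static void Main()
    {
        Console.WriteLine(VirtualCall(new Instance(BaseType)));    // Base
        Console.WriteLine(VirtualCall(new Instance(DerivedType))); // Derived
    }
}
```

VirtualCall never names a concrete implementation; the target is only known after the two loads, which mirrors the two dereferences followed by an indirect call.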

Dynamic call

So far we have only called methods using ordinary mechanisms which can be resolved at compile time. However, there is also the dynamic keyword, which allows us to defer the call and resolve it at runtime. Let’s modify the code a bit and see how it works:
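A sketch of the modified program, with the hypothetical class Foo and a helper named Describe. It assumes that the project references the Microsoft.CSharp assembly, which the dynamic keyword needs.

```csharp
using System;

class Foo
{
    public override string ToString() { return "Foo"; }
}

static class Program
{
    public static string Describe()
    {
        // The member lookup for ToString is deferred until runtime.
        dynamic foo = new Foo();
        return foo.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Describe()); // Foo
    }
}
```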

Only one change in here. We replaced the var with dynamic so now the compiler should emit code for using DLR mechanisms. Let’s decompile the code:

And indeed we can see that calling a method dynamically is much more involved. We use things like CallSite and lots of DLR magic here.
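To give a feeling for that machinery, here is a hand-written approximation of what such decompiled code tends to look like. It is a sketch and not the exact compiler output: Foo, the field name and InvokeToString are made up, while Binder.InvokeMember, CSharpArgumentInfo and CallSite come from the Microsoft.CSharp and System.Runtime.CompilerServices namespaces.

```csharp
using System;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

class Foo
{
    public override string ToString() { return "Foo"; }
}

static class Program
{
    // One cached call site per dynamic operation.
    static CallSite<Func<CallSite, object, object>> toStringSite;

    public static object InvokeToString(object target)
    {
        if (toStringSite == null)
        {
            // Describes the operation: invoke member "ToString" with no extra arguments.
            CallSiteBinder binder = Microsoft.CSharp.RuntimeBinder.Binder.InvokeMember(
                CSharpBinderFlags.None,
                "ToString",
                null,
                typeof(Program),
                new[] { CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null) });
            toStringSite = CallSite<Func<CallSite, object, object>>.Create(binder);
        }
        // The binder resolves the actual method on first use.
        return toStringSite.Target(toStringSite, target);
    }

    static void Main()
    {
        Console.WriteLine(InvokeToString(new Foo())); // Foo
    }
}
```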

Summary

In this post we saw how different functions are called. The actual opcode used for the invocation depends on the type of the method and even on the way we store the variable on which we call the method. However, even a different opcode might not result in different machine code, since the CLR is able to perform optimizations when jitting the code.