Coding – Random IT Utensils https://blog.adamfurmanek.pl IT, operating systems, maths, and more. Fri, 21 Nov 2025 14:38:12 +0000 en-US hourly 1 https://wordpress.org/?v=6.7.1 State Machine Executor Part 6 — Forking https://blog.adamfurmanek.pl/2025/11/21/state-machine-executor-part-6/ https://blog.adamfurmanek.pl/2025/11/21/state-machine-executor-part-6/#respond Fri, 21 Nov 2025 01:30:10 +0000 https://blog.adamfurmanek.pl/?p=5204 Continue reading State Machine Executor Part 6 — Forking]]>

This is the sixth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Let’s revisit our execution function:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	bool wasTimedOut = false;
	do {
		if(machine.IsTimedOut(state)){
			wasTimedOut = true;
			currentTransition = "timeout-handler-transition";
		}
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		state = result.CurrentState;
		currentTransition = hadException ? "exception-handler-transition" : result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
		if(wasTimedOut){
			break;
		}
	}while(!machine.IsCompleted(state));
}

It’s quite quite already as we extended it with a support for timeouts, exception handling, state management, and actions. We’re going to make it even more complex now.

Specifically, we will focus on the following line:

result = machine.RunTransition(currentTransition, store);

This line triggers the transition and makes the state machine to execute one piece of code. In this post, we’re going to discuss how to run transitions in parallel.

Problem statement

When discussing state machines, we typically think in terms of the state machine having one “state” at a time. However, that’s not very realistic. We often need to be able to run multiple things in parallel. Some of them are contained within a single state/transition (e.g., reading multiple files in parallel instead of one by one), some are independent sub-workflows (e.g., one workflow sending emails, another one uploading files), and some are tightly connected with each other (e.g., one workflow validates something and can affect the other workflow which tries to process the query at the same time).

The problem we’re facing is: how to run multiple transitions in parallel, manage the state changes happening with those multiple transitions, and how to deal with the interconnections between transitions. Let’s explore some solutions.

Some programming models

Many programming models for workflows have been developed over the years. Let’s see some of them, without going into many details and formalities.

Bulk-synchronous-parallel / Superstep

Bulk-synchronous-parallel (BSP) model consists of a series of supersteps with synchronization and communication in between. Each superstep is a logical computation performed by a node/processor/unit. Once the computation is done, nodes exchange data and wait until all nodes reach the same step.

This model is quite easy to analyze, but is rather rigid and inflexible in its structure. It typically goes with batch-like approach in which we divide the amount of work between nodes, send it for processing, and then wait for results.

This model is very popular in state machines.

Fork & Join

In this approach, we fork the process into many copies, and each copy performs similar work on its won. It’s more flexible than BSP because we can use work stealing and sometimes we can avoid synchronization.

This model is often used in parallel processing of collections or in handling web requests.

Threading

Threading is a low-level approach to computation. Each thread is completely independent and can do whatever it wants. Threads synchronize only when needed, and there is no clear structure of their behavior.

This model is very powerful, but quite hard to analyze and reason about.

Trails as a middle ground

BSP is often used in the state machines because it can be represented easily in terms of states and transitions. We can think of it as one transition between two states that is executed many times in parallel. While the model is simple, it’s also quite inflexible as it requires that all the parallelization does the same work (but with different data).

Threads on the other hand are very flexible, but they require synchronization. Effectively, each thread must have some kind of a handle to the other threads it wants to synchronize with. Things get much more complex when those other threads fork on their own, as now the synchronization involves “group of threads” which are often represented as jobs or with a parent-child relationship.

To keep the flexibility of threads but without the rigidness of BSP, we can introduce something in between – namely a trail.

Trail structure

Trail is like a thread but it doesn’t “synchronize on its own”. It only states the “requirements” for it to continue and the platform takes care of making sure the requirements are met. We can think of trails as of threads and named mutexes managed by the platform.

A trail is an object with the following properties:

class Trail {
    string Name;
    string CurrentState;
    string NextTransition;
    string[] BlockingTrails;
}

Name is used to synchronized with other trails. CurrentState and NextTransition simply indicate where the trail is and what it’s going to do next. BlockingTrails is a collection of other trails that need to complete first before the current trail can move on.

When starting a state machine from scratch, we simply have one trail with the initial state and transition. It can have any name.

To implement trail spawning, we extend the result of a transition to have a list of trails to continue with:

class TransitionResult {
    Trail[] NextTrails;
}

This way, one trail can fork into many sub-trails. Each sub-trail is independent and can do whatever it wants.

To implement joins, we simply deduplicate trails based on their name. We also assume that if duplicates appear, they must be in the same state and transition.

Let’s see how to use them.

Parallel collection processing

Let’s say that we want to process a collection of elements in parallel.

We start with the initial state that splits the work:

Trail initial = new Trail(Name="work"...);

When called, the transition splits the work and returns one trail for each element:

return new TransitionResult{
   NextTrails = new {
       new Trail(Name="work.1", ...),
       new Trail(Name="work.2", ...),
       new Trail(Name="work.3", ...),
       ...
       new Trail(Name="work.n", ...),
   }
}

All these trails are executed by the platform, hopefully in a parallel manner. Now, they need to synchronize at the very end, so each of those worker trails returns the same result of the final transition;

return new TransitionResult {
    NextTrails = new {
        new Trail(Name="work", BlockingTrails= new { "work." }, ...)
    ]
}

Now, the platform needs to do the following:

  1. The platform needs to deduplicate all the trails. Each worker trail returns trails with the same name (work), so the platform knows what to do.
  2. The platform needs to wait until all the worker trails finish processing items. This is achieved thanks to the BlockingTrails set to work. (notice the dot at the end). The platform needs to wait until all trails with names starting with work. finish the work, and then it can proceed with the new deduplicated trail

This way, we can achieve typical parallel collection processing.

Running child state machines

Running child state machines is quite straightforward. Let’s say that we want to continue execution and also start something completely independent on the side. We simply return:

return new TransitionResult{
   NextTrails = new {
       new Trail(Name="work", ...),
       new Trail(Name="some_side_work", ...)
   }
}

At some point, the side work completes. It simply indicates that it reached the end by returning empty NextTrails collection.

Summary

Trails provide a flexible approach without the complex overhead of manual synchronization. They are more flexible than BSP which is crucial when we want to run independent child state machines.

]]>
https://blog.adamfurmanek.pl/2025/11/21/state-machine-executor-part-6/feed/ 0
Non-atomic assignments in Python https://blog.adamfurmanek.pl/2025/11/20/non-atomic-assignments-in-python/ https://blog.adamfurmanek.pl/2025/11/20/non-atomic-assignments-in-python/#respond Thu, 20 Nov 2025 12:26:12 +0000 https://blog.adamfurmanek.pl/?p=5202 Continue reading Non-atomic assignments in Python]]> It’s not a hidden knowledge that many assignments are not atomic and we can face the word tearing. They are mostly related to CPU word length, synchronization, concurrency, etc.

However, things can be much worse when we’re dealing with interpreted languages or languages with no strict schema. In these languages, a “regular” assignment can also be non-atomic.

Let’s take Python. In Python, every object can be considered a dictionary of fields. This means that a single assignment may result in expanding the forementioned dictionary which may cause issues for some other thread. Let’s see an example:

import jsonpickle
from concurrent.futures import ThreadPoolExecutor
 
threads = 40
iterations = 1000
promises = []
 
class Sample:
	def __init__(self):
		self.big_property = [x for x in range(100000)]
 
 
def serializer(s):
	jsonpickle.dumps(s)
 
def result_setter(s):
	s.abc = "abc"
 
with ThreadPoolExecutor(max_workers=threads) as executor:
	for x in range(iterations):
		s = Sample()
		promises.append(executor.submit(result_setter, s))
		promises.append(executor.submit(serializer, s))
 
for promise in promises:
	promise.result()

We have a Sample class that has one field initially, namely big_property.

We have two different types of tasks: the first one serializer uses the jsonpickle library to serialize an object to a JSON string. The second task result_setter sets a field on the object. We then run sufficiently many tasks to observe the issue.

If we’re unlucky enough, we’ll hit the following race condition: the first task starts serializing the object, then the first task is paused and the second task kicks in. The second tasks sets a field on the object. Normally, we could think this assignment is “atomic” as we only set a reference into a field. However, since the Python object is a dictionary of fields, we need to add new entry to the dictionary. Once the first task is resumed, it throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<stdin>", line 2, in serializer
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 166, in encode
    context.flatten(value, reset=reset), indent=indent, separators=separators
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 366, in flatten
    return self._flatten(obj)
           ^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 326, in _flatten
    result = self._flatten_impl(obj)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 386, in _flatten_impl
    return self._pop(self._flatten_obj(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 419, in _flatten_obj
    raise e
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 413, in _flatten_obj
    return flatten_func(obj)
           ^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 716, in _ref_obj_instance
    return self._flatten_obj_instance(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 697, in _flatten_obj_instance
    return self._flatten_dict_obj(obj.__dict__, data, exclude=exclude)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 794, in _flatten_dict_obj
    for k, v in util.items(obj, exclude=exclude):
  File "/.venv/lib/python3.11/site-packages/jsonpickle/util.py", line 584, in items
    for k, v in obj.items():
RuntimeError: dictionary changed size during iteration

We can see the dictionary of the fields changed the size because of the assignment. This would not be the case if we initialized the field in the constructor (i.e., if we didn’t need to add a new field to the object but to modify an existing one).

]]>
https://blog.adamfurmanek.pl/2025/11/20/non-atomic-assignments-in-python/feed/ 0
State Machine Executor Part 5 — Streaming https://blog.adamfurmanek.pl/2025/10/24/state-machine-executor-part-5/ https://blog.adamfurmanek.pl/2025/10/24/state-machine-executor-part-5/#respond Fri, 24 Oct 2025 14:55:52 +0000 https://blog.adamfurmanek.pl/?p=5199 Continue reading State Machine Executor Part 5 — Streaming]]>

This is the fifth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Being able to describe side effects instead of executing them may sound great, but it has one significant drawback – the side effects need to be described completely before they are handed off to the executor. Building an object describing the action may cause significant memory usage. Let’s see how to fix that.

Streaming

We’d like to be able to stream the data. Let’s say that we have the following Action describing web request to execute:

class HttpAction {
	public string Url;
	public string Method;
	public byte[] Body;
}

See the Body field. It holds the entire payload to be sent. Creating such a payload and storing it in memory will increase the memory usage and decrease scalability. To avoid that, we should have something like this:

class HttpAction {
	public string Url;
	public string Method;
	public Stream<byte> Body;
}

Looks great, but it doesn’t solve any problem. Remember that the state machine must create the action object and hand it over to the executor. The state machine won’t be able to run any code until the action is executed. This means that the Stream must be filled with the data, so we still have the problem with high memory usage.

Instead of passing the stream, we could pass a stream generator. That could be a lambda or some other interface with yield keyword:

class HttpAction {
	public string Url;
	public string Method;
	public IEnumerable<byte> Body;
}

Looks slightly better, but still has issues. If Body wraps any local variables into a closure, then the memory will not be released until the stream is read. Not to mention that it’s much harder to persist the HttpAction object to provide reliability.

Solution

To solve the problem, we need to effectively stream the data. However, since the actions are executed after the state machine is done, we need to stream the data somewhere else – to a local file.

The executor can provide the following abstraction:

class Env{
	public FileWrapper CreateFile();
	public FileWrapper ReadFile(string identifier);
}

class FileWrapper {
	public string Identifier;
	public File FileHandle;
	public void Commit();
}

Now, the state machine can call CreateFile to get a temporary file. Next, the state machine can stream the content to the file. Finally, the state machine calls Commit to indicate to the executor that the file is ready to be persisted. The executor can then upload the file to the persistent store.

Last but not least, we need to modify the action definition:

class HttpAction {
	public string Url;
	public string Method;
	public string BodyFileIdentifier;
}

The action executor can now stream the body from the file. If something fails, the file can be retrieved from the persistent storage and the action can be retried.

This solution is not perfect, though. The data is streamed twice which slows everything down. That’s an obvious trade-off.

]]>
https://blog.adamfurmanek.pl/2025/10/24/state-machine-executor-part-5/feed/ 0
State Machine Executor Part 4 — Timeouts, exceptions, suspending https://blog.adamfurmanek.pl/2025/10/21/state-machine-executor-part-4/ https://blog.adamfurmanek.pl/2025/10/21/state-machine-executor-part-4/#respond Tue, 21 Oct 2025 14:31:14 +0000 https://blog.adamfurmanek.pl/?p=5195 Continue reading State Machine Executor Part 4 — Timeouts, exceptions, suspending]]>

This is the fourth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Let’s discuss how to improve reliability of our state machines.

How machines are executed

In part 1, we defined the contract for triggering a single transition. Each transition returns instructions what actions to execute and what transition to call next. We then run in a loop until the state machine is completed.

We can modify this mechanism to deal with crashes, errors, and other undersired effects. Let’s revisit the loop that we defined in part 2:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
	}while(!machine.IsCompleted(state));
}

We read the store before entering the loop. In each loop iteration, we pass the store to the transition, and then update the state and execute actions. We’re now going to modify this solution.

Suspending

The first thing to support is suspension of the state machine. If the machine decides that it needs to wait, it can indicate that in the TransitionResult:

class TransitionResult {
    ....
    bool Suspend;
}

We can now include that in the loop handling:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
	}while(!machine.IsCompleted(state));
}

We can then proceed with the state machine when the time comes. We can obviously extend that to support sleep or waiting for some condition.

Exceptions

We need to handle unexpected crashes as well. We simply catch the exception and then we need to let the state machine know it happened. We can do that by redirecting the state machine to a well-known transition:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		state = result.CurrentState;
		currentTransition = hadException ? "exception-handler-transition" : result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
	}while(!machine.IsCompleted(state));
}

We can obviously extend that to give access to the exception or add any additional details.

Timeouts

We would also like to terminate the state machine if it runs for tool long. There are two ways to do that: we can terminate it the hard way by interrupting the thread (in a preemtive way), or we can wait for it to complete the transition (in a cooperative way). No matter what happens, we may want to redirect the state machine to a well-known transition for handling timeouts:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	bool wasTimedOut = false;
	do {
		if(machine.IsTimedOut(state)){
			wasTimedOut = true;
			currentTransition = "timeout-handler-transition";
		}
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		state = result.CurrentState;
		currentTransition = hadException ? "exception-handler-transition" : result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
		if(wasTimedOut){
			break;
		}
	}while(!machine.IsCompleted(state));
}

Notice that we stop processing after the timeout transition. Had we not do that, we would run in an infinite loop. If you don’t want to terminate the processing, then make sure you don’t run into rerouting the state machine constantly.

Summary

Next time, we’re going to see how to deal with data streaming and why it’s needed.

]]>
https://blog.adamfurmanek.pl/2025/10/21/state-machine-executor-part-4/feed/ 0
State Machine Executor Part 3 — Actions and history https://blog.adamfurmanek.pl/2025/10/14/state-machine-executor-part-3/ https://blog.adamfurmanek.pl/2025/10/14/state-machine-executor-part-3/#respond Tue, 14 Oct 2025 12:54:39 +0000 https://blog.adamfurmanek.pl/?p=5192 Continue reading State Machine Executor Part 3 — Actions and history]]>

This is the third part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Our state machines can execute side-effectful actions. But how do they read results?

One approach is to write the result back to the StoreHolder we designed last time. After executing an action, the executor would write the result back as a property specified by the state machine. This works but is much more complex than just one property. What about retries? What about exceptions? What if the property is already there?

Another approach is to keep the list of all executed actions in some kind of even store. Executing an action would generate a new event indicating that the action has been executed. The state machine would then look check the events and act accordingly. If we need to retry the action, we can simply model that as a yet another event. If we have an exception, then it’s another event. And so on.

Effectively, we can model that in the following way:

class StoreHolder {
	...
	IList<Event> Events;
}

We can have an event indicating the result of an action:

class ActionExecuted<T> {
	Action ExecutedAction;
	T Result;
	Exception? Exception;
}

We can add many more properties to indicate what exactly happened. We may also consider adding unique identifiers to events, order them based on the timestamps, etc.

Finaly, the state machine can simply traverse the list and find the events it needs.

There is more. Since this is a very generic mechanism, we can also add any sort of communication between the executor and the state machine. For instance, you can initialize the store with some events representing the initial input.

]]>
https://blog.adamfurmanek.pl/2025/10/14/state-machine-executor-part-3/feed/ 0
State Machine Executor Part 2 — Fault tolerance https://blog.adamfurmanek.pl/2025/10/13/state-machine-executor-part-2/ https://blog.adamfurmanek.pl/2025/10/13/state-machine-executor-part-2/#respond Mon, 13 Oct 2025 08:05:00 +0000 https://blog.adamfurmanek.pl/?p=5184 Continue reading State Machine Executor Part 2 — Fault tolerance]]>

This is the second part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

The code we implemented in the last part is unable to recover from machine crashes. If the process dies midway, we need to start it from scratch. Let’s fix that.

Before going into details, let’s think how we could be triggering the state machine. We could run it on an API call – someone calls the endpoint, we start processing the request, and we trigger the execution along the way. If something dies, the caller will probably retry their call. Another approach is to use a queue. We receive a message from the queue, we start the processing, and we trigger the state machine. If something breaks, the message will get retried. Other scenarios may be similar.

In all of those scenarios, we get a retry due to some other mechanisms. Once we retry, we want to resume the state machine processing. This is very simple conceptually. We just need to recreate the state machine and retrigger the transition. Let’s do that.

State management

The hard part in retrying this way is recovering of the state. The state machine is most likely stateful and calculates something as it goes through the states. We can tackle this in many ways: preserve the whole state machine, provide an interface to read and write data that the state machine would use, or provide a temporary object.

Preserving the state machine in its entirety may be possible, but has many drawbacks. First, we may be unable to serialize the object as we don’t even know what it consists of (it may be loaded dynamically and not owned by us). Second, some objects may be not serializable by definition (like locks, things tied to OS data like threads, etc.). Third, this may impose technological limits (like the programming language you use etc.).

Another approach is to have an interface for the state machine to read and write some pieces of information. For instance, the state machine executor could expose a simple key-value store for the data. Each read and write would be effectively handled by the state machine executor. While this is quite easy, it lacks transactions interleaved with other side effects.

Another approach is a simple dictionary that the state machine can use. This lets the state machine effectively couple the transaction with other side effects. The state machine executor can persist both the changes to the dictionary and the description of the actions in one transaction.

Let’s take this last approach and see how it works. We now would like to have the following object for keeping the changes:

class StoreHolder {
	Dictionary<string, object> Store;
}

Now, the state machine needs to describe modifications to this store:

class TransitionResult {
	...
	Dictionary<string, object> StoreChanges;
}

Also, the state machine executor needs to pass this object to the state machine:

class StateMachine {
	...
	TransitionResult RunTransition(string transitionName, StoreHolder store) {...}
}

Finally, this is how we execute the state machine now:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
	}while(!machine.IsCompleted(state));
}

Looks nice. Let’s see what problems we may have with this approach.

Persisting the store

Let’s now see some pros and cons of this approach.

By persisting the store at once, we can easily identify if there are two state machines executing at the same time. This would result in concurrent writes which we can find by using the versions or locks.

By saving the changes after the state machine finishes the transition, we can have the outbox behavior. We persist the store changes and the information what actions to execute. This way, when we can retry the actions in case of crashes. We’ll see that in details in the next part.

This approach is also technology-independent. It’s easy to serialize the key-value dictionary in any technology. However, if the state machine decides to put some complex objects in the store, they need to be serializable and deserializable. Also, they need to be backwards compatible when the state machine code changes. Let’s explore that a little more.

Let’s say that the state machine preserves something like Store["property"] = someObject. If the state machine executor would like to serialize the dictionary now, the someObject value must be serializable. While this sounds trivial, this is often not the case. For instance, many types in Python are not serializable by the built-in solutions like json package. Similarly, objects in Java must implement the Serializable interface or adhere to the requirements of the serialization library. While this is not a big issue, this puts some requirements on the state machine.

Much bigger issues may happen when deserializing the value. First, it may be impossible to deserialize the someObject value due to lack of parameterless constructor or other library requirements. This is not a rare issue.

Worse, we now need to deal with backward and forward compatibility. Let’s say that the state machine is paused and then resumed on some other node. This can be due to a retry or rolling deployment. When the execution is retried, it may happen on either newer or older code version. This means that the store must be deserialized using a different code. If you use a binary serializer, this will most likely cause problems. The same issue may happen if the newer code would like to examine the store written by some older version of the code, like some other state machine execution.

The easiest solution to this problem is to avoid storing complex object entirely. This simplifies the serialization and the deserialization process. However, it doesn’t solve the issue with schema changes and compatibility.

If you need to store complex objects and still want to access stores created by the older state machines, it may be beneficial to store two versions of the store. One version is serialized using a binary serializer that can serialize and deserialize objects of any kind. The other version is stored using some regular JSON serializer that can only serialize the data but can’t deserialize it into complex objects. You would then examine this JSON data as raw JSON objects.

]]>
https://blog.adamfurmanek.pl/2025/10/13/state-machine-executor-part-2/feed/ 0
State Machine Executor Part 1 — Introduction https://blog.adamfurmanek.pl/2025/10/12/state-machine-executor-part-1/ https://blog.adamfurmanek.pl/2025/10/12/state-machine-executor-part-1/#respond Sun, 12 Oct 2025 16:59:20 +0000 https://blog.adamfurmanek.pl/?p=5180 Continue reading State Machine Executor Part 1 — Introduction]]>

This is the first part of the State Machine Executor series. For your convenience you can find other parts using the links below:
Part 1 — Introduction
Part 2 — Fault tolerance
Part 3 — Actions and history
Part 4 — Timeouts, exceptions, suspending
Part 5 — Streaming
Part 6 — Forking

In this series we explore how to decouple describing “what to do” from actually doing it. This means that instead of doing Console.WriteLine("Hello world!") we just describe that we want the Hello world! to be printed out to the standard output.

Introducing such an abstraction if very beneficial. If we only describe what to do, we get the following:

  • The business code doesn’t need to worry about actual implementation details but focus on the business part only
  • We can change implementation of “how things are done” without affecting the business code
  • It’s easier to add additional layers (e.g., monitoring, logging, scaling) without changing the business code
  • We can postpone the actual materialization of the side effects
  • We get history and auditing for free, just by checking the description
  • Testing is much easier as we don’t need to mock anything or deal with side effects being executed
  • It’s much easier to inspect what will happen. We can also change the description if needed

It’s actually yet another form of dependency inversion and introducing higher-level APIs for lower-level operations. However, with each generalization, there comes a price of having to adhere to a specific framework of thought.

Conceptual implementation

Let’s start with some pseudocode describing what we want to do. We would like to have a framework for executing finite state machines. The state machine consists of states and transitions between the state. Importantly, whenever the state machine wants to execute a side-effectful operation, it needs to describe them and ask the framework to get them done.

Conceptually, we have the following:

class StateMachine {
	TransitionResult RunTransition(string transitionName) {...}
	bool IsCompleted(string state) {...}
}

The state machine supports running a transition, and can report whether a given state is the terminal one or not. Each TransitionResult describes the following:

class TransitionResult {
	string CurrentState;
	List<Action> ActionsToExecute;
	string NextTransition;
}

We see that after running a transition, we get the new state that the machine is in, the list of actions to execute, and the name of the next transition that the state machine would like to run.

Finally, we have the following execution logic:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	do {
		result = machine.RunTransition(currentTransition);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		ExecuteActions(result.ActionsToExecute);
	}while(!machine.IsCompleted(state));
}

We take the state machine and the initial transition, and then we loop until completed. Nothing fancy here.

There are few missing blocks that we will need to provide based on our needs:

  • How are actions described? Ideally we’d like to have strongly typed objects and be able to change how the actions are executed. You may need a registery of action executors that can handle specific action types. We can also replace these executors dynamically and change them without touching the business code.
  • How is the state machine created? You may need a factory that will create the instance based on some input or whatever.
  • How are actions executed? Simple for loop will be a good start, but you may also run them in parallel or even scale-out. Again, we can change that without touching the business code.

At first glance, this approach may look great. It gives many benefits in terms of testing and code organization, we can optimize many aspects without touching the business code, and this can be adapted to various program flows.

However, this programming model is very limiting, which is not obvious initially. In the next parts, we’ll explore various aspects and see how to tackle them.

]]>
https://blog.adamfurmanek.pl/2025/10/12/state-machine-executor-part-1/feed/ 0
Bit Twiddling Part 7 — Change the order of the screens for RDP https://blog.adamfurmanek.pl/2025/08/28/bit-twiddling-part-7/ https://blog.adamfurmanek.pl/2025/08/28/bit-twiddling-part-7/#respond Thu, 28 Aug 2025 07:36:55 +0000 https://blog.adamfurmanek.pl/?p=5152 Continue reading Bit Twiddling Part 7 — Change the order of the screens for RDP]]>

This is the seventh part of the Bit Twiddling series. For your convenience you can find other parts in the table of contents in Par 1 — Modifying Android application on a binary level

Today we’re going to tackle the problem of keeping windows in the same place when connecting over RDP. In the other blog post I said, that it’s not possible to control the order of the screens enumerated by mstsc /l programmatically, and that we have to move the screens by plugging them differently. Let’s actually change that with some low-level hacks.

What is the problem again

Let’s say, that I have the following monitors reported by mstsc /l:

We can see that I have 4 monitors:

  • Monitor 1 is in the center. It’s reported as id 0 in the MSTSC Setup
  • Monitor 2 is on the left. It’s reported as id 3 in the MSTSC Setup
  • Monitor 3 is on the right. It’s reported as id 4 in the MSTSC Setup
  • Monitor 4 is at the bottom. It’s reported as id 30 in the MSTSC Setup

Once I log in to the remote machine, this is what the Display in RDP shows:

Looks good. Most importantly, if I open a window on the Monitor 1 (the one in the center), then it is shown like this:

Important part in this picture is that the notepad is in the center. You don’t need to zoom in to see any other details.

Now, let’s say that I connect another display and make it duplicate the Monitor 1. Windows may decide that the order of the devices changed, and now mstsc /l shows this:

Notice that the Monitor 1 in the center changed its id to 5.

Let’s connect to the RDP server again. This is what Display in RDP shows:

Notice that the monitors changed their numbering. First monitor is now on the left. What’s worse, the windows have moved:

You can see that the notepad moved to the left. It didn’t move actually, it is still on the same first monitor. It’s that the monitor changed its position.

What’s worse, even if I add selectedmonitors:s:5,3,4,30 to the .rdp file, the problem is still there.

Why does it happen? You can read more in this article where I describe it in details. But now, let’s try to fix it.

How it works

We can use API Monitor or other strace-like application to figure out what happens. mstsc uses EnumDisplayDevicesW function to find all the devices.

The API accepts a parameter with the device id (that is the second parameter), and the pointer to the structure where the details will be stored (the third parameter). Most importantly, it returns true or false indicating whether a monitor with given device id exists.

mstsc simply runs in a loop like this:

for(int i=0;;++i){
    if(!EnumDisplayDevicesW(null, i, pointer, flags)){
         break;
    }
}

mstsc iterates over the devices starting from zero until there are no more devices reported by the API. This explains why numbering in mstsc /l is not continuous and why the numbers may change any time.

How can we fix that? We need to hijack the method, check the second parameter, and override it accordingly. With the screenshots above, the loop runs like this:

id = 0 => some virtual screen and returns true
id = 1 => some virtual screen and returns true
id = 2 => some virtual screen and returns true
id = 3 => Monitor 2 and returns true
id = 4 => Monitor 3 and returns true
id = 5 => Monitor 1 and returns true
...
id = 30 => Monitor 4 and returns true
...
id = something big => returns false

We would like it to effectively do something like this:

id = 0 => changes id to 5 => Monitor 1 and returns true
id = 1 => changes id to 3 => Monitor 2 and returns true
id = 2 => changes id to 4 => Monitor 3 and returns true
id = 3 => changes id to 30 => Monitor 4 and returns true
id = 4 => returns false

Let’s do it.

Implementation

As with other dirty hacks, we are going to use the debugger to inject the payload and do some memory operations.

The plan is as follows:

  • We allocate some memory on the side
  • We add a payload call_and_return_false that calls the just_call payload and returns false afterwards
  • We add a payload call_and_return_true that calls the just_call payload and returns true afterwards
  • We add a payload just_call that restores proper registers and then just calls the original EnumDisplayDevicesW
  • We find the regular EnumDisplayDevicesW implementation and do the long jump to our main payload
  • The main payload checks the second parameter (which is in the rdx register). If the parameter should be changed, it is modified and then we jump to the call_and_return_true or call_and_return_false payload accordingly. Otherwise, it jumps to the just_call payload.

Let’s see the payloads in action. First, let’s examine the original EnumDisplayDevicesW method:

u USER32!EnumDisplayDevicesW
0:000> u USER32!EnumDisplayDevicesW
USER32!EnumDisplayDevicesW:
00007ffa`0cd01240 4053            push    rbx
00007ffa`0cd01242 55              push    rbp
00007ffa`0cd01243 56              push    rsi
00007ffa`0cd01244 57              push    rdi
00007ffa`0cd01245 4156            push    r14
00007ffa`0cd01247 4157            push    r15
00007ffa`0cd01249 4881ec98030000  sub     rsp,398h
00007ffa`0cd01250 488b0529e60700  mov     rax,qword ptr [USER32!_security_cookie (00007ffa`0cd7f880)]

Nothing special here. It simply preserves the registers. We’ll need to override this with a long jump to our payload, like this:

mov rax, payload_address
push rax
ret
nop
nop
nop
nop

This way, we override the first 16 bytes.

Okay, let’s now see the main payload:

cmp rdx, monitor_id_1
jne 0x13
mov rdx, overriden_id_1
mov rax, call_and_return_false or call_and_return_true
push rax
ret
cmp rdx, monitor_id_2
jne 0x13
mov rdx, overriden_id_2
mov rax, call_and_return_false or call_and_return_true
push rax
ret
...
cmp rdx, monitor_id_n
jne 0x13
mov rdx, overriden_id_n
mov rax, call_and_return_false or call_and_return_true
push rax
ret
mov rax, just_call
push rax
ret

We have a series of instructions like this:

if(second_parameter == monitor_id_1){
    second_parameter = overriden_id_1
    jump call_and_return_false or call_and_return_true
}
if(second_parameter == monitor_id_2){
    second_parameter = overriden_id_2
    jump call_and_return_false or call_and_return_true
}
...
if(second_parameter == monitor_id_n){
    second_parameter = overriden_id_n
    jump call_and_return_false or call_and_return_true
}
jump just_call

We compare each monitor in a sequence, override the parameter if needed, and jump accordingly.

Now, let’s see the payload for call_and_return_false

movabs rax, call_and_return_false+0x13
push rax
movabs rax, just_call
push rax
ret
mov rax, 0
ret

This conceptually looks like this:

put after_call address on the stack
jump just_call

:after_call
return 0

We do the same for call_and_return_true and just return different value.

The payload for just_call looks like this:

push    rbx
push    rbp
push    rsi
push    rdi
push    r14
push    r15
sub     rsp,398h
movabs rax, EnumDisplayDevicesW+0x16
push rax
ret

We simply run the preamble of the WinAPI function (which we scratched by introducing long jump over there), and then we jump th the WinAPI function in the correct place.

Automation

Let’s now see some sample C# code that does all of that automatically:

public static void Patch(int id){
		Console.WriteLine(id);
		var addresses = RunCbd(id, @"
.sympath srv*C:\tmp*http://msdl.microsoft.com/download/symbols
.reload
.dvalloc 3000
.dvalloc 4000
.dvalloc 5000
.dvalloc 6000
u USER32!EnumDisplayDevicesW
qd
		");
		
		Func<string, IEnumerable<string>> splitInPairs = address => address.Where((c, i) => i % 2 == 0).Zip(address.Where((c, i) => i % 2 == 1), (first, second) => first.ToString() + second.ToString());			
		Func<string, string> translateToBytes = address => string.Join(" ", splitInPairs(address.Replace(((char)96).ToString(), "").PadLeft(16, '0')).Reverse().Select(p => "0x" + p));
		
		var pattern = "0,0,1-1,3,1-2,5,1-3,30,1-4,0,0";
	
		var freeMemoryForEnumDisplayReturnFalse = addresses.Where(o => o.Contains("Allocated 3000 bytes starting at")).First().Split(' ').Last().Trim();
		var freeMemoryForEnumDisplayReturnTrue = addresses.Where(o => o.Contains("Allocated 4000 bytes starting at")).First().Split(' ').Last().Trim();
		var freeMemoryForEnumDisplayJustCall = addresses.Where(o => o.Contains("Allocated 5000 bytes starting at")).First().Split(' ').Last().Trim();
		var freeMemoryForEnumDisplay = addresses.Where(o => o.Contains("Allocated 6000 bytes starting at")).First().Split(' ').Last().Trim();
		var enumDisplayAddress = addresses.SkipWhile(o => !o.StartsWith("USER32!EnumDisplayDevicesW:"))
				.Skip(1)
				.First().Split(' ').First().Trim();
		
		var enumAfterPayloadReturnAddress = (Convert.ToUInt64(enumDisplayAddress.Replace(((char)96).ToString(),""), 16) + 0x10).ToString("X").Replace("0x", "");		
		var freeMemoryForEnumDisplayReturnFalseAfterCall = (Convert.ToUInt64(freeMemoryForEnumDisplayReturnFalse.Replace(((char)96).ToString(),""), 16) + 23).ToString("X").Replace("0x", "");
		var freeMemoryForEnumDisplayReturnTrueAfterCall = (Convert.ToUInt64(freeMemoryForEnumDisplayReturnTrue.Replace(((char)96).ToString(),""), 16) + 23).ToString("X").Replace("0x", "");
		
		var patternInstruction = "";
		
		if(pattern != ""){
			foreach(var part in pattern.Split('-')){
			var sourceMonitor = int.Parse(part.Split(',')[0]).ToString("X");
			var destinationMonitor = int.Parse(part.Split(',')[1]).ToString("X");
				var returnValue = part.Split(',')[2];

				patternInstruction += @" 0x48 0x83 0xFA " + sourceMonitor + @" ";
				patternInstruction += @" 0x75 13 "; // jump short 13 bytes
				patternInstruction += @" 0x48 0xC7 0xC2 " + destinationMonitor + @" 0x00 0x00 0x00 ";
				patternInstruction += @" 0x48 0xB8 " + translateToBytes(returnValue == "1" ? freeMemoryForEnumDisplayReturnTrue : freeMemoryForEnumDisplayReturnFalse) + @" 0x50 0xC3 ";
			}
		}
		
		var patchEnumDisplayScript = @"
.sympath srv*C:\tmp*http://msdl.microsoft.com/download/symbols
.reload
e	" + enumDisplayAddress + @"	0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplay) + @" 0x50 0xC3 0x90 0x90 0x90 0x90
e	" + freeMemoryForEnumDisplayReturnFalse + @" 0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayReturnFalseAfterCall) + @" 0x50 0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayJustCall) + @" 0x50 0xC3 0x48 0xC7 0xC0 0x00 0x00 0x00 0x00 0xC3
e	" + freeMemoryForEnumDisplayReturnTrue + @"	0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayReturnTrueAfterCall) + @" 0x50 0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayJustCall) + @" 0x50 0xC3 0x48 0xC7 0xC0 0x01 0x00 0x00 0x00 0xC3
e	" + freeMemoryForEnumDisplayJustCall + @" 0x53 0x55 0x56 0x57 0x41 0x56 0x41 0x57 0x48 0x81 0xEC 0x98 0x03 0x00 0x00 0x48 0xB8 " + translateToBytes(enumAfterPayloadReturnAddress) + @" 0x50 0xC3
e	" + freeMemoryForEnumDisplay + @" " + patternInstruction + @" 0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayJustCall) + @" 0x50 0xC3

qd
		";
		RunCbd(id, patchEnumDisplayScript);
	}

Most of that should be rather straightforward, but let’s go through it line by line.

We start by allocating some memory in the process and dumping the machine code of the EnumDisplayDevicesW function (lines 3-12).

We then parse the output (lines 14-15 and 19-25). We then calculate some dependent labels (lines 27-29).

Important part is the pattern in line 17. We encode it in the following way (spaces added for clarity):

monitor_id_1,override_id_1,return_true_or_false - monitor_id_2,override_id_2,return_true_or_false - ...

We then parse this pattern and use it in lines 31-44.

Finally, we concatenate all of that together and create the instructions for the debugger (lines 46-57).

]]>
https://blog.adamfurmanek.pl/2025/08/28/bit-twiddling-part-7/feed/ 0
SAT Part 4 — Solving 3CNF with C++ templates https://blog.adamfurmanek.pl/2025/07/25/sat-part-4/ https://blog.adamfurmanek.pl/2025/07/25/sat-part-4/#respond Fri, 25 Jul 2025 13:57:38 +0000 https://blog.adamfurmanek.pl/?p=5125 Continue reading SAT Part 4 — Solving 3CNF with C++ templates]]>

This is the fourth part of the SAT series. For your convenience you can find other parts in the table of contents in Part 1 — Boolean logic

Let’s use C++ templates to solve 3CNF SAT in compile-time.

The solution

First, let’s start with the code:

#include <iostream>

using namespace std;

// ============================

class NullType {};

// ============================

template <class T, class U>
struct Typelist
{
	typedef T Head;
	typedef U Tail;
};

// ============================

template <class TList, unsigned int index> struct TypeAt;
template <class Head, class Tail>
struct TypeAt<Typelist<Head, Tail>, 0>
{
	typedef Head Result;
};
template <class Head, class Tail, unsigned int i>
struct TypeAt<Typelist<Head, Tail>, i>
{
	typedef typename TypeAt<Tail, i - 1>::Result Result;
};

// ============================

template <class TList, class TAggregated> struct ReversedTypelist;
template<class TAggregated>
struct ReversedTypelist<NullType, TAggregated>{
	typedef TAggregated Result;
};
template <class Head, class Tail, class TAggregated>
struct ReversedTypelist<Typelist<Head, Tail>, TAggregated>
{
	typedef typename ReversedTypelist<Tail, Typelist<Head, TAggregated> >::Result Result;
};

// ============================

template<int v> struct Literal {
	enum {
		value = v
	};
};

struct LiteralTrue {
	enum {
		value = 1
	};
};

struct LiteralFalse {
	enum {
		value = 0
	};
};

// ============================


template<typename T> struct Id { typedef T Result; };
template<typename T> struct Negated {};
template<> struct Negated<LiteralTrue> { typedef LiteralFalse Result; };
template<> struct Negated<LiteralFalse> { typedef LiteralTrue Result; };




// ============================

template<int i>
struct Variable {};

template<>
struct Variable<0> {
	typedef LiteralFalse value1;
	typedef LiteralFalse value2;
};

template<>
struct Variable<1> {
	typedef LiteralTrue value1;
	typedef LiteralTrue value2;
};

template<>
struct Variable<2> {
	typedef LiteralFalse value1;
	typedef LiteralTrue value2;
};

// ============================

template<typename Constraint, typename BoundVariables> struct Conjunction { };
template<typename BoundVariables> struct Conjunction<LiteralFalse, BoundVariables> { typedef NullType Result; };
template<typename BoundVariables> struct Conjunction<LiteralTrue, BoundVariables> { typedef BoundVariables Result; };

template<typename BoundVariables1, typename BoundVariables2> struct Disjunction { typedef BoundVariables1 Result; };
template<> struct Disjunction<NullType, NullType> { typedef NullType Result; };
template<typename BoundVariables1> struct Disjunction<BoundVariables1, NullType> { typedef BoundVariables1 Result; };
template<typename BoundVariables2> struct Disjunction<NullType, BoundVariables2> { typedef BoundVariables2 Result; };

// ============================


// ============================
// Constraints with actual calculation
// This is where we check if SAT clauses (Variables) really satisfy a particular constraint
template<typename AOrNegated, typename BOrNegated, typename COrNegated> struct Constraint {};

template<typename A, typename B, typename C> struct ThreeClause { };
template<> struct ThreeClause<LiteralFalse, LiteralFalse, LiteralFalse> { typedef LiteralFalse value; };
template<> struct ThreeClause<LiteralFalse, LiteralFalse, LiteralTrue> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralFalse, LiteralTrue, LiteralFalse> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralFalse, LiteralTrue, LiteralTrue> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralTrue, LiteralFalse, LiteralFalse> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralTrue, LiteralFalse, LiteralTrue> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralTrue, LiteralTrue, LiteralFalse> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralTrue, LiteralTrue, LiteralTrue> { typedef LiteralTrue value; };

// ============================




// ============================


// ============================
// BoundConstraints
// Extracts referenced Variables and calculates literal representing the result of logical operation


template<typename ConstraintType, typename BoundVariables> struct BoundConstraint {};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Id<Literal<A> >, Id<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Id<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Id<Literal<A> >, Id<Literal<B> >, Negated<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Id<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Id<Literal<A> >, Negated<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Id<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Id<Literal<A> >, Negated<Literal<B> >, Negated<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Id<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Id<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Id<Literal<B> >, Negated<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Negated<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Negated<Literal<B> >, Negated<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};

// ============================


// ============================
// Model

template<typename Variables, typename BoundVariables, typename Constraints, typename BoundConstraints> struct Model {};

// Model with no BoundConstraints just returns the bound variables
template<typename BoundVariables>
struct Model<NullType, BoundVariables, NullType, NullType>{
	typedef BoundVariables Result;
};

// Recursively calculates AND between current BoundConstraint and the rest of BoundConstraints
// Effectively removes one element from BoundConstraints, and returns Conjunction of the removed BoundConstraint and the rest
template<typename BoundVariables, typename Head, typename Tail>
struct Model<NullType, BoundVariables, NullType, Typelist<Head, Tail> > {
	typedef typename Conjunction<
		typename Head::value,
		typename Model<NullType, BoundVariables, NullType, Tail>::Result
	>::Result Result;
};

// Recursively bounds Constraint with BoundVariables
// Effectively removes one element from Constraints, and adds one element to BoundConstraints
template<typename BoundVariables, typename Head, typename Tail, typename BoundConstraints>
struct Model<NullType, BoundVariables, Typelist<Head, Tail>, BoundConstraints> {
	typedef typename Model<NullType, BoundVariables, Tail, Typelist<BoundConstraint<Head, BoundVariables>, BoundConstraints> >::Result Result;
};

// Takes first Variable, branches into two trees for the first value of the variable and the second value
// Effectively removes one element from Variables, and adds one element to BoundVariables
template<typename Head, typename Tail, typename BoundVariables, typename Constraints, typename BoundConstraints>
struct Model<Typelist<Head, Tail>, BoundVariables, Constraints, BoundConstraints> {
	typedef typename Disjunction<
		typename Model<Tail, Typelist<typename Head::value1, BoundVariables>, Constraints, BoundConstraints>::Result,
		typename Model<Tail, Typelist<typename Head::value2, BoundVariables>, Constraints, BoundConstraints>::Result
	>::Result Result;
};

template<typename Variables, typename Constraints>
struct FullModel {
	typedef typename Model<typename ReversedTypelist<Variables, NullType>::Result, NullType, Constraints, NullType>::Result Result;
};

// ============================


// ============================
// Print helpers

template<typename T> struct Print {};
template<typename Head, typename Tail>
struct Print<Typelist<Head, Tail> >{
	void print(){
		std::cout<<Head::value<<std::endl;
		Print<Tail>().print();
	}
};
template<>
struct Print<NullType>{
	void print(){
		std::cout<<"NullType"<<std::endl;
	}
};

// ============================


int main() {
	// XOR
	Print<typename FullModel< 
		Typelist<Variable<1>, Typelist<Variable<0>, Typelist<Variable<2>, NullType> > >,
			Typelist<Constraint<Negated<Literal<0> >, Id<Literal<1> >, Id<Literal<2> > >,
			Typelist<Constraint<Id<Literal<0> >, Negated<Literal<1> >, Id<Literal<2> > >,
			Typelist<Constraint<Id<Literal<0> >, Id<Literal<1> >, Negated<Literal<2> > >,
			Typelist<Constraint<Negated<Literal<0> >, Negated<Literal<1> >, Negated<Literal<2> > >,
				NullType> > > >
	>::Result>().print();
}

Problem definition

Let’s start with the main function which shows how to calculate exclusive or (XOR) using this approach:

Print<typename FullModel< 
	Typelist<Variable<1>, Typelist<Variable<0>, Typelist<Variable<2>, NullType> > >,
		Typelist<Constraint<Negated<Literal<0> >, Id<Literal<1> >, Id<Literal<2> > >,
		Typelist<Constraint<Id<Literal<0> >, Negated<Literal<1> >, Id<Literal<2> > >,
		Typelist<Constraint<Id<Literal<0> >, Id<Literal<1> >, Negated<Literal<2> > >,
		Typelist<Constraint<Negated<Literal<0> >, Negated<Literal<1> >, Negated<Literal<2> > >,
			NullType> > > >
>::Result>().print();

We will create an instance of FullModel template that will encapsulate our parameters.

As a first parameter, we are going to pass a list of variables. Each variable can be either fixed (Variable<0> is false, Variable<1> is true) or free (Variable<2>). Fixed variables represent clauses for which we know the value. Free variables need to be solved by the solver. We’ll later refer to this list as list of variables and we’ll get its elements by the 0-based indices.

The second parameter is a list of constraints. Each constraint represents a disjunction of three variables. Each variable is either negated or left intact. Let’s take this line as an example:

Constraint<Negated<Literal<0> >, Id<Literal<1> >, Id<Literal<2> >

This creates a constraint in which we take variables with numbers 0, 1, and 2. The first variable (with index 0) is negated, other ones are left intact. This constraint represents the following formula:

*** QuickLaTeX cannot compile formula:
\[\neg A \vee B \vee C\]

*** Error message:
Cannot connect to QuickLaTeX server: cURL error 6: Could not resolve host: www.quicklatex.com
Please make sure your server/PHP settings allow HTTP requests to external resources ("allow_url_fopen", etc.)
These links might help in finding solution:
http://wordpress.org/extend/plugins/core-control/
http://wordpress.org/support/topic/an-unexpected-http-error-occurred-during-the-api-request-on-wordpress-3?replies=37

Helpers

We need some helper structures to represent concepts.

class NullType {};

This represents a null type. It’s basically a sentinel to indicate end of list or missing values.

Next, we need a definition of linked list. We build it the typical way, with head and tail:

template <class T, class U>
struct Typelist
{
	typedef T Head;
	typedef U Tail;
};

We need some helpers to operate on this list. We start with indexing:

template <class TList, unsigned int index> struct TypeAt;
template <class Head, class Tail>
struct TypeAt<Typelist<Head, Tail>, 0>
{
	typedef Head Result;
};
template <class Head, class Tail, unsigned int i>
struct TypeAt<Typelist<Head, Tail>, i>
{
	typedef typename TypeAt<Tail, i - 1>::Result Result;
};

We also need a way to reverse the list:

template <class TList, class TAggregated> struct ReversedTypelist;
template<class TAggregated>
struct ReversedTypelist<NullType, TAggregated>{
	typedef TAggregated Result;
};
template <class Head, class Tail, class TAggregated>
struct ReversedTypelist<Typelist<Head, Tail>, TAggregated>
{
	typedef typename ReversedTypelist<Tail, Typelist<Head, TAggregated> >::Result Result;
};

We assume that each list will end with a null type and will not have any null type in the middle.

Now, we need a literal that represents a number used for indexing inside the list:

template<int v> struct Literal {
	enum {
		value = v
	};
};

In a similar way, we need two literals to represent true and false. They are used to represent the actual variable values:

struct LiteralTrue {
	enum {
		value = 1
	};
};

struct LiteralFalse {
	enum {
		value = 0
	};
};

Finally, we need to have a way to negate the variable value or leave it intact.

template<typename T> struct Id { typedef T Result; };
template<typename T> struct Negated {};
template<> struct Negated<LiteralTrue> { typedef LiteralFalse Result; };
template<> struct Negated<LiteralFalse> { typedef LiteralTrue Result; };

Representing variables

Let’s now define actual variables used in the model. They represent logical clauses that can be either true or false.

Let’s now sketch the general idea for how we’re going to solve the model. We’ll try to substitute each clause with the allowed values. For fixed variables, only one value is allowed (either 0 or 1). For free variables, we need to consider both values (0 and 1). Therefore, each variable will have two fields to store the available values. For fixed variables, both fields will have the same value. For free variables, the fields will have two different values.

template<int i>
struct Variable {};

template<>
struct Variable<0> {
	typedef LiteralFalse value1;
	typedef LiteralFalse value2;
};

template<>
struct Variable<1> {
	typedef LiteralTrue value1;
	typedef LiteralTrue value2;
};

template<>
struct Variable<2> {
	typedef LiteralFalse value1;
	typedef LiteralTrue value2;
};

Constraints

When solving the model, we’ll effectively examine each possible valuation. Therefore, constraints need to be resolved after valuating variables. Therefore, we’ll have a two-stage process: first, we’ll define constraints as which variables they should incorporate. In the second stage, the constraints will be bound with valuated variables.

To do that, we will need to calculate the CNF. Therefore, we’ll need to calculate the conjunction of all constraints, and disjunction of possible valuations. We define these as follows:

template<typename Constraint, typename BoundVariables> struct Conjunction { };
template<typename BoundVariables> struct Conjunction<LiteralFalse, BoundVariables> { typedef NullType Result; };
template<typename BoundVariables> struct Conjunction<LiteralTrue, BoundVariables> { typedef BoundVariables Result; };

template<typename BoundVariables1, typename BoundVariables2> struct Disjunction { typedef BoundVariables1 Result; };
template<> struct Disjunction<NullType, NullType> { typedef NullType Result; };
template<typename BoundVariables1> struct Disjunction<BoundVariables1, NullType> { typedef BoundVariables1 Result; };
template<typename BoundVariables2> struct Disjunction<NullType, BoundVariables2> { typedef BoundVariables2 Result; };

Conceptually, we’ll use them as following:

Disjunction(first valuation, Disjunction(second valuation, Disjunction(third valuation, ...)))

For each valuation, we’ll calculate the following:

Conjunction(first constraint, Conjunction(second constraint, Conjunction(third constraint, ...)))

This will effectively create a tree of valuations (thanks to disjunctions), and for each specific valuation we’ll verify if all 3CNF clauses are met (thanks to conjunction).

Notice that both conjunction and disjunction return a set of bound variables. We use that to retrieve the particular result. If a given valuation doesn’t meet the constraints, then the null type is returned.

Now, we need to actually be able to check if a particular 3CNF is solved. We do it like this:

template<typename AOrNegated, typename BOrNegated, typename COrNegated> struct Constraint {};

template<typename A, typename B, typename C> struct ThreeClause { };
template<> struct ThreeClause<LiteralFalse, LiteralFalse, LiteralFalse> { typedef LiteralFalse value; };
template<> struct ThreeClause<LiteralFalse, LiteralFalse, LiteralTrue> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralFalse, LiteralTrue, LiteralFalse> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralFalse, LiteralTrue, LiteralTrue> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralTrue, LiteralFalse, LiteralFalse> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralTrue, LiteralFalse, LiteralTrue> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralTrue, LiteralTrue, LiteralFalse> { typedef LiteralTrue value; };
template<> struct ThreeClause<LiteralTrue, LiteralTrue, LiteralTrue> { typedef LiteralTrue value; };

This is quite straightforward. We take the 3CNF and check if any variable is true.

Now, we need to bound the constraints with the variables. To do that, we take the variable by its index, negate it if needed, and use in the 3CNF:

template<typename ConstraintType, typename BoundVariables> struct BoundConstraint {};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Id<Literal<A> >, Id<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Id<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Id<Literal<A> >, Id<Literal<B> >, Negated<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Id<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Id<Literal<A> >, Negated<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Id<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Id<Literal<A> >, Negated<Literal<B> >, Negated<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Id<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Id<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Id<Literal<B> >, Negated<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Negated<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};
template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Negated<Literal<B> >, Negated<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Negated<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};

I imagine this part can be simplified somehow (with even more templates and higher-kinded templates).

Model

We now have everything to define the model. A model is a structure with 4 elements: list of variables, list of bound variables (variables with valuation), list of constraints, and a list of bound constraints:

template<typename Variables, typename BoundVariables, typename Constraints, typename BoundConstraints> struct Model {};

We don’t want the user to provide all the values, as we just need the variables and constraints. Therefore, we have the following wrapper:

template<typename Variables, typename Constraints>
struct FullModel {
	typedef typename Model<typename ReversedTypelist<Variables, NullType>::Result, NullType, Constraints, NullType>::Result Result;
};

Now, the plan of action. For the exclusive or example, we start with something like this:

FullModel(
  [fixed variable 0 with value true; fixed variable 1 with value false; free variable 2],
  [
    Constraint(!0, 1, 2),
    Constraint(0, !1, 2),
    Constraint(0, 1, !2),
    Constraint(!0, !1, !2)
  ]
)

First, we add two more empty parameters. We also reverse the list of variables because we’ll reverse it later when evaluating:

Model(
  [free variable 2, fixed variable 1 with value false, fixed variable 0 with value true],
  [], // valuated variables
  [
    Constraint(!0, 1, 2),
    Constraint(0, !1, 2),
    Constraint(0, 1, !2),
    Constraint(!0, !1, !2)
  ],
  [] // bound constraints
)

Now, we want to iterate over variables until the list is empty. For each iteration, we want to use disjunction to split into two subtrees:

template<typename Head, typename Tail, typename BoundVariables, typename Constraints, typename BoundConstraints>
struct Model<Typelist<Head, Tail>, BoundVariables, Constraints, BoundConstraints> {
	typedef typename Disjunction<
		typename Model<Tail, Typelist<typename Head::value1, BoundVariables>, Constraints, BoundConstraints>::Result,
		typename Model<Tail, Typelist<typename Head::value2, BoundVariables>, Constraints, BoundConstraints>::Result
	>::Result Result;
};

So, the very first step of evaluation will take first variable and generate this first subtree:

Model(
  [fixed variable 1 with value false, fixed variable 0 with value true],
  [bound variable 2 with value false], // valuated variables
  [
    Constraint(!0, 1, 2),
    Constraint(0, !1, 2),
    Constraint(0, 1, !2),
    Constraint(!0, !1, !2)
  ],
  [] // bound constraints
)

You can see that we removed one unbound variable (from the first list) and added one bound variable (to the second list). We also need to do the same for the other possible value of the variable, so we generate another subtree:

Model(
  [fixed variable 1 with value false, fixed variable 0 with value true],
  [bound variable 2 with value true], // valuated variables
  [
    Constraint(!0, 1, 2),
    Constraint(0, !1, 2),
    Constraint(0, 1, !2),
    Constraint(!0, !1, !2)
  ],
  [] // bound constraints
)

Next steps are similar, but both generated subtrees will be the same (as variables are fixed). Ultimately, we end up with something like this:

Model(
  [],
  [variable 0 with value true, variable 1 with value false, variable 2 with value false], // valuated variables
  [
    Constraint(!0, 1, 2),
    Constraint(0, !1, 2),
    Constraint(0, 1, !2),
    Constraint(!0, !1, !2)
  ],
  [] // bound constraints
)

We bound all the variables. Now, we need to do the same for constraints:

template<typename BoundVariables, typename Head, typename Tail, typename BoundConstraints>
struct Model<NullType, BoundVariables, Typelist<Head, Tail>, BoundConstraints> {
	typedef typename Model<NullType, BoundVariables, Tail, Typelist<BoundConstraint<Head, BoundVariables>, BoundConstraints> >::Result Result;
};

In the first step, we take the first constraint and bound it with the variables to get something like this:

Model(
  [],
  [bound variable 0 with value true, bound variable 1 with value false, bound variable 2 with value false], // valuated variables
  [    
    Constraint(0, !1, 2),
    Constraint(0, 1, !2),
    Constraint(!0, !1, !2)
  ],
  [
    BoundConstraint(!0, 1, 2) with bound variables(true, false, false),
  ] // bound constraints
)

We do the same for all the constraints, and finally end up with this:

Model(
  [],
  [bound variable 0 with value true, bound variable 1 with value false, bound variable 2 with value false], // valuated variables
  [],
  [
    BoundConstraint(0, !1, 2) with bound variables(true, false, false),
    BoundConstraint(0, 1, !2) with bound variables(true, false, false),
    BoundConstraint(!0, !1, !2) with bound variables(true, false, false),
    BoundConstraint(!0, 1, 2) with bound variables(true, false, false),
  ] // bound constraints
)

We repeat a lot, but that’s not a problem.

Finally, we can start evaluating the constraints:

template<typename BoundVariables, typename Head, typename Tail>
struct Model<NullType, BoundVariables, NullType, Typelist<Head, Tail> > {
	typedef typename Conjunction<
		typename Head::value,
		typename Model<NullType, BoundVariables, NullType, Tail>::Result
	>::Result Result;
};

We unwrap the first constraint and turn it into a conjunction. Effectively, we’ll build a structure of this:

Conjunction(BoundConstraint(0, !1, 2) with bound variables(true, false, false), Conjunction(BoundConstraint(0, 1, !2) with bound variables(true, false, false), Conjunction(BoundConstraint(!0, !1, !2) with bound variables(true, false, false), Conjunction(BoundConstraint(!0, 1, 2) with bound variables(true, false, false), Model([],  [bound variable 0 with value true, bound variable 1 with value false, bound variable 2 with value false], [], [])))))

And now we can calculate the conjunctions. If the second parameter is empty model, then we simply return bound variables:

template<typename BoundVariables>
struct Model<NullType, BoundVariables, NullType, NullType>{
	typedef BoundVariables Result;
};

Now, we need to refer back to the definition of conjunction we already saw:

template<typename BoundVariables> struct Conjunction<LiteralFalse, BoundVariables> { typedef NullType Result; };
template<typename BoundVariables> struct Conjunction<LiteralTrue, BoundVariables> { typedef BoundVariables Result; };

You can see that we’ll get some variables as a second parameter, and we’ll need to evaluate the first parameter which is BoundConstraint. We already saw how it’s done:

template<int A, int B, int C, typename BoundVariables>
struct BoundConstraint<Constraint<Negated<Literal<A> >, Id<Literal<B> >, Id<Literal<C> > >, BoundVariables> {
	typedef typename ThreeClause<
		typename Negated<typename TypeAt<BoundVariables, A>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, B>::Result>::Result,
		typename Id<typename TypeAt<BoundVariables, C>::Result>::Result
	>::value value;
};

And here is where the magic happens. We extract variables from the list, we negate them accordingly, and then use the 3CNF logic:

template<> struct ThreeClause<LiteralFalse, LiteralFalse, LiteralFalse> { typedef LiteralFalse value; };

And we can see this constraint isn’t met. We’ll therefore return LiteralFalse back to Conjunction above and this will in turn return NullType up the tree. You can trace the recursive calls further to figure out what happens next.

Printing

Last thing is printing. It’s rather straighforward:

template<typename T> struct Print {};
template<typename Head, typename Tail>
struct Print<Typelist<Head, Tail> >{
	void print(){
		std::cout<<Head::value<<std::endl;
		Print<Tail>().print();
	}
};
template<>
struct Print<NullType>{
	void print(){
		std::cout<<"NullType"<<std::endl;
	}
};

This simply prints the structure. The solution to SAT is calculated in the compilation time. Printing happens in runtime, so you need to run the application to get the precalculated result.

Summary

We already saw how to reduce ILP to 0-1 ILP, then to 3CNF SAT, and now we can calculate that using C++ templates. Therefore, we can solve our ILP problems using C++ compiler.

]]>
https://blog.adamfurmanek.pl/2025/07/25/sat-part-4/feed/ 0
Async Wandering Part 15 — How async in C# tricks you and how to order async continuations https://blog.adamfurmanek.pl/2024/11/09/async-wandering-part-15/ https://blog.adamfurmanek.pl/2024/11/09/async-wandering-part-15/#respond Sat, 09 Nov 2024 21:10:33 +0000 https://blog.adamfurmanek.pl/?p=5101 Continue reading Async Wandering Part 15 — How async in C# tricks you and how to order async continuations]]>

This is the fifteenth part of the Async Wandering series. For your convenience you can find other parts in the table of contents in Part 1 – Why creating Form from WinForms in unit tests breaks async?

You probably heard that async is all about not blocking the operating system level thread. This is the fundamental principle of asynchronous programming. If you block the operating system level thread, you lose all the benefits of the asynchronous code.

You also need to keep in mind how to write your C# code. You probably heard that you should keep async all the way up. This is rather easy to keep because the compiler takes care of that. What’s slightly harder to remember is to keep ConfigureAwait(false) all the way down. If you don’t do it this way, the compiler won’t help you and you may run into some nasty deadlocking issues, especially if you use some weird SynchronizationContext.

Last but not least, you probably know that the asynchronous code is only useful if your code is IO-bound. You probably heard that many times. However, what might be very surprising is that C# actually does a lot to make your application work even if your code is CPU-bound and you still use async. This is very misleading and may let you believe that you know async, whereas you only know async in C#. Let’s see why.

There is plenty of no threads!

One of the best articles about async is C# is titled There Is No Thread. Stephen Cleary shows that it’s all about continuations and juggling the lambdas to run your code when some IO-bound operation finishes. I even used this title in my Internals of Async talk in which I explain all the internals of synchronization contexts, continuations, and the machinery behind the scenes.

However, it’s only a figure of speech. At the very end of the day, we need to have some thread to run the code. Depending on your synchronization context, there may be some dedicated thread to run the continuations (like in desktop or Blazor applications), or we can use threads from the thread pool. If you think carefully about the asynchronous code, you should notice that this is the place where C# either bites you hard (and causes many deadlocks) or saves your application even if you are doing something very wrong. How? Because C# uses many threads.

By default, C# uses the thread pool to run continuations. The thread pool runs some not-so-trivial algorithm to spawn new threads when there is plenty of work to be done. This is not part of the asynchronous programming paradigm per se. This is just the implementation detail of C#’s asynchronous code which heavily impacts how your applications scale. Other languages don’t do it in the same way and what works well in C# may fail badly somewhere else. For instance, Python’s asyncio uses just one thread even though Python supports multithreading. While this is just an implementation detail, it have tremendous performance implications. Let’s see why.

One thread can kill you

Let’s take a typical message processing flow. We take a message from the service bus, refresh the lease periodically, and process the message in the background. Let’s say that our flow is IO-bound and we use async to benefit from non-blocking thread instead of spawning multiple threads. Let’s simulate the system. You can find the whole code in this gist.
We start with a message that will store when we received it, when we refreshed the lease for the last time, the identifier of the message, and the final status (if we lost the message or finished successfully):

public class Message
{
	public DateTime LastRefreshTime { get; set; }
	public DateTime ReceiveTime { get; set; }
	public bool WasLost { get; set; }
	public bool WasFinished { get; set; }
	public int Id { get; set; }
}

Now, we want to configure timings in our application. We can specify how long it takes to receive the message from the bus, how many operations we need to perform on each message, and how long they all take:

public class Settings
{
	public int MessagesCount { get; set; }
	public TimeSpan ReceivingDuration { get; set; }
	public TimeSpan ProcessingCPUDuration { get; set; }
	public TimeSpan ProcessingIODuration { get; set; }
	public int ProcessingLoops { get; set; }
	public TimeSpan RefreshDelay { get; set; }
	public TimeSpan RefreshDuration { get; set; }
	public TimeSpan MessageLeaseTimeout { get; set; }
}

Finally, we have some statistics showing how we did:

public class Stats
{
	public int GeneratedMessages { get; set; }
	public int ProcessedSuccessfully { get; set; }
	public int Lost { get; set; }
}

Let’s now see the scaffodling code:

public static async Task Simulate()
{
	Initialize();
	StartMonitoringThread();
	await RunLoop();
}

private static void Initialize()
{
	CurrentStats = new Stats();
	CurrentSettings = new Settings
	{
		ReceivingDuration = TimeSpan.FromMilliseconds(250),
		RefreshDelay = TimeSpan.FromSeconds(2),
		RefreshDuration = TimeSpan.FromMilliseconds(250),
		MessageLeaseTimeout = TimeSpan.FromSeconds(5),
		ProcessingCPUDuration = TimeSpan.FromMilliseconds(100),
		ProcessingIODuration = TimeSpan.FromMilliseconds(1000),
		ProcessingLoops = 20,
		MessagesCount = 100
	};
}

private static void StartMonitoringThread()
{
	Thread monitoringThread = new Thread(() =>
	{
		while (true)
		{
			Thread.Sleep(TimeSpan.FromSeconds(3));
			Log($"Received messages {CurrentStats.GeneratedMessages}, " +
				$"success {CurrentStats.ProcessedSuccessfully}, " +
				$"failed {CurrentStats.Lost}, " +
				$"still running {CurrentStats.GeneratedMessages - CurrentStats.ProcessedSuccessfully - CurrentStats.Lost}");
		}
	});
	monitoringThread.IsBackground = true;
	monitoringThread.Start();
}

We have the Simulate method that runs the magic. It starts by initializing the timings and setting up some monitoring thread to print statistics every three seconds.

When it comes to the timings: we will run 20 loops for each message. In each loop’s iteration, we will do some CPU-bound operation (taking 100 milliseconds), and then some IO-bound operation (taking 1000 milliseconds). We can see that the CPU operation is 10 times shorter than the IO-bound one.

Finally, we have the heart of our system:

private static async Task RunLoop()
{
	while (true)
	{
		var message = await ReceiveMessage();
		if (message == null) continue;

		KeepLease(message);
		ProcessMessage(message);
	}
}

We receive the message, keep refreshing the lease, and process the message. Some error-handling code is omitted for brevity.

Receiving the message is rather straightforward – we check if we have more messages in the queue, then take one, otherwise we simply return:

public static async Task<Message?> ReceiveMessage()
{
	await Task.Yield();
	await Task.Delay(CurrentSettings.ReceivingDuration);

	if (CurrentSettings.MessagesCount-- > 0)
	{
		CurrentStats.GeneratedMessages++;
		Message message = new Message
		{
			LastRefreshTime = DateTime.Now,
			WasLost = false,
			Id = CurrentStats.GeneratedMessages,
			ReceiveTime = DateTime.Now
		};
		Log($"New message received with id {message.Id}");
		return message;
	}
	else
	{
		return null;
	}
}

Keeping a lease is also clear – we wait for some time, then refresh the lease and check if we made it on time:

public static async Task KeepLease(Message message)
{
	await Task.Yield();

	while (message.WasFinished == false) // This is unsafe according to memory model
	{
		await Task.Delay(CurrentSettings.RefreshDelay);

		if (DateTime.Now > message.LastRefreshTime + CurrentSettings.MessageLeaseTimeout)
		{
			message.WasLost = true;
			CurrentStats.Lost++;
			Log($"Lost lease for message {message.Id}");
			return;
		}
		else
		{
			await Task.Delay(CurrentSettings.RefreshDuration);
			Log($"Refreshed lease for message {message.Id}");
			message.LastRefreshTime = DateTime.Now;
		}
	}

	CurrentStats.ProcessedSuccessfully++;
}

Finally, the heart of our message processing. We simply run a loop and do the work:

public static async Task ProcessMessage(Message message)
{
	await Task.Yield();

	for (int part = 0; part < CurrentSettings.ProcessingLoops && message.WasLost == false; ++part)
	{
		Thread.Sleep(CurrentSettings.ProcessingCPUDuration); // CPU-bound part

		await Task.Delay(CurrentSettings.ProcessingIODuration); // IO-bound part
	}

	message.WasFinished = true;
	if (!message.WasLost)
	{
		Log($"Finished message with id {message.Id} in {DateTime.Now - message.ReceiveTime}");
	}
}

Notice that we block the thread for the CPU-bound operation and use await for the IO-bound one.

We also have this logging method that prints the timestamp, thread ID, and the message:

public static void Log(string message)
{
	Console.WriteLine($"{DateTime.Now}\t{Thread.CurrentThread.ManagedThreadId}\t{message}");
}

Let’s run the code, let it go for a while, and then see what happens:

11/9/2024 12:05:06 PM   4       New message received with id 1
11/9/2024 12:05:06 PM   6       New message received with id 2
11/9/2024 12:05:06 PM   6       New message received with id 3
11/9/2024 12:05:07 PM   8       New message received with id 4
11/9/2024 12:05:07 PM   8       New message received with id 5
11/9/2024 12:05:07 PM   4       New message received with id 6
11/9/2024 12:05:08 PM   4       New message received with id 7
11/9/2024 12:05:08 PM   4       New message received with id 8
11/9/2024 12:05:08 PM   11      New message received with id 9
11/9/2024 12:05:08 PM   8       Refreshed lease for message 1
11/9/2024 12:05:08 PM   12      New message received with id 10
11/9/2024 12:05:08 PM   11      Refreshed lease for message 2
11/9/2024 12:05:09 PM   12      New message received with id 11
11/9/2024 12:05:09 PM   7       Received messages 11, success 0, failed 0, still running 11
11/9/2024 12:05:09 PM   11      Refreshed lease for message 3
11/9/2024 12:05:09 PM   11      New message received with id 12
11/9/2024 12:05:09 PM   12      Refreshed lease for message 4
11/9/2024 12:05:09 PM   6       New message received with id 13
11/9/2024 12:05:09 PM   8       Refreshed lease for message 5
11/9/2024 12:05:09 PM   12      New message received with id 14
11/9/2024 12:05:10 PM   12      Refreshed lease for message 6
11/9/2024 12:05:10 PM   12      New message received with id 15
11/9/2024 12:05:10 PM   11      Refreshed lease for message 7
11/9/2024 12:05:10 PM   11      New message received with id 16
11/9/2024 12:05:10 PM   6       Refreshed lease for message 8
11/9/2024 12:05:10 PM   12      New message received with id 17
11/9/2024 12:05:10 PM   11      Refreshed lease for message 9
11/9/2024 12:05:10 PM   8       New message received with id 18
11/9/2024 12:05:10 PM   11      Refreshed lease for message 1
11/9/2024 12:05:11 PM   4       Refreshed lease for message 10
11/9/2024 12:05:11 PM   13      New message received with id 19
11/9/2024 12:05:11 PM   12      Refreshed lease for message 2
11/9/2024 12:05:11 PM   13      Refreshed lease for message 11
11/9/2024 12:05:11 PM   11      New message received with id 20
11/9/2024 12:05:11 PM   6       Refreshed lease for message 3
11/9/2024 12:05:11 PM   4       Refreshed lease for message 12
11/9/2024 12:05:11 PM   13      New message received with id 21
11/9/2024 12:05:11 PM   13      Refreshed lease for message 4
11/9/2024 12:05:11 PM   4       Refreshed lease for message 13
11/9/2024 12:05:11 PM   4       New message received with id 22
11/9/2024 12:05:12 PM   4       Refreshed lease for message 5
11/9/2024 12:05:12 PM   6       Refreshed lease for message 14
11/9/2024 12:05:12 PM   13      New message received with id 23
11/9/2024 12:05:12 PM   7       Received messages 23, success 0, failed 0, still running 23
11/9/2024 12:05:12 PM   4       Refreshed lease for message 6
11/9/2024 12:05:12 PM   4       Refreshed lease for message 15
...
11/9/2024 12:05:50 PM   15      Finished message with id 84 in 00:00:22.3550821
11/9/2024 12:05:50 PM   15      Refreshed lease for message 83
11/9/2024 12:05:50 PM   4       Refreshed lease for message 100
11/9/2024 12:05:50 PM   8       Finished message with id 85 in 00:00:22.3554494
11/9/2024 12:05:50 PM   15      Refreshed lease for message 84
11/9/2024 12:05:50 PM   20      Refreshed lease for message 92
11/9/2024 12:05:50 PM   20      Finished message with id 86 in 00:00:22.3882717
11/9/2024 12:05:50 PM   8       Refreshed lease for message 93
11/9/2024 12:05:51 PM   4       Refreshed lease for message 85
11/9/2024 12:05:51 PM   20      Finished message with id 87 in 00:00:22.3452990
11/9/2024 12:05:51 PM   4       Refreshed lease for message 94
11/9/2024 12:05:51 PM   20      Refreshed lease for message 86
11/9/2024 12:05:51 PM   7       Received messages 100, success 86, failed 0, still running 14
11/9/2024 12:05:51 PM   14      Finished message with id 88 in 00:00:22.3968974
11/9/2024 12:05:51 PM   13      Refreshed lease for message 87
11/9/2024 12:05:51 PM   4       Refreshed lease for message 95
11/9/2024 12:05:51 PM   4       Refreshed lease for message 88
11/9/2024 12:05:51 PM   14      Finished message with id 89 in 00:00:22.3782384
11/9/2024 12:05:51 PM   4       Refreshed lease for message 96
11/9/2024 12:05:51 PM   15      Finished message with id 90 in 00:00:22.3557212
11/9/2024 12:05:51 PM   15      Refreshed lease for message 89
11/9/2024 12:05:52 PM   13      Refreshed lease for message 97
11/9/2024 12:05:52 PM   20      Refreshed lease for message 90
11/9/2024 12:05:52 PM   15      Finished message with id 91 in 00:00:22.4805351
11/9/2024 12:05:52 PM   15      Refreshed lease for message 98
11/9/2024 12:05:52 PM   20      Refreshed lease for message 91
11/9/2024 12:05:52 PM   15      Refreshed lease for message 99
11/9/2024 12:05:52 PM   15      Finished message with id 92 in 00:00:22.3979587
11/9/2024 12:05:52 PM   4       Refreshed lease for message 100
11/9/2024 12:05:52 PM   20      Finished message with id 93 in 00:00:22.3374987
11/9/2024 12:05:53 PM   4       Refreshed lease for message 92
11/9/2024 12:05:53 PM   4       Finished message with id 94 in 00:00:22.3451488
11/9/2024 12:05:53 PM   4       Refreshed lease for message 93
11/9/2024 12:05:53 PM   13      Refreshed lease for message 94
11/9/2024 12:05:53 PM   13      Finished message with id 95 in 00:00:22.3784563
11/9/2024 12:05:53 PM   13      Finished message with id 96 in 00:00:22.3800325
11/9/2024 12:05:53 PM   13      Refreshed lease for message 95
11/9/2024 12:05:54 PM   4       Finished message with id 97 in 00:00:22.3312738
11/9/2024 12:05:54 PM   20      Refreshed lease for message 96
11/9/2024 12:05:54 PM   7       Received messages 100, success 96, failed 0, still running 4
11/9/2024 12:05:54 PM   13      Finished message with id 98 in 00:00:22.3502617
11/9/2024 12:05:54 PM   4       Refreshed lease for message 97
11/9/2024 12:05:54 PM   13      Finished message with id 99 in 00:00:22.3527442
11/9/2024 12:05:54 PM   4       Refreshed lease for message 98
11/9/2024 12:05:54 PM   4       Finished message with id 100 in 00:00:22.3675039
11/9/2024 12:05:54 PM   13      Refreshed lease for message 99
11/9/2024 12:05:55 PM   13      Refreshed lease for message 100
11/9/2024 12:05:57 PM   7       Received messages 100, success 100, failed 0, still running 0

We can see that all messaged were processed successfully in around 50 seconds. Processing a message was taking around 22 seconds which makes perfect sense since we had 20 iterations taking around 1100 milliseconds each. No failures, all was good.

Let’s now increase the CPU-bound operation time to 1 second (to match the IO-bound part). This is what happens:

11/9/2024 12:06:57 PM   8       New message received with id 1
11/9/2024 12:06:57 PM   11      New message received with id 2
11/9/2024 12:06:58 PM   11      New message received with id 3
11/9/2024 12:06:58 PM   4       New message received with id 4
11/9/2024 12:06:58 PM   13      New message received with id 5
11/9/2024 12:06:58 PM   13      New message received with id 6
11/9/2024 12:06:59 PM   14      New message received with id 7
11/9/2024 12:06:59 PM   11      New message received with id 8
11/9/2024 12:06:59 PM   4       New message received with id 9
11/9/2024 12:06:59 PM   15      Refreshed lease for message 1
11/9/2024 12:06:59 PM   4       New message received with id 10
11/9/2024 12:07:00 PM   12      Refreshed lease for message 2
11/9/2024 12:07:00 PM   12      New message received with id 11
11/9/2024 12:07:00 PM   7       Received messages 11, success 0, failed 0, still running 11
11/9/2024 12:07:00 PM   6       Refreshed lease for message 3
11/9/2024 12:07:00 PM   16      New message received with id 12
11/9/2024 12:07:00 PM   14      Refreshed lease for message 4
11/9/2024 12:07:00 PM   14      New message received with id 13
11/9/2024 12:07:00 PM   15      Refreshed lease for message 5
11/9/2024 12:07:00 PM   4       New message received with id 14
11/9/2024 12:07:01 PM   13      Refreshed lease for message 6
11/9/2024 12:07:01 PM   12      New message received with id 15
11/9/2024 12:07:01 PM   6       Refreshed lease for message 7
11/9/2024 12:07:01 PM   16      New message received with id 16
11/9/2024 12:07:01 PM   14      Refreshed lease for message 8
11/9/2024 12:07:01 PM   4       Refreshed lease for message 9
11/9/2024 12:07:02 PM   13      Refreshed lease for message 1
11/9/2024 12:07:02 PM   12      Refreshed lease for message 10
11/9/2024 12:07:02 PM   12      New message received with id 17
11/9/2024 12:07:02 PM   6       Refreshed lease for message 2
11/9/2024 12:07:02 PM   16      Refreshed lease for message 11
11/9/2024 12:07:02 PM   14      Refreshed lease for message 3
11/9/2024 12:07:02 PM   14      Refreshed lease for message 12
11/9/2024 12:07:02 PM   4       Refreshed lease for message 4
11/9/2024 12:07:03 PM   13      Refreshed lease for message 13
11/9/2024 12:07:03 PM   13      Refreshed lease for message 5
11/9/2024 12:07:03 PM   12      Refreshed lease for message 14
11/9/2024 12:07:03 PM   12      New message received with id 18
11/9/2024 12:07:03 PM   7       Received messages 18, success 0, failed 0, still running 18
11/9/2024 12:07:03 PM   16      Refreshed lease for message 6
11/9/2024 12:07:03 PM   16      Refreshed lease for message 15
...
11/9/2024 12:08:38 PM   21      Finished message with id 88 in 00:00:42.4181405
11/9/2024 12:08:38 PM   42      Finished message with id 90 in 00:00:41.9138898
11/9/2024 12:08:38 PM   8       Refreshed lease for message 91
11/9/2024 12:08:38 PM   28      Refreshed lease for message 85
11/9/2024 12:08:38 PM   28      Refreshed lease for message 88
11/9/2024 12:08:39 PM   24      Finished message with id 85 in 00:00:43.9296927
11/9/2024 12:08:39 PM   16      Finished message with id 93 in 00:00:40.6588087
11/9/2024 12:08:39 PM   7       Received messages 100, success 88, failed 0, still running 12
11/9/2024 12:08:39 PM   21      Refreshed lease for message 93
11/9/2024 12:08:39 PM   23      Refreshed lease for message 94
11/9/2024 12:08:39 PM   16      Refreshed lease for message 92
11/9/2024 12:08:40 PM   23      Refreshed lease for message 97
11/9/2024 12:08:40 PM   16      Refreshed lease for message 99
11/9/2024 12:08:40 PM   21      Refreshed lease for message 96
11/9/2024 12:08:40 PM   42      Refreshed lease for message 98
11/9/2024 12:08:40 PM   16      Refreshed lease for message 90
11/9/2024 12:08:40 PM   42      Refreshed lease for message 95
11/9/2024 12:08:40 PM   42      Refreshed lease for message 100
11/9/2024 12:08:40 PM   24      Finished message with id 94 in 00:00:41.3801588
11/9/2024 12:08:40 PM   8       Finished message with id 92 in 00:00:41.9121719
11/9/2024 12:08:40 PM   8       Finished message with id 95 in 00:00:41.1802881
11/9/2024 12:08:40 PM   24      Finished message with id 96 in 00:00:40.9180345
11/9/2024 12:08:40 PM   24      Refreshed lease for message 91
11/9/2024 12:08:40 PM   52      Refreshed lease for message 85
11/9/2024 12:08:41 PM   24      Finished message with id 98 in 00:00:41.3325727
11/9/2024 12:08:41 PM   21      Finished message with id 91 in 00:00:43.1595798
11/9/2024 12:08:42 PM   24      Refreshed lease for message 94
11/9/2024 12:08:42 PM   42      Refreshed lease for message 92
11/9/2024 12:08:42 PM   24      Refreshed lease for message 99
11/9/2024 12:08:42 PM   8       Refreshed lease for message 97
11/9/2024 12:08:42 PM   42      Refreshed lease for message 96
11/9/2024 12:08:42 PM   24      Finished message with id 97 in 00:00:42.6127819
11/9/2024 12:08:42 PM   42      Refreshed lease for message 98
11/9/2024 12:08:42 PM   42      Finished message with id 99 in 00:00:40.4341505
11/9/2024 12:08:42 PM   7       Received messages 100, success 95, failed 0, still running 5
11/9/2024 12:08:42 PM   21      Refreshed lease for message 95
11/9/2024 12:08:42 PM   42      Refreshed lease for message 100
11/9/2024 12:08:43 PM   42      Refreshed lease for message 91
11/9/2024 12:08:43 PM   8       Finished message with id 100 in 00:00:41.1363357
11/9/2024 12:08:44 PM   21      Refreshed lease for message 97
11/9/2024 12:08:44 PM   52      Refreshed lease for message 99
11/9/2024 12:08:44 PM   52      Refreshed lease for message 100
11/9/2024 12:08:45 PM   7       Received messages 100, success 100, failed 0, still running 0

This time it took nearly 2 minutes to process all the messages. Each message is now taking around 40 seconds. Still, all worked.

Let’s now talk about threads. You can see that the examples use multiple threads to handle the messages. In the second execution, there were around 60 active messages at one time, so this created many threads (we can see that at least 50 threads were created based on the log above). Our application scales well and we can’t complain. Seems like async is doing a really good job!

However, what would happen if we moved this code to some other asynchronous platform? For instance, to Python’s asyncio that uses only single thread? We can emulate that in C# by running the code above in a WinForms context that forces continuations to go through one thread. Let’s change the CPU-bound operation duration to 100 milliseconds (to the original value) and let’s run this from the WinForms app now:

11/9/2024 12:19:20 PM   1       New message received with id 1
11/9/2024 12:19:20 PM   1       New message received with id 2
11/9/2024 12:19:20 PM   1       New message received with id 3
11/9/2024 12:19:21 PM   1       New message received with id 4
11/9/2024 12:19:21 PM   1       New message received with id 5
11/9/2024 12:19:22 PM   1       New message received with id 6
11/9/2024 12:19:22 PM   1       New message received with id 7
11/9/2024 12:19:22 PM   1       Refreshed lease for message 1
11/9/2024 12:19:22 PM   1       New message received with id 8
11/9/2024 12:19:22 PM   11      Received messages 8, success 0, failed 0, still running 8
11/9/2024 12:19:23 PM   1       Refreshed lease for message 2
11/9/2024 12:19:23 PM   1       New message received with id 9
11/9/2024 12:19:23 PM   1       Refreshed lease for message 3
11/9/2024 12:19:23 PM   1       New message received with id 10
11/9/2024 12:19:23 PM   1       Refreshed lease for message 4
11/9/2024 12:19:24 PM   1       Refreshed lease for message 5
11/9/2024 12:19:24 PM   1       New message received with id 11
11/9/2024 12:19:24 PM   1       Refreshed lease for message 6
11/9/2024 12:19:24 PM   1       New message received with id 12
11/9/2024 12:19:25 PM   1       Refreshed lease for message 7
11/9/2024 12:19:25 PM   1       Refreshed lease for message 1
11/9/2024 12:19:25 PM   1       Refreshed lease for message 8
11/9/2024 12:19:25 PM   1       New message received with id 13
11/9/2024 12:19:25 PM   11      Received messages 13, success 0, failed 0, still running 13
11/9/2024 12:19:26 PM   1       Refreshed lease for message 2
11/9/2024 12:19:26 PM   1       Refreshed lease for message 9
...
11/9/2024 12:21:15 PM   1       Refreshed lease for message 58
11/9/2024 12:21:15 PM   1       Refreshed lease for message 62
11/9/2024 12:21:15 PM   1       Finished message with id 46 in 00:00:42.5373955
11/9/2024 12:21:16 PM   1       Refreshed lease for message 49
11/9/2024 12:21:16 PM   1       Refreshed lease for message 51
11/9/2024 12:21:16 PM   1       Refreshed lease for message 53
11/9/2024 12:21:17 PM   1       New message received with id 65
11/9/2024 12:21:17 PM   1       Refreshed lease for message 55
11/9/2024 12:21:17 PM   1       Refreshed lease for message 46
11/9/2024 12:21:17 PM   1       Refreshed lease for message 57
11/9/2024 12:21:17 PM   1       Refreshed lease for message 59
11/9/2024 12:21:17 PM   11      Received messages 65, success 43, failed 3, still running 19
11/9/2024 12:21:17 PM   1       Refreshed lease for message 61
11/9/2024 12:21:17 PM   1       Refreshed lease for message 48
...
11/9/2024 12:22:53 PM   11      Received messages 100, success 90, failed 3, still running 7
11/9/2024 12:22:53 PM   1       Refreshed lease for message 97
11/9/2024 12:22:53 PM   1       Refreshed lease for message 99
11/9/2024 12:22:54 PM   1       Refreshed lease for message 94
11/9/2024 12:22:54 PM   1       Refreshed lease for message 96
11/9/2024 12:22:54 PM   1       Finished message with id 95 in 00:00:32.1189187
11/9/2024 12:22:54 PM   1       Refreshed lease for message 98
11/9/2024 12:22:54 PM   1       Refreshed lease for message 100
11/9/2024 12:22:55 PM   1       Refreshed lease for message 95
11/9/2024 12:22:56 PM   1       Finished message with id 96 in 00:00:31.0536654
11/9/2024 12:22:56 PM   1       Refreshed lease for message 99
11/9/2024 12:22:56 PM   1       Refreshed lease for message 97
11/9/2024 12:22:56 PM   11      Received messages 100, success 92, failed 3, still running 5
11/9/2024 12:22:56 PM   1       Refreshed lease for message 96
11/9/2024 12:22:57 PM   1       Refreshed lease for message 98
11/9/2024 12:22:57 PM   1       Refreshed lease for message 100
11/9/2024 12:22:57 PM   1       Finished message with id 97 in 00:00:30.1211740
11/9/2024 12:22:58 PM   1       Refreshed lease for message 99
11/9/2024 12:22:58 PM   1       Refreshed lease for message 97
11/9/2024 12:22:58 PM   1       Finished message with id 98 in 00:00:29.1238261
11/9/2024 12:22:59 PM   1       Refreshed lease for message 98
11/9/2024 12:22:59 PM   1       Refreshed lease for message 100
11/9/2024 12:22:59 PM   11      Received messages 100, success 95, failed 3, still running 2
11/9/2024 12:22:59 PM   1       Finished message with id 99 in 00:00:28.1643467
11/9/2024 12:23:00 PM   1       Refreshed lease for message 99
11/9/2024 12:23:00 PM   1       Finished message with id 100 in 00:00:27.2119750
11/9/2024 12:23:01 PM   1       Refreshed lease for message 100
11/9/2024 12:23:02 PM   11      Received messages 100, success 97, failed 3, still running 0

It wasn’t that bad and we can see that we indeed ran on a single thread. First, notice that now it took nearly 4 minutes to complete. That’s understandable as we now run things on a single thread. Also, notice that each message was taking around 30-40 seconds to complete. That is much longer than before. This is because messages compete for the CPU time and we don’t have any parallelism. It’s also worth noting that we lost 3 messages. That’s not that bad. The system overscaled just a bit and couldn’t deal with the load but the stabilized and finished processing.

Let’s now increase the CPU-bound duration to 1 second and try again:

11/9/2024 12:24:26 PM   1       New message received with id 1
11/9/2024 12:24:27 PM   1       New message received with id 2
11/9/2024 12:24:29 PM   12      Received messages 2, success 0, failed 0, still running 2
11/9/2024 12:24:29 PM   1       New message received with id 3
11/9/2024 12:24:29 PM   1       Refreshed lease for message 1
11/9/2024 12:24:32 PM   12      Received messages 3, success 0, failed 0, still running 3
11/9/2024 12:24:32 PM   1       Refreshed lease for message 2
11/9/2024 12:24:33 PM   1       New message received with id 4
11/9/2024 12:24:34 PM   1       Lost lease for message 3
11/9/2024 12:24:34 PM   1       Refreshed lease for message 1
11/9/2024 12:24:35 PM   12      Received messages 4, success 0, failed 1, still running 3
11/9/2024 12:24:38 PM   12      Received messages 4, success 0, failed 1, still running 3
11/9/2024 12:24:39 PM   1       Refreshed lease for message 2
11/9/2024 12:24:39 PM   1       New message received with id 5
11/9/2024 12:24:39 PM   1       Lost lease for message 4
11/9/2024 12:24:39 PM   1       Refreshed lease for message 1
11/9/2024 12:24:41 PM   12      Received messages 5, success 0, failed 2, still running 3
11/9/2024 12:24:43 PM   1       New message received with id 6
11/9/2024 12:24:44 PM   12      Received messages 6, success 0, failed 2, still running 4
11/9/2024 12:24:46 PM   1       Refreshed lease for message 5
...
11/9/2024 12:31:48 PM   1       Lost lease for message 98
11/9/2024 12:31:51 PM   12      Received messages 99, success 7, failed 88, still running 4
11/9/2024 12:31:52 PM   1       Refreshed lease for message 97
11/9/2024 12:31:52 PM   1       Refreshed lease for message 92
11/9/2024 12:31:52 PM   1       Refreshed lease for message 86
11/9/2024 12:31:54 PM   12      Received messages 99, success 7, failed 88, still running 4
11/9/2024 12:31:54 PM   1       New message received with id 100
11/9/2024 12:31:54 PM   1       Lost lease for message 99
11/9/2024 12:31:54 PM   1       Finished message with id 86 in 00:01:04.5795457
11/9/2024 12:31:57 PM   12      Received messages 100, success 7, failed 89, still running 4
11/9/2024 12:31:57 PM   1       Refreshed lease for message 86
11/9/2024 12:31:57 PM   1       Refreshed lease for message 97
11/9/2024 12:31:58 PM   1       Refreshed lease for message 92
11/9/2024 12:31:59 PM   1       Lost lease for message 100
11/9/2024 12:32:00 PM   12      Received messages 100, success 8, failed 90, still running 2
11/9/2024 12:32:01 PM   1       Refreshed lease for message 97
11/9/2024 12:32:02 PM   1       Refreshed lease for message 92
11/9/2024 12:32:03 PM   12      Received messages 100, success 8, failed 90, still running 2
11/9/2024 12:32:04 PM   1       Refreshed lease for message 97
11/9/2024 12:32:05 PM   1       Refreshed lease for message 92
11/9/2024 12:32:06 PM   12      Received messages 100, success 8, failed 90, still running 2
11/9/2024 12:32:08 PM   1       Refreshed lease for message 97
11/9/2024 12:32:09 PM   12      Received messages 100, success 8, failed 90, still running 2
11/9/2024 12:32:09 PM   1       Refreshed lease for message 92
11/9/2024 12:32:11 PM   1       Refreshed lease for message 97
11/9/2024 12:32:12 PM   12      Received messages 100, success 8, failed 90, still running 2
11/9/2024 12:32:12 PM   1       Finished message with id 92 in 00:00:55.4921671
11/9/2024 12:32:13 PM   1       Refreshed lease for message 92
11/9/2024 12:32:14 PM   1       Refreshed lease for message 97
11/9/2024 12:32:15 PM   12      Received messages 100, success 9, failed 90, still running 1
11/9/2024 12:32:17 PM   1       Refreshed lease for message 97
11/9/2024 12:32:18 PM   12      Received messages 100, success 9, failed 90, still running 1
11/9/2024 12:32:19 PM   1       Refreshed lease for message 97
11/9/2024 12:32:21 PM   12      Received messages 100, success 9, failed 90, still running 1
11/9/2024 12:32:21 PM   1       Refreshed lease for message 97
11/9/2024 12:32:24 PM   1       Refreshed lease for message 97
11/9/2024 12:32:24 PM   12      Received messages 100, success 9, failed 90, still running 1
11/9/2024 12:32:27 PM   1       Refreshed lease for message 97
11/9/2024 12:32:27 PM   12      Received messages 100, success 9, failed 90, still running 1
11/9/2024 12:32:28 PM   1       Finished message with id 97 in 00:00:50.4062632
11/9/2024 12:32:29 PM   1       Refreshed lease for message 97
11/9/2024 12:32:30 PM   12      Received messages 100, success 10, failed 90, still running 0

And here things start to collapse. It took us 8 minutes to process all messages, each of them was taking around 1 minute, and we failed to process 90 out of one hundred. We lost 90% of all of the messages. Our system became unreliable just because we increased the CPU-bound part of the message processing. But why did it break the application exactly? What happened?

You don’t control the priority of continuations

Our application runs three distinct operations in total:

  • Take the message from the queue
  • Refresh the lease of the message
  • Do some processing of the message

Every single time we await the task, we release the thread and let it do something else. Once the IO-bound operation finishes, we schedule it to run on the same thread. However, the order of continuations doesn’t reflect the importance of what we should do.

In order to keep the system stable, we need to refresh the leases. Therefore, if there is any continuation that wants to refresh the lease (the continuation in KeepLease method), it should run before everything else.

Once we don’t have any continuations for refreshing the leases, we should run continuations for message processing. Obviously, if some KeepLease continuation gets scheduled, it should preempt other continuations.

Finally, when we have no continuations for refreshing the leases or processing the messages, we should run the continuation for getting new message from the queue. In other words, we receive a new message only when we have some idle CPU time that we can use to process something more.

Unfortunately, the async in C# doesn’t let you easily prioritize the continuations. However, this is not a problem most of the times because C# uses multiple threads! Once a continuation is free to run, the thread pool will grow to run the continuation earlier if possible. This is not part of the async programming paradigm and you can’t take it for granted. However, when we run things on a single thread, then continuations have no priorities and message processing continuations may stop lease refreshing continuations from running. Even worse, we may run continuation that receives new message from the bus even though we are already overloaded.

Depending on the nature of your platform (be it C# with different synchronization context, Python with single-threaded asyncio, or JavaScript with one and only one thread), you may get different results. Your application may scale well or may fail badly.

Let’s fix it

We can fix this issue in many ways. Conceptually, we need three different queues: the first one represents the lease refreshments, the second is for message processing, and the third is for getting new message from the bus. We would then have one processor that would check each of the queues in order and execute the operations accordingly. Unfortunately, rewriting the application from async paradigm to a consumer with multiple queues is not straightforward.

Instead, we can reorder the continuations. The trick is to introduce a priority for each continuation. We do the following:

  1. We store a desired priority of continuations running on a thread
  2. When a continuation wants to run, it checks if the desired priority is equal to the priority of the continuation
  3. If it’s equal, then the continuation resets the desired priority to some invalid value and continues
  4. Otherwise, the continuation bumps the priority if possible and lets other continuations run

The protocol is based on the following idea: some continuation sets the desired priority to be at least the priority of the continuation and then lets other continuations to run. If there are other continuations of lower priority, they will simply release the CPU and reschedule themselves. If there are continuations of some higher priority, they will bump the desired priority. And if there are no continuations, then th original continuation will finally get the CPU, run the code, and reset the priority, so other continuations can run the same dance over and over. Here is the code:

public static async Task Prioritize(int priority)
{
	var cookie = Guid.NewGuid();

	while (true)
	{
		if (CurrentSettings.DesiredPriority == priority && CurrentSettings.Cookie == cookie)
		{
			CurrentSettings.DesiredPriority = -1;
			CurrentSettings.Cookie = Guid.Empty;
			return;
		}
		else
		{
			if (CurrentSettings.DesiredPriority < priority)
			{
				CurrentSettings.DesiredPriority = priority;
				CurrentSettings.Cookie = cookie;
				await Task.Yield();
				continue;
			}
			else
			{
				await Task.Yield();
				continue;
			}
		}
	}
}

We need to run this method every single time when we run await. For instance, KeepLease becomes this:

public static async Task KeepLease(Message message)
{
	await Task.Yield();
	await Prioritize(3);

	while (message.WasFinished == false) // This is unsafe according to memory model
	{
		await Task.Delay(CurrentSettings.RefreshDelay);
		await Prioritize(3);

		if (DateTime.Now > message.LastRefreshTime + CurrentSettings.MessageLeaseTimeout)
		{
			message.WasLost = true;
			CurrentStats.Lost++;
			Log($"Lost lease for message {message.Id}");
			return;
		}
		else
		{
			await Task.Delay(CurrentSettings.RefreshDuration);
			await Prioritize(3);
			Log($"Refreshed lease for message {message.Id}");
			message.LastRefreshTime = DateTime.Now;
		}
	}

	CurrentStats.ProcessedSuccessfully++;
}

You can find the full snippet here. Let’s see it in action:

11/9/2024 12:35:42 PM   1       New message received with id 1
11/9/2024 12:35:43 PM   1       New message received with id 2
11/9/2024 12:35:45 PM   12      Received messages 2, success 0, failed 0, still running 2
11/9/2024 12:35:45 PM   1       Refreshed lease for message 1
11/9/2024 12:35:46 PM   1       Refreshed lease for message 2
11/9/2024 12:35:48 PM   12      Received messages 2, success 0, failed 0, still running 2
11/9/2024 12:35:48 PM   1       Refreshed lease for message 1
11/9/2024 12:35:49 PM   1       Refreshed lease for message 2
11/9/2024 12:35:51 PM   12      Received messages 2, success 0, failed 0, still running 2
11/9/2024 12:35:51 PM   1       Refreshed lease for message 1
11/9/2024 12:35:52 PM   1       Refreshed lease for message 2
11/9/2024 12:35:53 PM   1       New message received with id 3
11/9/2024 12:35:54 PM   12      Received messages 3, success 0, failed 0, still running 3
11/9/2024 12:35:55 PM   1       Refreshed lease for message 1
11/9/2024 12:35:56 PM   1       Refreshed lease for message 3
11/9/2024 12:35:57 PM   12      Received messages 3, success 0, failed 0, still running 3
11/9/2024 12:35:58 PM   1       Refreshed lease for message 1
11/9/2024 12:35:58 PM   1       Refreshed lease for message 2
11/9/2024 12:35:59 PM   1       Refreshed lease for message 3
11/9/2024 12:36:00 PM   12      Received messages 3, success 0, failed 0, still running 3
11/9/2024 12:36:01 PM   1       Refreshed lease for message 1
11/9/2024 12:36:02 PM   1       Refreshed lease for message 3
11/9/2024 12:36:03 PM   12      Received messages 3, success 0, failed 0, still running 3
...
11/9/2024 1:09:00 PM    1       Refreshed lease for message 100
11/9/2024 1:09:00 PM    1       Refreshed lease for message 98
11/9/2024 1:09:02 PM    12      Received messages 100, success 97, failed 0, still running 3
11/9/2024 1:09:03 PM    1       Refreshed lease for message 99
11/9/2024 1:09:03 PM    1       Refreshed lease for message 100
11/9/2024 1:09:04 PM    1       Refreshed lease for message 98
11/9/2024 1:09:05 PM    1       Finished message with id 98 in 00:00:46.5274062
11/9/2024 1:09:05 PM    12      Received messages 100, success 97, failed 0, still running 3
11/9/2024 1:09:06 PM    1       Refreshed lease for message 100
11/9/2024 1:09:06 PM    1       Refreshed lease for message 99
11/9/2024 1:09:07 PM    1       Refreshed lease for message 98
11/9/2024 1:09:08 PM    12      Received messages 100, success 98, failed 0, still running 2
11/9/2024 1:09:09 PM    1       Refreshed lease for message 100
11/9/2024 1:09:09 PM    1       Refreshed lease for message 99
11/9/2024 1:09:11 PM    12      Received messages 100, success 98, failed 0, still running 2
11/9/2024 1:09:12 PM    1       Refreshed lease for message 99
11/9/2024 1:09:12 PM    1       Refreshed lease for message 100
11/9/2024 1:09:14 PM    12      Received messages 100, success 98, failed 0, still running 2
11/9/2024 1:09:15 PM    1       Refreshed lease for message 99
11/9/2024 1:09:15 PM    1       Refreshed lease for message 100
11/9/2024 1:09:17 PM    1       Finished message with id 99 in 00:00:51.5793525
11/9/2024 1:09:17 PM    1       Refreshed lease for message 99
11/9/2024 1:09:17 PM    1       Refreshed lease for message 100
11/9/2024 1:09:17 PM    12      Received messages 100, success 99, failed 0, still running 1
11/9/2024 1:09:19 PM    1       Refreshed lease for message 100
11/9/2024 1:09:20 PM    12      Received messages 100, success 99, failed 0, still running 1
11/9/2024 1:09:22 PM    1       Refreshed lease for message 100
11/9/2024 1:09:23 PM    12      Received messages 100, success 99, failed 0, still running 1
11/9/2024 1:09:25 PM    1       Refreshed lease for message 100
11/9/2024 1:09:26 PM    12      Received messages 100, success 99, failed 0, still running 1
11/9/2024 1:09:27 PM    1       Refreshed lease for message 100
11/9/2024 1:09:29 PM    12      Received messages 100, success 99, failed 0, still running 1
11/9/2024 1:09:29 PM    1       Refreshed lease for message 100
11/9/2024 1:09:30 PM    1       Finished message with id 100 in 00:00:54.5227973
11/9/2024 1:09:32 PM    1       Refreshed lease for message 100
11/9/2024 1:09:32 PM    12      Received messages 100, success 100, failed 0, still running 0

We can see that the code runs much slower and it takes 35 minutes to complete. However, all messages are processed successfully and the code scales automatically. We don’t need to manually control the thread pool size, but the application simply processes fewer or more messages depending on the actual CPU-bound processing time.

Summary

async programming is very hard. We were told many times that it’s as simple as putting async here and await there. C# did a lot to get rid of deadlocks as nearly every platform now uses no synchronization context (as compared with old ASP.NET which had its own context or all the desktop apps that were running with a single thread). Also, C# uses a thread pool and can fix many programmer’s mistakes that can limit the scalability.

However, asynchronous programming can be implemented in many other ways. You can’t assume that it will use many threads or that the coroutine will be triggered immediately. Many quirks can decrease the performance. Python’s asyncio is a great example of how asynchronous programming can work much differently, especially if you take the Python’s performance into consideration. What’s IO-bound in C#, can easily become CPU-bound in Python because Python is way slower.

]]>
https://blog.adamfurmanek.pl/2024/11/09/async-wandering-part-15/feed/ 0