Uncategorized – Random IT Utensils

State Machine Executor Part 5 — Streaming

afish — Fri, 24 Oct 2025 14:55:52 +0000

This is the fifth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Being able to describe side effects instead of executing them may sound great, but it has one significant drawback: the side effects need to be described completely before they are handed off to the executor. Building an object that describes the action, payload included, may use a lot of memory. Let’s see how to fix that.

Streaming

We’d like to be able to stream the data. Let’s say that we have the following Action describing a web request to execute:

class HttpAction {
	public string Url;
	public string Method;
	public byte[] Body;
}

See the Body field. It holds the entire payload to be sent. Creating such a payload and storing it in memory will increase the memory usage and decrease scalability. To avoid that, we should have something like this:

class HttpAction {
	public string Url;
	public string Method;
	public Stream Body;
}

Looks great, but it doesn’t solve the problem. Remember that the state machine must create the action object and hand it over to the executor, and the state machine doesn’t run any code while the action is being executed. This means that the Stream must already be filled with the data when it is handed over, so we still have the problem with high memory usage.

Instead of passing the stream, we could pass a stream generator. That could be a lambda or an iterator method built with the yield keyword:

class HttpAction {
	public string Url;
	public string Method;
	public IEnumerable Body;
}

Looks slightly better, but still has issues. If Body captures any local variables in a closure, then their memory will not be released until the stream is read. Not to mention that it’s much harder to persist the HttpAction object to provide reliability.
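The persistence problem can be illustrated with a small Python sketch. Python is used here only for illustration, the class mirrors the HttpAction snippet above, and make_body and can_persist are hypothetical helpers. The lazily generated body keeps the captured list alive until it is consumed, and the standard pickle module refuses to serialize a generator, so an executor could not simply store the action for a later retry.

```python
import pickle
from dataclasses import dataclass
from typing import Iterator


@dataclass
class HttpAction:
    url: str
    method: str
    body: Iterator[bytes]  # lazily produced chunks instead of a full payload


def make_body(rows: list) -> Iterator[bytes]:
    # `rows` is captured by the generator and stays alive until the body is consumed.
    for row in rows:
        yield row.encode("utf-8")


def can_persist(action: HttpAction) -> bool:
    # Try to serialize the action the way an executor might before a retry.
    try:
        pickle.dumps(action)
        return True
    except (TypeError, pickle.PicklingError):
        # Generator objects cannot be pickled.
        return False


action = HttpAction("https://example.com/upload", "POST", make_body(["a", "b"]))
print(can_persist(action))
```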

Solution

To solve the problem, we need to actually stream the data. However, since the actions are executed after the state machine is done, we need to stream the data somewhere else: to a local file.

The executor can provide the following abstraction:

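// Provided by the executor to the state machine.
abstract class Env {
	// Creates a new temporary file that the state machine can write to.
	public abstract FileWrapper CreateFile();
	// Opens a previously created file by its identifier.
	public abstract FileWrapper ReadFile(string identifier);
}

abstract class FileWrapper {
	public string Identifier;
	public FileStream FileHandle;
	// Tells the executor that the file is complete and can be persisted.
	public abstract void Commit();
}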

Now, the state machine can call CreateFile to get a temporary file. Next, the state machine can stream the content to the file. Finally, the state machine calls Commit to indicate to the executor that the file is ready to be persisted. The executor can then upload the file to the persistent store.
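The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the persistent store is simulated with a second local directory, identifiers are random UUIDs, and the names mirror the abstraction above rather than any real executor API.

```python
import os
import shutil
import tempfile
import uuid


class FileWrapper:
    def __init__(self, identifier, file_handle, on_commit=None):
        self.identifier = identifier
        self.file_handle = file_handle
        self._on_commit = on_commit

    def commit(self):
        # Close the handle, then let the executor persist the finished file.
        self.file_handle.close()
        if self._on_commit is not None:
            self._on_commit()


class Env:
    def __init__(self, temp_dir, store_dir):
        self._temp_dir = temp_dir    # local scratch space
        self._store_dir = store_dir  # stands in for the persistent store (assumption)

    def create_file(self):
        identifier = uuid.uuid4().hex
        temp_path = os.path.join(self._temp_dir, identifier)

        def persist():
            # "Upload" the file by copying it into the simulated persistent store.
            shutil.copyfile(temp_path, os.path.join(self._store_dir, identifier))

        return FileWrapper(identifier, open(temp_path, "wb"), persist)

    def read_file(self, identifier):
        # Read the persisted copy, so this also works after the temp file is gone.
        path = os.path.join(self._store_dir, identifier)
        return FileWrapper(identifier, open(path, "rb"))


with tempfile.TemporaryDirectory() as temp_dir, tempfile.TemporaryDirectory() as store_dir:
    env = Env(temp_dir, store_dir)
    body = env.create_file()
    for i in range(3):
        # The state machine streams the content piece by piece.
        body.file_handle.write(f"row {i}\n".encode("utf-8"))
    body.commit()

    stored = env.read_file(body.identifier)
    content = stored.file_handle.read()
    stored.file_handle.close()

print(content)
```

Only the identifier has to travel with the action, which keeps the action object small and easy to serialize.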

Last but not least, we need to modify the action definition:

class HttpAction {
	public string Url;
	public string Method;
	public string BodyFileIdentifier;
}

The action executor can now stream the body from the file. If something fails, the file can be retrieved from the persistent storage and the action can be retried.
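A minimal Python sketch of the executor side could look as follows. It assumes the committed file is reachable under a local directory, and send_chunk is a hypothetical callback standing in for a real HTTP client; neither comes from the original design.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class HttpAction:
    url: str
    method: str
    body_file_identifier: str


def iter_body(store_dir: str, action: HttpAction, chunk_size: int) -> Iterator[bytes]:
    # Yield the body in fixed-size chunks so only one chunk is in memory at a time.
    path = os.path.join(store_dir, action.body_file_identifier)
    with open(path, "rb") as body:
        while True:
            chunk = body.read(chunk_size)
            if not chunk:
                break
            yield chunk


def execute(store_dir: str, action: HttpAction,
            send_chunk: Callable[[bytes], None], chunk_size: int = 64 * 1024) -> int:
    # Stream the body to the (hypothetical) client and return the number of bytes sent.
    sent = 0
    for chunk in iter_body(store_dir, action, chunk_size):
        send_chunk(chunk)
        sent += len(chunk)
    return sent


with tempfile.TemporaryDirectory() as store_dir:
    with open(os.path.join(store_dir, "body-1"), "wb") as f:
        f.write(b"0123456789")
    action = HttpAction("https://example.com/upload", "POST", "body-1")

    chunks = []
    total = execute(store_dir, action, chunks.append, chunk_size=4)
    # A retry simply re-reads the same file from the beginning.
    retry_total = execute(store_dir, action, lambda chunk: None, chunk_size=4)

print(chunks, total, retry_total)
```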

This solution is not perfect, though. The data is streamed twice (first to the file, then from the file to the target), which slows everything down. That’s an obvious trade-off.

State Machine Executor Part 2 — Fault tolerance

afish — Mon, 13 Oct 2025 08:05:00 +0000

This is the second part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

The code we implemented in the last part is unable to recover from machine crashes. If the process dies midway, we need to start it from scratch. Let’s fix that.

Before going into details, let’s think about how we could trigger the state machine. We could run it on an API call: someone calls the endpoint, we start processing the request, and we trigger the execution along the way. If something dies, the caller will probably retry their call. Another approach is to use a queue. We receive a message from the queue, we start the processing, and we trigger the state machine. If something breaks, the message will get retried. Other scenarios may be similar.

In all of those scenarios, we get a retry due to some other mechanisms. Once we retry, we want to resume the state machine processing. This is very simple conceptually. We just need to recreate the state machine and retrigger the transition. Let’s do that.

State management

The hard part in retrying this way is recovering the state. The state machine is most likely stateful and calculates something as it goes through the states. We can tackle this in many ways: preserve the whole state machine, provide an interface to read and write data that the state machine would use, or provide a temporary object.

Preserving the state machine in its entirety may be possible, but has many drawbacks. First, we may be unable to serialize the object as we don’t even know what it consists of (it may be loaded dynamically and not owned by us). Second, some objects may not be serializable by definition (like locks or things tied to OS data such as threads). Third, this may impose technological limits (like the programming language you use).

Another approach is to have an interface for the state machine to read and write some pieces of information. For instance, the state machine executor could expose a simple key-value store for the data. Each read and write would be handled directly by the state machine executor. While this is quite easy, the writes can’t be committed in one transaction together with the other side effects.

Another approach is a simple dictionary that the state machine can use. This lets the state machine effectively couple the transaction with other side effects. The state machine executor can persist both the changes to the dictionary and the description of the actions in one transaction.

Let’s take this last approach and see how it works. We would now like to have the following object for keeping the changes:

class StoreHolder {
	Dictionary<string, object> Store;
}

Now, the state machine needs to describe modifications to this store:

class TransitionResult {
	...
	Dictionary<string, object> StoreChanges;
}

Also, the state machine executor needs to pass this object to the state machine:

class StateMachine {
	...
	TransitionResult RunTransition(string transitionName, StoreHolder store) {...}
}

Finally, this is how we execute the state machine now:

void Run(StateMachine machine, string initialTransition) {
	string state = null;
	string currentTransition = initialTransition;
	// Load the store persisted by a previous (possibly crashed) run.
	StoreHolder store = ReadStore();
	do {
		TransitionResult result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		// Persist the store changes before executing the actions.
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
	} while (!machine.IsCompleted(state));
}
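
To see the resume behavior end to end, here is a minimal runnable sketch of the same loop in Python. It rests on several assumptions that are not part of the original design: the store is a JSON file on local disk, CountingMachine is a toy state machine, and max_transitions exists only to simulate the process dying midway.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class TransitionResult:
    current_state: str
    next_transition: str
    store_changes: dict = field(default_factory=dict)
    actions_to_execute: list = field(default_factory=list)


class CountingMachine:
    """Toy state machine (hypothetical): counts to three, one step per transition."""

    def run_transition(self, transition_name, store):
        count = store.get("count", 0) + 1
        state = "done" if count >= 3 else "counting"
        return TransitionResult(state, "step", {"count": count}, [f"log {count}"])

    def is_completed(self, state):
        return state == "done"


def read_store(path):
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def merge_and_persist(path, store, changes):
    store.update(changes)
    # Write to a temporary file and rename it, so a crash never leaves a half-written store.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(store, f)
    os.replace(tmp_path, path)


def run(machine, initial_transition, store_path, execute_action, max_transitions=None):
    transition = initial_transition
    store = read_store(store_path)  # resume from whatever was persisted earlier
    steps = 0
    while True:
        result = machine.run_transition(transition, store)
        transition = result.next_transition
        merge_and_persist(store_path, store, result.store_changes)
        for action in result.actions_to_execute:
            execute_action(action)
        steps += 1
        if machine.is_completed(result.current_state):
            return True
        if max_transitions is not None and steps >= max_transitions:
            return False  # simulate the process dying midway


with tempfile.TemporaryDirectory() as directory:
    store_path = os.path.join(directory, "store.json")
    executed = []
    finished_first = run(CountingMachine(), "step", store_path, executed.append, max_transitions=1)
    # The "retry": a fresh state machine instance continues from the persisted store.
    finished_second = run(CountingMachine(), "step", store_path, executed.append)
    final_store = read_store(store_path)

print(finished_first, finished_second, executed, final_store)
```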

Looks nice. Let’s see what problems we may have with this approach.

Persisting the store

Let’s now see some pros and cons of this approach.

By persisting the whole store at once, we can easily detect two executions of the same state machine running at the same time. They would produce concurrent writes, which we can detect by using version numbers or locks.
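
Version-based detection can be sketched as follows. VersionedStore is a hypothetical in-memory stand-in for the persistent store; a real store would perform the version check and the write atomically, for example with a conditional update.

```python
class ConcurrentWriteError(Exception):
    pass


class VersionedStore:
    """In-memory stand-in for a persistent store with optimistic concurrency."""

    def __init__(self):
        self._data = {}
        self._version = 0

    def read(self):
        # Return a copy of the data together with the version it was read at.
        return dict(self._data), self._version

    def write(self, data, expected_version):
        # Reject the write if somebody else persisted the store in the meantime.
        if expected_version != self._version:
            raise ConcurrentWriteError(
                f"expected version {expected_version}, found {self._version}"
            )
        self._data = dict(data)
        self._version += 1
        return self._version


store = VersionedStore()
data_a, version_a = store.read()  # execution A loads the store
data_b, version_b = store.read()  # execution B loads the same version

data_a["step"] = "from A"
store.write(data_a, version_a)    # A persists first and bumps the version

data_b["step"] = "from B"
try:
    store.write(data_b, version_b)
    conflict_detected = False
except ConcurrentWriteError:
    conflict_detected = True      # B learns that it raced with another execution

print(conflict_detected, store.read())
```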

By saving the changes after the state machine finishes the transition, we can have the outbox behavior. We persist the store changes together with the information about which actions to execute. This way, we can retry the actions in case of crashes. We’ll see that in detail in the next part.
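
One possible shape of this idea, not necessarily the one the series uses, is sketched below in Python. It assumes a single JSON document on local disk that holds both the store and the pending actions; persist_transition, run_pending_actions and the flaky callback are hypothetical names introduced for the example.

```python
import json
import os
import tempfile


def _write_atomically(path, document):
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(document, f)
    os.replace(tmp_path, path)


def persist_transition(path, store, store_changes, actions):
    # Store changes and pending actions go into ONE document, replaced atomically.
    store.update(store_changes)
    _write_atomically(path, {"store": store, "pending_actions": list(actions)})


def run_pending_actions(path, execute_action):
    # Called after persisting and again after a crash: runs whatever is still pending.
    with open(path, "r", encoding="utf-8") as f:
        document = json.load(f)
    while document["pending_actions"]:
        execute_action(document["pending_actions"][0])
        # Only after the action succeeded is it removed from the persisted list.
        document["pending_actions"].pop(0)
        _write_atomically(path, document)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "state.json")
    persist_transition(path, {}, {"order": 42}, ["send email", "call webhook"])
    executed = []

    def flaky(action):
        # Fails the first time it sees "call webhook" to simulate a crash.
        if action == "call webhook" and "call webhook (failed)" not in executed:
            executed.append("call webhook (failed)")
            raise RuntimeError("simulated crash")
        executed.append(action)

    try:
        run_pending_actions(path, flaky)
    except RuntimeError:
        pass
    run_pending_actions(path, flaky)  # the retry picks up the remaining action
    with open(path, "r", encoding="utf-8") as f:
        final = json.load(f)

print(executed, final)
```

Note that this gives at-least-once execution: if the process dies after an action ran but before the list is rewritten, the action runs again on retry.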

This approach is also technology-independent. It’s easy to serialize the key-value dictionary in any technology. However, if the state machine decides to put some complex objects in the store, they need to be serializable and deserializable. Also, they need to be backwards compatible when the state machine code changes. Let’s explore that a little more.

Let’s say that the state machine stores something like Store["property"] = someObject. If the state machine executor wants to serialize the dictionary now, the someObject value must be serializable. While this sounds trivial, this is often not the case. For instance, many types in Python are not serializable by built-in solutions like the json package. Similarly, objects in Java must implement the Serializable interface or adhere to the requirements of the serialization library. While this is not a big issue, it puts some requirements on the state machine.
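
The Python case is easy to reproduce. In the sketch below, the store contents and the is_json_serializable helper are made up for the example; the behavior of json.dumps is standard.

```python
import json
from datetime import datetime


def is_json_serializable(value) -> bool:
    # json.dumps raises TypeError for types it does not know how to encode.
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


store = {
    "attempt": 3,                                # plain number: accepted
    "tags": {"urgent", "retry"},                 # set: rejected by json
    "started_at": datetime(2025, 10, 13, 8, 5),  # datetime: rejected by json
}

report = {key: is_json_serializable(value) for key, value in store.items()}
print(report)
```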

Much bigger issues may happen when deserializing the value. First, it may be impossible to deserialize the someObject value due to the lack of a parameterless constructor or other library requirements. This is not a rare issue.

Worse, we now need to deal with backward and forward compatibility. Let’s say that the state machine is paused and then resumed on some other node. This can be due to a retry or a rolling deployment. When the execution is retried, it may run on either a newer or an older code version. This means that the store must be deserialized using different code than the code that wrote it. If you use a binary serializer, this will most likely cause problems. The same issue may happen if newer code wants to examine a store written by some older version of the code, for example by some other state machine execution.

The easiest solution to this problem is to avoid storing complex objects entirely. This simplifies the serialization and the deserialization process. However, it doesn’t solve the issue with schema changes and compatibility.
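
One common way to cope with schema changes on plain data is to tag the store with a schema version and upgrade old stores when reading them. The Python sketch below is an assumption-laden illustration: the schema_version key and the rename from "retries" to "attempts" are invented for the example.

```python
import json

CURRENT_SCHEMA_VERSION = 2


def load_store(raw: str) -> dict:
    store = json.loads(raw)
    # Stores written before versioning was introduced are treated as version 1.
    version = store.get("schema_version", 1)
    if version > CURRENT_SCHEMA_VERSION:
        # Written by newer code than this node runs: refuse instead of guessing.
        raise ValueError(
            f"store schema {version} is newer than supported {CURRENT_SCHEMA_VERSION}"
        )
    if version == 1:
        # Hypothetical change: version 1 called the counter "retries", version 2 "attempts".
        store["attempts"] = store.pop("retries", 0)
        store["schema_version"] = 2
    return store


old = load_store('{"retries": 2, "customer": "acme"}')
current = load_store('{"schema_version": 2, "attempts": 5}')
print(old, current)
```

Refusing stores from a newer version covers the rolling deployment case in which an older node picks up a retry.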

If you need to store complex objects and still want to access stores created by the older state machines, it may be beneficial to store two versions of the store. One version is serialized using a binary serializer that can serialize and deserialize objects of any kind. The other version is stored using some regular JSON serializer that can only serialize the data but can’t deserialize it into complex objects. You would then examine this JSON data as raw JSON objects.
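
A rough Python sketch of the two-copy idea follows. Here pickle stands in for the binary serializer, and json with default=str stands in for the write-only JSON serializer; both choices, as well as the sample store, are assumptions made for illustration.

```python
import json
import pickle
from datetime import datetime
from decimal import Decimal


def serialize_store(store: dict) -> tuple:
    # Binary copy: round-trips the original Python objects.
    binary_copy = pickle.dumps(store)
    # JSON copy: a write-only view; types json does not know are flattened to strings.
    json_copy = json.dumps(store, default=str, sort_keys=True)
    return binary_copy, json_copy


store = {"total": Decimal("19.99"), "started_at": datetime(2025, 10, 13, 8, 5)}
binary_copy, json_copy = serialize_store(store)

restored = pickle.loads(binary_copy)  # matching code version: full objects come back
inspected = json.loads(json_copy)     # any code version: raw JSON values only

print(restored == store)
print(inspected)
```

The JSON copy loses type information, so it is only suitable for inspection and never for resuming the state machine.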