This is the sixth part of the State Machine Executor series. For your convenience, the table of contents with links to all parts is in State Machine Executor Part 1 — Introduction.
Let’s revisit our execution function:
```csharp
void Run(StateMachine machine, string initialTransition){
    string state = null;
    string currentTransition = initialTransition;
    StoreHolder store = ReadStore();
    bool wasTimedOut = false;
    TransitionResult result = null;
    do {
        if(machine.IsTimedOut(state)){
            wasTimedOut = true;
            currentTransition = "timeout-handler-transition";
        }
        bool hadException = false;
        try{
            result = machine.RunTransition(currentTransition, store);
        }catch(Exception e){
            hadException = true;
        }
        state = result.CurrentState;
        currentTransition = hadException ? "exception-handler-transition" : result.NextTransition;
        MergeAndPersist(store, result.StoreChanges);
        ExecuteActions(result.ActionsToExecute);
        if(result.Suspend){
            break;
        }
        if(wasTimedOut){
            break;
        }
    }while(!machine.IsCompleted(state));
}
```
It’s quite complex already, as we extended it with support for timeouts, exception handling, state management, and actions. We’re going to make it even more complex now.
Specifically, we will focus on the following line:
```csharp
result = machine.RunTransition(currentTransition, store);
```
This line triggers the transition and makes the state machine execute one piece of code. In this post, we’re going to discuss how to run transitions in parallel.
Problem statement
When discussing state machines, we typically think in terms of the state machine having one “state” at a time. However, that’s not very realistic. We often need to be able to run multiple things in parallel. Some of them are contained within a single state/transition (e.g., reading multiple files in parallel instead of one by one), some are independent sub-workflows (e.g., one workflow sending emails, another one uploading files), and some are tightly connected with each other (e.g., one workflow validates something and can affect the other workflow which tries to process the query at the same time).
The problem we’re facing is threefold: how to run multiple transitions in parallel, how to manage the state changes those transitions make, and how to deal with the interconnections between transitions. Let’s explore some solutions.
Some programming models
Many programming models for workflows have been developed over the years. Let’s look at some of them, without going into much detail or formality.
Bulk-synchronous-parallel / Superstep
The bulk-synchronous-parallel (BSP) model consists of a series of supersteps with synchronization and communication in between. Each superstep is a logical computation performed by a node/processor/unit. Once the computation is done, the nodes exchange data and wait until all nodes reach the same step.
This model is quite easy to analyze, but it is rather rigid and inflexible in its structure. It typically goes with a batch-like approach in which we divide the work between nodes, send it for processing, and then wait for the results.
This model is very popular in state machines.
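As an illustration (this is a standalone sketch, not part of the series’ executor), a minimal BSP round can be expressed with .NET’s `Barrier`: each worker computes its piece in one superstep, waits at the barrier, and only then reads the other workers’ results in the next superstep.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class BspSketch {
    // Runs one BSP round: superstep 1 computes partial results, the barrier
    // is the synchronization point, superstep 2 reads every partial result.
    public static int[] RunSupersteps(int workers) {
        var partialSums = new int[workers];
        var totals = new int[workers];
        using (var barrier = new Barrier(workers)) {
            var tasks = Enumerable.Range(0, workers).Select(id => Task.Run(() => {
                // Superstep 1: each node works on its own chunk of data.
                partialSums[id] = id + 1;
                // Nobody proceeds until every node finished the superstep.
                barrier.SignalAndWait();
                // Superstep 2: every node now sees all results of superstep 1.
                totals[id] = partialSums.Sum();
            })).ToArray();
            Task.WaitAll(tasks);
        }
        return totals;
    }

    static void Main() {
        // With 4 workers the partial sums are 1+2+3+4, so every worker sees 10.
        Console.WriteLine(string.Join(",", RunSupersteps(4)));
    }
}
```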
Fork & Join
In this approach, we fork the process into many copies, and each copy performs similar work on its own. It’s more flexible than BSP because we can use work stealing and sometimes we can avoid synchronization.
This model is often used in parallel processing of collections or in handling web requests.
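For illustration, here is a minimal fork & join over a collection using PLINQ (again a standalone sketch, not the series’ executor): the runtime forks the work across worker threads, and the terminal `Sum()` acts as the join.

```csharp
using System;
using System.Linq;

class ForkJoinSketch {
    // Fork: PLINQ splits the collection into chunks processed on worker
    // threads (with work stealing under the hood).
    // Join: the terminal Sum() completes only once every chunk is done.
    public static int DoubleSum(int[] items) {
        return items.AsParallel().Select(x => x * 2).Sum();
    }

    static void Main() {
        int[] items = Enumerable.Range(1, 100).ToArray();
        Console.WriteLine(DoubleSum(items)); // prints 10100 (2 * 5050)
    }
}
```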
Threading
Threading is a low-level approach to computation. Each thread is completely independent and can do whatever it wants. Threads synchronize only when needed, and there is no clear structure of their behavior.
This model is very powerful, but quite hard to analyze and reason about.
Trails as a middle ground
BSP is often used in state machines because it can be represented easily in terms of states and transitions. We can think of it as one transition between two states that is executed many times in parallel. While the model is simple, it’s also quite inflexible, as it requires that all the parallel branches do the same work (but with different data).
Threads on the other hand are very flexible, but they require synchronization. Effectively, each thread must have some kind of a handle to the other threads it wants to synchronize with. Things get much more complex when those other threads fork on their own, as now the synchronization involves “group of threads” which are often represented as jobs or with a parent-child relationship.
To keep the flexibility of threads without the rigidity of BSP, we can introduce something in between: a trail.
Trail structure
A trail is like a thread, but it doesn’t “synchronize on its own”. It only states the “requirements” for it to continue, and the platform takes care of making sure the requirements are met. We can think of trails as threads plus named mutexes, both managed by the platform.
A trail is an object with the following properties:
```csharp
class Trail {
    string Name;
    string CurrentState;
    string NextTransition;
    string[] BlockingTrails;
}
```
Name is used to synchronize with other trails. CurrentState and NextTransition simply indicate where the trail is and what it’s going to do next. BlockingTrails is a collection of other trails that need to complete before the current trail can move on.
When starting a state machine from scratch, we simply have one trail with the initial state and transition. It can have any name.
To implement trail spawning, we extend the result of a transition to have a list of trails to continue with:
```csharp
class TransitionResult {
    Trail[] NextTrails;
}
```
This way, one trail can fork into many sub-trails. Each sub-trail is independent and can do whatever it wants.
To implement joins, we simply deduplicate trails based on their name. We also assume that if duplicates appear, they must be in the same state and transition.
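The join rule can be sketched in code. This is a hypothetical platform-side helper (the `Deduplicate` name and the `Trail` shape with public fields are my assumptions, not part of the series’ API): trails are merged by name, and the consistency assumption about duplicates is enforced explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TrailJoinSketch {
    public class Trail {
        public string Name;
        public string CurrentState;
        public string NextTransition;
        public string[] BlockingTrails = Array.Empty<string>();
    }

    // Merges trails returned by finished transitions: duplicates by name
    // collapse into one trail, but only if they agree on where they are
    // and what they do next.
    public static List<Trail> Deduplicate(IEnumerable<Trail> trails) {
        var byName = new Dictionary<string, Trail>();
        foreach (var trail in trails) {
            if (byName.TryGetValue(trail.Name, out var existing)) {
                if (existing.CurrentState != trail.CurrentState ||
                    existing.NextTransition != trail.NextTransition) {
                    throw new InvalidOperationException(
                        $"Conflicting duplicates for trail '{trail.Name}'");
                }
            } else {
                byName[trail.Name] = trail;
            }
        }
        return byName.Values.ToList();
    }

    static void Main() {
        var merged = Deduplicate(new[] {
            new Trail { Name = "work", CurrentState = "joined", NextTransition = "aggregate" },
            new Trail { Name = "work", CurrentState = "joined", NextTransition = "aggregate" },
        });
        Console.WriteLine(merged.Count); // prints 1: the two "work" trails joined
    }
}
```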
Let’s see how to use them.
Parallel collection processing
Let’s say that we want to process a collection of elements in parallel.
We start with the initial state that splits the work:
```csharp
Trail initial = new Trail(Name="work", ...);
```
When called, the transition splits the work and returns one trail for each element:
```csharp
return new TransitionResult{
    NextTrails = new {
        new Trail(Name="work.1", ...),
        new Trail(Name="work.2", ...),
        new Trail(Name="work.3", ...),
        ...
        new Trail(Name="work.n", ...),
    }
}
```
All these trails are executed by the platform, hopefully in parallel. Now they need to synchronize at the very end, so each of those worker trails returns the same result from its final transition:
```csharp
return new TransitionResult {
    NextTrails = new {
        new Trail(Name="work", BlockingTrails = new { "work." }, ...)
    }
}
```
Now, the platform needs to do the following:
- The platform needs to deduplicate all the trails. Each worker trail returns trails with the same name (`work`), so the platform knows what to do.
- The platform needs to wait until all the worker trails finish processing items. This is achieved thanks to the `BlockingTrails` set to `work.` (notice the dot at the end). The platform needs to wait until all trails with names starting with `work.` finish their work, and then it can proceed with the new deduplicated trail.
This way, we can achieve typical parallel collection processing.
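To make the waiting rule concrete, here is a minimal, hypothetical sketch of the check a platform could run before resuming a trail (the `CanRun` helper is my own name, not part of the series): an entry ending with a dot is treated as a name prefix that matches every worker trail spawned under it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class BlockingCheckSketch {
    // A trail is runnable only when no active trail matches any of its
    // BlockingTrails entries. An entry ending with a dot (like "work.")
    // matches by prefix; any other entry matches by exact name.
    public static bool CanRun(string[] blockingTrails, IEnumerable<string> activeTrailNames) {
        foreach (var blocker in blockingTrails) {
            bool isPrefix = blocker.EndsWith(".");
            bool anyActive = activeTrailNames.Any(name =>
                isPrefix ? name.StartsWith(blocker) : name == blocker);
            if (anyActive) return false; // a blocking trail is still running
        }
        return true;
    }

    static void Main() {
        var active = new[] { "work.1", "work.3" };
        Console.WriteLine(CanRun(new[] { "work." }, active));        // False: workers still running
        Console.WriteLine(CanRun(new[] { "work." }, new string[0])); // True: all workers finished
    }
}
```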
Running child state machines
Running child state machines is quite straightforward. Let’s say that we want to continue execution and also start something completely independent on the side. We simply return:
```csharp
return new TransitionResult{
    NextTrails = new {
        new Trail(Name="work", ...),
        new Trail(Name="some_side_work", ...)
    }
}
```
At some point, the side work completes. It simply indicates that it reached the end by returning an empty NextTrails collection.
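As a tiny self-contained sketch of that termination rule (the class names here are illustrative stand-ins for the series’ types):

```csharp
using System;

class TrailEndSketch {
    public class Trail { public string Name; }
    public class TransitionResult { public Trail[] NextTrails; }

    // The side trail's final transition: an empty NextTrails collection
    // tells the platform the trail is finished and can be dropped.
    public static TransitionResult FinalTransition() {
        return new TransitionResult { NextTrails = new Trail[0] };
    }

    static void Main() {
        Console.WriteLine(FinalTransition().NextTrails.Length); // prints 0
    }
}
```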
Summary
Trails provide a flexible approach without the complex overhead of manual synchronization. They are more flexible than BSP, which is crucial when we want to run independent child state machines.