Random IT Utensils | https://blog.adamfurmanek.pl | IT, operating systems, maths, and more.

Pitless Pit Part 3 — Can we interact with the outside world?
https://blog.adamfurmanek.pl/2026/01/18/pitless-pit-part-3/
Sun, 18 Jan 2026 16:06:40 +0000

This is the third part of the Pitless Pit series. For your convenience you can find other parts in the table of contents in Part 1 — Furmanek Test for consciousness

Let’s assume that we live in a simulation. Can we interact with the outside world, i.e., with the world of the beings that simulate us? (Un)surprisingly, the answer may be “yes”. Let’s see how.

Can your script talk to you?

Can your script talk to you? It may seem impossible for a script/program/application/code to interact with the external world; however, it is in fact pretty reasonable. Take any of your scripts and notice that it already interacts with our universe – the script most likely produces side effects like printing documents, sending packets over the network, displaying things on the screen, or generating sound. All of that impacts our “real” world, so your beloved Python script indeed interacts with the real, physical universe.

Can your script figure out how it impacted the real universe?

Now, can your beloved Python script figure out how it impacted our real universe? Maybe. This depends on the “API” the script can use. Let’s see an example.

Suppose your script has access to both a speaker and a microphone. In that case, the script can produce a sound and then wait for the microphone to pick the sound back up. Depending on how the script can control the speaker (its direction, power, etc.), it can measure how the sound bounces around the room. That would work as a very basic sonar.

In other words, your script could deduce the size and shape of the room the computer sits in. Obviously, that would be highly limited with just a speaker and a microphone, but it could be improved with other devices. Long story short, the script can learn a lot about the surroundings of the device running it.
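To make the sonar idea concrete, here is a minimal Python sketch. It skips real audio I/O entirely and simulates the echo in memory; the helper name, the low sample rate, and the 0.1-second delay are all made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature
SAMPLE_RATE = 8000      # samples per second, kept low for this toy example

def estimate_distance(emitted, recorded):
    """Estimate the distance to a reflecting surface from the echo delay.

    Slide the emitted burst over the recording, find the offset with the
    highest correlation (where the echo starts), and convert that delay
    to meters. The sound travels there and back, hence the division by 2.
    """
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(recorded) - len(emitted) + 1):
        score = sum(e * recorded[offset + i] for i, e in enumerate(emitted))
        if score > best_score:
            best_offset, best_score = offset, score
    delay_seconds = best_offset / SAMPLE_RATE
    return delay_seconds * SPEED_OF_SOUND / 2

# Simulate a 10 ms burst and a recording where the echo arrives 100 ms later.
emitted = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(80)]
recorded = [0.0] * SAMPLE_RATE
for i, e in enumerate(emitted):
    recorded[800 + i] = 0.3 * e  # attenuated echo after 0.1 s

print(estimate_distance(emitted, recorded))  # ~17.15 m (0.1 s round trip)
```

A real script would replace the simulated lists with actual playback and capture, but the delay-to-distance math stays the same.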

Can your script “break” your world?

Let’s carry on. Can the script “break” our universe? Can it change our world significantly? Again, that depends on the interface.

Let’s continue with the speaker metaphor. If the speaker is powerful enough, the script could produce a sound that would shatter glass or break windows.

Is our universe a simulation?

Is our universe a simulation? That I don’t know. I don’t know if there is “an interface” that would help us interact with the beings that simulate us. However, if they do simulate us, then they probably do it for a reason, so they must have a way to peek into the state of the simulation. This means that there should be something that would let us “talk to them”, something equivalent to displaying labels in the UI.

Pitless Pit Part 2 — How to prove that we live in a simulation?
https://blog.adamfurmanek.pl/2026/01/17/pitless-pit-part-2/
Sat, 17 Jan 2026 14:57:44 +0000

This is the second part of the Pitless Pit series. For your convenience you can find other parts in the table of contents in Part 1 — Furmanek Test for consciousness

Do we live in a simulation? I don’t know. I don’t know how to prove it either, especially since I can’t ever disprove solipsism. However, we can imagine what it would take to simulate a universe, and then reason about whether it’s feasible.

First rule of simulation

Historically, we experiment with all kinds of simulations for two reasons:

  • We want to experiment with something “new”. For example, we create esoteric programming languages to check how stronger type systems would perform with “real” code. We don’t focus on squeezing out the highest performance; we want to run the simulation once, validate or disprove our hypothesis, and then move on to something else.
  • We want to represent something from the real world and run it faster to get some meaningful results that would help us shape the future. In this case, we want to run the simulation multiple times, so we want to make it fast and performant.

It seems reasonable to assume that if we live in a simulation, then those who simulate us are doing the same. If they want to experiment with something “new”, then we probably can’t deduce anything from our world.

However, if they want to represent the “real world”, then they’ll probably want to be efficient. This leads to the first rule of simulation – the time needed to generate the next simulation step should have a constant upper bound. In other words, once we have simulated the “current” state of the simulation, we should know upfront how long it will take (in the worst case) to simulate the “immediately next” state.

Let’s now assume that they go by the first rule of simulation. What does it mean in practice?

How to simulate our universe

Let’s now think for a second about how we can simulate our universe. It seems that the simulation must be discrete, i.e., it must proceed in steps. Therefore, we would have code like this:

var state = getInitialState();
for(var step = 0; ; ++step){
    state = generateNextState(state);
}

If the first rule of simulation holds, then generateNextState‘s execution time must have an upper bound. Let’s now consider how it could be implemented to simulate our universe:

  • Iterating over every point in space-time (assuming that it’s discrete) probably won’t meet the rule. We believe our universe expands, so each iteration would need to go over more and more coordinates. This would work if the size of the universe were constant, but we don’t think it is as of now.
  • Iterating over particles won’t work either, as the number of particles changes.
  • Iterating over “energy” or “electric charge” or similar things might work, though. As long as these things are ultimately traveling in packets, we can iterate over every single packet and track how it evolves. This seems to be a feasible approach to simulation.
  • Representing the state as a polynomial (or another formula) with a fixed number of coefficients might work as well. We could then generate the next state of the simulation by feeding the output of the previous calculation through the formula, similarly to what recurrent neural nets do.
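The fixed-coefficient idea can be sketched in a few lines of Python. This is a toy illustration, not a universe simulator: the names and the polynomial are invented, and the only point is that the state has a fixed size, so every step costs the same bounded amount of work regardless of how many steps already ran.

```python
def generate_next_state(state, coefficients):
    """One simulation step: feed every state value through a fixed
    polynomial. The work per step depends only on the (fixed) sizes of
    state and coefficients, never on the step number — so it has the
    constant upper bound required by the first rule of simulation."""
    return [sum(c * (x ** i) for i, c in enumerate(coefficients)) for x in state]

state = [0.1, 0.2, 0.3, 0.4]
coefficients = [0.0, 0.9, -0.1]  # next_x = 0.9*x - 0.1*x^2

for step in range(1000):
    state = generate_next_state(state, coefficients)

print(len(state))  # still 4 — the representation never grows
```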

It seems there should be feasible ways to simulate the universe. We can now draw the following conclusions:

  • If we are able to prove that some approach indeed works, then we probably should assume that we live in a simulation.
  • On the other hand, if we can prove that these approaches don’t work and there are no other ways to do that, then it would prove that we do not live in a simulation.

Can we prove or disprove any of the above? That remains unclear.

Types and Programming Languages Part 21 – Code is read many times
https://blog.adamfurmanek.pl/2026/01/10/types-and-programming-languages-part-21/
Sat, 10 Jan 2026 07:13:02 +0000

This is the twenty-first part of the Types and Programming Languages series. For your convenience you can find other parts in the table of contents in Part 1 — Do not return in finally

We often hear that code is written once but read many times. Because of that, we favor code “quality”, “clarity”, or “readability” in order to make our lives easier. Unfortunately, it’s not that simple, and optimizing for “code being read many times” is not what we really need in line-of-business applications (and most other software) because we change the hats that we wear.

Why do we read code

It’s true that we read code many times. But why do we read code? It’s not like we do that for the sake of reading. We don’t read it from cover to cover and we don’t try to memorize it. So why do we read code?

Code is a representation of concepts, processes, and standard operating procedures. Code is not the crucial part or the first-class citizen; it’s just a medium that we use to encode something else. Therefore, we read the code to understand the concept behind it. But again, why do we do that?

We want to understand the concept because we either want to change it, or we need to troubleshoot it.

Changing the concept (= changing the code) requires us to understand many moving parts, how they interoperate, how they deal with data consistency, concurrency, side effects, and many other things. Changing the code forces us to think about the future, foresee challenges, predict future changes to the code and how it’s going to evolve. When thinking about the flow, we start “somewhere” and think where we can get to from that place. We take the starting point and think about many possible evolutions.

Troubleshooting the code is much different. When troubleshooting, we think about the past. We already know something happened: our assumptions were wrong, we missed some edge case, or something went off the rails. When troubleshooting, we much more often trace backwards. We take the end situation and want to trace its past to understand “how we got here”.

Practically speaking, this means that when we want to change the code, we often start from the top and go from there. We start with an API entrypoint, a facade, a main object that triggers the processing. When we troubleshoot, we often start from a deeply nested piece of code that caused a side effect we observed somewhere else – the side effect that we deem invalid and want to troubleshoot.

But why do we want to change the code or troubleshoot it? Now it becomes tricky, as each audience has different reasons. Depending on which hat we’re wearing at the moment, we’ll have different reasons to look into the code.

Not every line of code serves the same purpose

We need to understand that every line of code has a purpose, but not every line serves the same purpose. Even lines standing next to each other may have completely different purposes. Let’s see some sample pseudocode:

def foo(param1: Type1, param2: Type2){
    recordEnteringMethod(foo, param1, param2);
    log(param1, param2);
    throwIfInvalid(param1, param2);
    let result = triggerBusinessProcessing(param1, param2);
    logDebuggingMessage(result);
    modifyExternalSystems(result);
    updateUi(result);
    recordExitingMethod(foo, param1, param2, result);
}

This sample shows typical business code. At first, we may say that this code serves a business purpose – it calculates something and shows it to the user. It’s probably a facade hiding some complexity. However, things look different when we break it down line by line.

Metrics

First, we recordEnteringMethod. This is purely from a monitoring and observability perspective. End users don’t care about these metrics, and developers rarely care either, but the ops team is very interested in this piece. This line of code is for “observability”. Similarly, the last line, recordExitingMethod, provides metric data points or traces.

These two lines repeat in many places, and we (developers) often consider them noise. That’s why we try to hide them with Aspect Oriented Programming, dynamic code modifications, attributes/annotations, and other trickery that simply hides these lines from the code base.
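In Python, for instance, such trickery could be a plain decorator. This is a sketch only: record_entering_method and record_exiting_method are hypothetical stand-ins for a real metrics client, recorded here into a list so the example is self-contained.

```python
import functools

def recorded(func):
    """Hide the recordEnteringMethod/recordExitingMethod noise behind a
    decorator, so the business method body shows only business lines."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record_entering_method(func.__name__)   # hypothetical helper
        result = func(*args, **kwargs)
        record_exiting_method(func.__name__)    # hypothetical helper
        return result
    return wrapper

events = []
def record_entering_method(name):
    events.append(("enter", name))
def record_exiting_method(name):
    events.append(("exit", name))

@recorded
def foo(param1, param2):
    return param1 + param2

foo(1, 2)
print(events)  # [('enter', 'foo'), ('exit', 'foo')]
```

Attributes in C#, annotations in Java, or AOP frameworks achieve the same effect: the observability lines exist at runtime but disappear from the source the developer reads.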

Logging

Next, we log. Here, we want to call log.info or log.debug, depending on our environment.

We want to put a message in our logs that will help us troubleshoot what happened. We may need these logs for two things: first, to configure observability like exception monitoring or anomaly analysis; second, to troubleshoot what happened.

The former case is very similar to recording metrics. We want to capture in a structured way what happened. We do this to support the latter, so we can travel back in time and trace what happened when we troubleshoot the code. End users don’t care about these logs. The ops team is partially interested. This line is, however, super important for the developers providing support.
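A minimal sketch of the two levels using Python’s standard logging module; the logger name and the messages are invented, and the output goes to an in-memory buffer so the example is self-contained.

```python
import io
import logging

# Route logs to an in-memory buffer instead of a file or stdout.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
log = logging.getLogger("business.process_a")
log.addHandler(handler)
log.setLevel(logging.DEBUG)

param1, param2 = "customer-42", "2026-01-10"
# info: the structured facts we will grep for during an incident
log.info("processing started param1=%s param2=%s", param1, param2)
# debug: noisier detail, typically enabled only in some environments
log.debug("validation rules loaded for param1=%s", param1)

print(buffer.getvalue())
```

The key=value style keeps the messages grep-friendly: during an incident you search for the literal "processing started" or for a concrete customer id, not for variable names.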

Contracts

Moving on, we throwIfInvalid. This line serves many purposes.

First, it may be helpful for the caller to show them that they messed up. They broke the contract, didn’t adhere to the requirements, didn’t meet the preconditions, etc. This line of code supports the “unhappy path”. To put it differently, this line of code only slows us down on a “happy path”. In a perfect world, this line would never be needed as we would never end up in an invalid state (that’s an oversimplification, I know).

Second, this line protects our code from running into an even worse state. If we already know something is wrong, it’s generally good to terminate and crash. Otherwise, we risk breaking the data even more and causing irreversible damage.

End users don’t care about this line of code. They do care about having their data consistent, though. However, the data should also be protected in other places, like in the UI.

Ops teams don’t care much about this line. As long as the software “works” as in “executes all the lines successfully”, ops teams are “okay” with breaking the data consistency and other bugs. Obviously, this may later turn into metric spikes and alerts going off, so the ops teams are indirectly interested in making sure the data is not broken.

Last but not least, it’s developers who care about this line. However, it’s often not “us”. It’s “these other developers” that called our code. It’s our callers that are interested in this line. Our code could just go and try to apply the (invalid) changes the caller wants, but we don’t want that. However, the exception will most likely be visible in the caller’s space and will ping the caller’s ops team.

Business code

Now, we triggerBusinessProcessing. This is the place that all parties are interested in, but in a very different way.

End users are obviously interested in getting things done. They don’t care about our code structure or how we do things internally. They are interested in the side effects, though. Therefore, this line is not interesting for them yet. They ultimately want us to modifyExternalSystems and updateUi. However, even this is misleading. End users don’t care about our databases or the state we preserve, because they don’t interact with them. They care about what they see, so from their perspective the UI is the ultimate source of truth. So the end users care the most about their UIs, not the backends.

Ops teams care slightly less about how we calculate the business result (as long as it doesn’t hammer the CPU and memory), but they are more interested in how we modifyExternalSystems. They care about performance, so they want to make sure we don’t trigger a cascade of crazy modifications that will bring the whole system down. Most importantly, ops teams often see only the very end effect of our actions. They will see that the database CPU spikes, but they won’t understand that it’s because we changed the SELECT query to extract another column, which resulted in a table scan instead of using an index.

Finally, developers care about triggerBusinessProcessing a lot. They need to make sure the data is consistent and the results match the business documentation. This time, it’s “us” who care. It’s not “these other developers calling our code”.

What we should optimize for

It’s not enough to say that we should optimize for readability, because readability is not the same for everyone. Let’s see what we should optimize for then.

Searchability

To read the code, we need to find it. So first, the code must be optimized for searchability. But is it?

When trying to change the code, we need to understand many moving parts. We have enough time to read through it, probably run it locally, or even step through it with debuggers. We have IDEs, AI, static analysis, debuggers, web proxies, and other tools helping us build a comprehensive picture of everything involved. Most importantly, we can start from the top.

Things are different when we troubleshoot. When it’s 4AM on a Saturday and we are paged by the monitoring system, we need to act fast and avoid wasting time on false positives or dead ends. We know “something happened”, and we see the “side effects” like metric spikes, exceptions, or weird logs. We don’t have time to build the big picture. Most importantly, we want to find the bottom where all the problems manifest.

How can we find this “bottom”? This is where the code should be optimized for searchability. As we saw in the previous section, every line serves a different purpose, therefore every line must be optimized differently. Lines like triggerBusinessProcessing are probably not the ones that we’ll see in external systems. However, throwIfInvalid or updateUi will manifest themselves outside of our code and will serve as the starting point.

To make it more specific, we’ll typically start looking from the following:

  • A particular message that appeared somewhere in the UI, in the logs, or in the data entity
  • A metric name that we observed in monitoring systems
  • A static UI element, like a label or name
  • An event name that we observe in queuing systems
  • An endpoint name that we see in the browser’s dev tools

There are many more things that we may start with. They all share the same characteristic, though – we know something for a fact, and now we want to find where this “fact” emerged from. Therefore, we want to optimize for static code analysis and no false positives as much as possible:

  • When emitting metrics, do not concatenate their names dynamically. People will search the codebase for your “full.metric.name.emitted”, so it’s better to have the code use this string in this exact form and in one place only
  • Similarly, when creating UI elements, avoid concatenation where possible
  • Keep your translated elements in one place. People will not look for your variables’ names. They will look for translated messages in their local language that you can’t speak
  • Do not reuse the same message in two distinct places. People may get lost as they don’t know which place they are looking for
  • Have “distinctive” names as much as possible to simplify searchability. “Name” is harder to find than “Customer Name”, which is still harder than “Customer Name in Local Branch”. The more distinctive the element is, the better
  • Emit a particular metric from one place only and make it carry just one scenario. A metric like success is useless as it doesn’t indicate which operation it refers to. A metric like businessProcessA.success is better. You can also use dimensions or tags
  • Merge your success and error metrics into one. You don’t want to have businessProcessA.success and businessProcessA.error because the ops teams will not want two charts on their dashboard. All they are interested in is “is it working?”. Therefore, have only the businessProcessA.error metric and emit a non-zero value when things go wrong
  • Emit zeros on success. Similarly, when things are correct, emit zero to the businessProcessA.error metric. This makes sure the data points are always there, so the ops team can validate that they don’t have a typo in their dashboard configuration. What’s more, they can now configure alerts for the missing-data-points scenario
  • Have predictable names that adhere to the convention. Do not use rare synonyms or uncommon naming schemes, even when the common practices “make no sense” or are “wrong”. The world is not perfect and sometimes we just need to follow the crowd doing inefficient things in order to help the crowd achieve what they need
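The metric-related bullets above can be sketched in Python. Everything here is hypothetical: emit_metric stands in for a real metrics client (StatsD, CloudWatch, etc.) and do_the_work stands in for the real business logic; the point is the single, grep-able literal and the zero-on-success data point.

```python
events = []

def emit_metric(name, value):
    """Hypothetical stand-in for a real metrics client."""
    events.append((name, value))

def do_the_work(payload):
    """Hypothetical business logic."""
    if payload is None:
        raise ValueError("no payload")
    return payload.upper()

def run_business_process_a(payload):
    failed = 0
    try:
        return do_the_work(payload)
    except Exception:
        failed = 1
        raise
    finally:
        # One literal, one emit site: grepping the codebase for
        # "businessProcessA.error" lands exactly here. Emitting zero on
        # success keeps the data points flowing, so missing data points
        # can themselves trigger an alert.
        emit_metric("businessProcessA.error", failed)

run_business_process_a("ok")
try:
    run_business_process_a(None)
except ValueError:
    pass

print(events)  # [('businessProcessA.error', 0), ('businessProcessA.error', 1)]
```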

And here is my hot take: these suggestions may result in less clean code, and that is okay! For instance:

  • We may get code duplication because we repeat the metric prefix in many places. Yes, this is annoying when writing the code, and we may look for ways to avoid it (by generating code, using some readable string interpolation, etc.), but metrics are optimized for the ops team that searches the code at 4AM on a Saturday. We do have time to update the code in many places. They don’t have time to understand clever ways of removing duplication at night.
  • Methods with “weird parameters”. For instance, you may have a method emitMetric(bool success, Exception? exception). You may be tempted to have two methods instead: emitSuccess() and emitException(Exception exception) but that will result in emitting the metric in two places. There are ways to deal with that, for instance you can wrap the parameters into an object that validates the scenario and prevents situations like emitMetric(true, new Exception()), or you can deal with these in other ways depending on your programming language. However, we again want to help the ops team to find the source of the metric at night.
  • We may get duplication in our translation files. “Activity Report” may be used in many screens, but we should still have distinct entries in i18n files that will have the same value. Yes, it’s harder to update these translations when changing the code, but it helps people to find the place they are looking for. However, UI elements are less often searched for at night, so here we may look for a better balance between removing code duplication and helping searchability
  • We may need to put foreign names in our code base. We typically write code in English, but it’s sometimes good to use foreign names if they are very visible in other systems (like in UI, metrics, or in business documentation). For instance, many years ago there was an insurance company that wanted to offer a new product. This product was then promoted by a commercial starring an actor (let’s call him Wiktor) that everyone knew from a popular movie. The business team often referred to this product as “Ubezpieczenie Wiktorowe” which means “Wiktor’s insurance”. Wiktor had no idea at all that the whole company was using his name when talking about a business product. Now, would you rather look for a method named specialStartingDiscountOfferHandler or wiktorsInsuranceOfferHandler or even ubezpieczenieWiktoroweHandler? I can tell you the last version was pretty effective
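The parameter-object idea from the emitMetric bullet above could be sketched like this in Python. The names are hypothetical; the point is that the wrapper rejects inconsistent combinations while keeping a single emit site.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MetricOutcome:
    """Wraps the (success, exception) pair and rejects inconsistent
    combinations, e.g. a 'successful' outcome carrying an exception."""
    success: bool
    exception: Optional[Exception] = None

    def __post_init__(self):
        if self.success and self.exception is not None:
            raise ValueError("a successful outcome cannot carry an exception")
        if not self.success and self.exception is None:
            raise ValueError("a failed outcome must carry its exception")

def emit_metric(outcome: MetricOutcome) -> None:
    # A single emit site keeps the metric searchable at 4AM.
    value = 0 if outcome.success else 1
    print("businessProcessA.error", value)

emit_metric(MetricOutcome(success=True))  # fine
try:
    emit_metric(MetricOutcome(success=True, exception=ValueError("boom")))
except ValueError as e:
    print(e)  # a successful outcome cannot carry an exception
```

This keeps the "weird parameters" method but makes the invalid call impossible, which is the compromise the bullet describes.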

Even though the code is less “clean”, it lets us get the job done faster. It may be easy for you to find the code when you know the code base, but the ops team may struggle a lot at 4AM. It’s up to you whether you want to be right or have a good night’s sleep.

Easy navigation

Next, we should optimize our code for easy navigation. Once we find the code line that we are interested in, we most likely need to follow the code either to the places it calls or to the places that call this line.

Things are easy when we are using IDEs. They are very good at deciphering object types, method overloads, polymorphism, implicit parameters, defaults, and so on. Unfortunately, things are much harder when we are not using an IDE, but a basic text editor or a web browser.

The reality is that we browse code much more often in a web browser than in an IDE. Sure, you have your project checked out locally and can spin up your IDE in seconds. Can your ops team do the same? Can you do that for projects from your neighbouring teams? Can you do that easily when navigating between many layers of code across different projects and repositories? Sorry, no way.

People often browse the code using their web browsers. They don’t download the repository locally. They use GitHub’s search engine, in-house code explorers, or even Google or AI. These solutions don’t support any “go to definition” or “find all references”. People need to use basic exact match search or some full-text search that is often flaky when applied to source code. Not to mention that we have dynamic languages that are even harder to traverse and navigate.

Therefore, we should help people using basic tools, for instance:

  • Do not overuse var and other type inference. The type should be clear when reading the code in notepad
  • Be careful with implicit parameters, overloads, and fancy inheritance hierarchies
  • Be mindful when using polymorphism
  • Have distinctive method names so they can be easily found when looking for a full method name using basic exact term search
  • Prefer named types over unnamed tuples, as the named types can be easier to follow
  • Avoid deep call chains that are hard to navigate

Again, these rules may result in the code being less “clean”. And again, this doesn’t apply to every single line of code. Think about what hat you are wearing and who would benefit from the particular piece of code.

Predictability

If you ever read Thinking, Fast and Slow, you know that people tend to think fast to make their lives easier. We all do that. We follow stereotypes, patterns, and routines, and avoid thinking as much as possible.

Your code should be optimized for that. Yes, we may complain that people are unwise or lazy, but ask yourself whether you prefer to be paged at 4AM or whether you prefer your ops team to deal with the issue on their own without calling you. Therefore:

  • Follow common practices for naming, organizing the code structure, implementing interfaces, or creating overloads. If that results in code duplication or some “imperfect” solutions – so be it
  • Use common patterns. People recognize adapters and factories much more easily than double dispatch. The latter will make your code cleaner and smarter; the former will let you have a good night’s sleep
  • Make the code predictable. Have methods like EmitX where X is the metric name. This will let the search engine find it faster, and will also make readers more confident that they are reading the right part of code
  • Follow the industry. Many things are “incorrect” because reasons (history, compatibility, typos, etc.). Just follow them because this is what people are used to

Again, you can often make your code much smarter at the cost of predictability. Think if it’s worth it.

Code is just a tool

Last but not least, remember that code is just a tool. Don’t think about the code, think about the purpose.

I already mentioned this, but let me reiterate. Your code serves different purposes and is read differently by various people. Fellow developers are interested in how to call your code properly, so you build facades. Support teams are interested in finding metrics or UI elements. You are interested in maintaining the business logic over many years. Your code must be readable for all these groups, so you have to optimize it for each of them separately. Or you may be the “know-it-all” who needs to be paged at 4AM because nobody else can figure out how to troubleshoot.

This is much bigger than just the ops team. Your code will be read by people from other companies (like when you do open source or when you involve an independent consultant). It’s in your best interest to help them understand the code faster. It’s up to you whether you make the code readable for them, pay more to your consultants, or deal with more support cases.

This also makes you a “go-to person”. Others prefer to ask you for help instead of figuring things out on their own because asking you is just faster. Newcomers waste tons of time running into dead ends because they can’t navigate the code the way you can. It’s up to you whether you prefer to be the documentation or whether you want to make people independent. The latter sometimes requires lowering the bar.

Also, not everyone is tech-savvy. We need to speak with product managers, business experts, and end users. It’s much easier to communicate with them when we share the same representation of concepts. Parity between code and documentation is great for reducing this cognitive barrier. If your business analysts document a process, then it’s easier to read the code if it has exactly the same structure as the documentation. This goes even further and is related to Conway’s law – we want to align our code with the communication paths and communication terms (ever heard of the ubiquitous language?) rather than with engineering practices.

The ultimate takeaway is: we don’t read the code for the sake of reading. We read the code to understand concepts. We want to understand them to be able to change the future, or to troubleshoot the past. These things are distinct and should be optimized for differently. There is no single “readable code” because readability depends on the purpose.

Availability Anywhere Part 29 — Using all remote solutions in parallel
https://blog.adamfurmanek.pl/2025/12/12/availability-anywhere-part-29/
Fri, 12 Dec 2025 19:00:25 +0000

This is the twenty-ninth part of the Availability Anywhere series. For your convenience you can find other parts in the table of contents in Part 1 – Connecting to SSH tunnel automatically in Windows

There are so many protocols for remote access. Why not use all of them at the same time? Let’s see how.

Back to basics

“Remote access” is very misleading as it often means different things to different users. We already covered that in part 27, where I described session management, input and output, screen geometry, and more.

As a result, there is no single solution that fits all needs. In my case it’s the following:

  • RDP is great as it is the fastest, has great quality, supports keyboard and touch properly even in nested sessions, supports custom geometry, and handles incoming sound. However, it can’t be shared between many devices in parallel.
  • VNC is great as it’s fast, has good enough quality, can be shared between devices, and supports watching regions of the geometry. It also works in a browser on nearly any device. But it doesn’t deal well with the keyboard (especially in nested sessions), doesn’t support audio, and requires the session to already exist on the machine.
  • RustDesk is cool in terms of keyboard handling and picture quality, but it’s too slow for day-to-day work. It’s nice for ad-hoc connections and can be shared between devices. However, it requires the session and doesn’t deal with UAC that well.
  • NoMachine is similar to RustDesk but has worse keyboard support.
  • vSpatial is fast enough and supports VR, but can’t be shared between devices.

I could go on and on with listing pros and cons of each solution, but it should be clear that there is no single solution that would work for all my needs.

So what can we do about that? Let’s use all of them in parallel.

Requirements at a glance

Before figuring out what to do, let’s see what we’d like to achieve. I’d like to have the following:

  • Being able to connect remotely to a machine from multiple devices in parallel (laptops, smartphones, VR goggles, etc.)
  • Supporting 3+ monitors
  • Keeping the session alive even when I’m not connected
  • Supporting sound in both directions and camera feed going into the remote machine
  • Being able to copy and paste text easily. Similarly for files
  • Adapting the screen to my physical device (it should stretch if needed to support fullscreen)
  • Remote machine is Windows

It’s quite a lot and there is no single solution doing all of that. Let’s build it step by step.

Configuring the session

First, we have to create a session in the remote machine. There are generally two ways.

If the remote machine is a virtual machine that we control, then we can connect to it via KVM (like the basic session in Hyper-V or other mechanisms built into hypervisors). This creates the regular CONSOLE session which we can adapt in any way we need. To create virtual screens, we can use any fork of IddSampleDriver that supports the resolutions we need.

If the remote machine is not a VM that we can control, then let’s still do the same. Just create your own VM, configure it accordingly, and then RDP into the remote machine.

To keep the session alive even when you are not connected to it, just keep your VM somewhere where it doesn’t turn off, like an Azure VM or any other VPS.

Connecting in parallel

Now we can connect to the remote machine using many solutions that connect to the existing session. The question is: how to run all of them together?

The trick is to make windows transparent. For instance, first connect to the remote machine over VNC and make the client full screen. Next, connect to the remote machine using RustDesk and make the client full screen again. Finally, use something like See Through Windows to make RustDesk fully transparent. This way you can use keyboard and clipboard via RustDesk and watch the screen via VNC.

Nothing stops you from connecting with more solutions like this. You can also fork See Through Windows and automate it any way you wish.

What’s more, you can use your local machine to create multiple virtual desktops and keep a different set of clients on each desktop.

Summary

Making windows transparent is a nice hack that lets you use multiple applications in parallel. You could obviously fork RustDesk or VNC clients and adjust features as you need.

]]>
https://blog.adamfurmanek.pl/2025/12/12/availability-anywhere-part-29/feed/ 0
State Machine Executor Part 6 — Forking https://blog.adamfurmanek.pl/2025/11/21/state-machine-executor-part-6/ https://blog.adamfurmanek.pl/2025/11/21/state-machine-executor-part-6/#respond Fri, 21 Nov 2025 01:30:10 +0000 https://blog.adamfurmanek.pl/?p=5204 Continue reading State Machine Executor Part 6 — Forking]]>

This is the sixth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Let’s revisit our execution function:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	bool wasTimedOut = false;
	do {
		if(machine.IsTimedOut(state)){
			wasTimedOut = true;
			currentTransition = "timeout-handler-transition";
		}
		TransitionResult result = null;
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		if(hadException){
			// discard the failed transition's result and redirect the machine
			currentTransition = "exception-handler-transition";
			continue;
		}
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
		if(wasTimedOut){
			break;
		}
	}while(!machine.IsCompleted(state));
}

It’s quite complex already, as we extended it with support for timeouts, exception handling, state management, and actions. We’re going to make it even more complex now.

Specifically, we will focus on the following line:

result = machine.RunTransition(currentTransition, store);

This line triggers the transition and makes the state machine execute one piece of code. In this post, we’re going to discuss how to run transitions in parallel.

Problem statement

When discussing state machines, we typically think in terms of the state machine having one “state” at a time. However, that’s not very realistic. We often need to be able to run multiple things in parallel. Some of them are contained within a single state/transition (e.g., reading multiple files in parallel instead of one by one), some are independent sub-workflows (e.g., one workflow sending emails, another one uploading files), and some are tightly connected with each other (e.g., one workflow validates something and can affect the other workflow which tries to process the query at the same time).

The problem we’re facing is: how to run multiple transitions in parallel, manage the state changes happening with those multiple transitions, and how to deal with the interconnections between transitions. Let’s explore some solutions.

Some programming models

Many programming models for workflows have been developed over the years. Let’s see some of them, without going into many details and formalities.

Bulk-synchronous-parallel / Superstep

The bulk-synchronous-parallel (BSP) model consists of a series of supersteps with synchronization and communication in between. Each superstep is a logical computation performed by a node/processor/unit. Once the computation is done, the nodes exchange data and wait until all of them reach the same step.

This model is quite easy to analyze, but it is rather rigid and inflexible in its structure. It typically goes with a batch-like approach in which we divide the work between nodes, send it for processing, and then wait for the results.

This model is very popular in state machines.
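To make the superstep idea concrete, here is a toy, purely sequential Python sketch. The “nodes” are simulated as list chunks; the helper name and the partitioning scheme are invented for the example.

```python
# Toy BSP-style computation: sum of squares across simulated nodes.
def bsp_sum_of_squares(data, nodes=4):
    # Divide the work between nodes.
    chunks = [data[i::nodes] for i in range(nodes)]
    # Superstep 1: each node performs its local computation.
    squared = [[x * x for x in chunk] for chunk in chunks]
    # Barrier: nodes exchange their partial results.
    partials = [sum(chunk) for chunk in squared]
    # Superstep 2: combine the exchanged data.
    return sum(partials)

total = bsp_sum_of_squares(list(range(10)))
```

The rigid structure is visible even in this toy: every node runs the same code on different data, and nothing proceeds until the barrier completes.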

Fork & Join

In this approach, we fork the process into many copies, and each copy performs similar work on its own. It’s more flexible than BSP because we can use work stealing and sometimes we can avoid synchronization.

This model is often used in parallel processing of collections or in handling web requests.

Threading

Threading is a low-level approach to computation. Each thread is completely independent and can do whatever it wants. Threads synchronize only when needed, and there is no clear structure of their behavior.

This model is very powerful, but quite hard to analyze and reason about.

Trails as a middle ground

BSP is often used in state machines because it can be represented easily in terms of states and transitions. We can think of it as one transition between two states that is executed many times in parallel. While the model is simple, it’s also quite inflexible, as it requires all the parallel branches to do the same work (but with different data).

Threads on the other hand are very flexible, but they require synchronization. Effectively, each thread must have some kind of a handle to the other threads it wants to synchronize with. Things get much more complex when those other threads fork on their own, as now the synchronization involves “group of threads” which are often represented as jobs or with a parent-child relationship.

To keep the flexibility of threads without the rigidity of BSP, we can introduce something in between – namely a trail.

Trail structure

A trail is like a thread, but it doesn’t “synchronize on its own”. It only states the “requirements” for it to continue, and the platform takes care of making sure those requirements are met. We can think of trails as threads combined with named mutexes managed by the platform.

A trail is an object with the following properties:

class Trail {
    string Name;
    string CurrentState;
    string NextTransition;
    string[] BlockingTrails;
}

Name is used to synchronize with other trails. CurrentState and NextTransition simply indicate where the trail is and what it’s going to do next. BlockingTrails is a collection of other trails that need to complete before the current trail can move on.

When starting a state machine from scratch, we simply have one trail with the initial state and transition. It can have any name.

To implement trail spawning, we extend the result of a transition to have a list of trails to continue with:

class TransitionResult {
    Trail[] NextTrails;
}

This way, one trail can fork into many sub-trails. Each sub-trail is independent and can do whatever it wants.

To implement joins, we simply deduplicate trails based on their name. We also assume that if duplicates appear, they must be in the same state and transition.

Let’s see how to use them.

Parallel collection processing

Let’s say that we want to process a collection of elements in parallel.

We start with the initial state that splits the work:

Trail initial = new Trail { Name = "work", ... };

When called, the transition splits the work and returns one trail for each element:

return new TransitionResult {
    NextTrails = new[] {
        new Trail { Name = "work.1", ... },
        new Trail { Name = "work.2", ... },
        new Trail { Name = "work.3", ... },
        ...
        new Trail { Name = "work.n", ... },
    }
};

All these trails are executed by the platform, hopefully in a parallel manner. Now, they need to synchronize at the very end, so each of those worker trails returns the same result from the final transition:

return new TransitionResult {
    NextTrails = new[] {
        new Trail { Name = "work", BlockingTrails = new[] { "work." }, ... }
    }
};

Now, the platform needs to do the following:

  1. The platform needs to deduplicate the trails. Each worker trail returns a trail with the same name (work), so the platform knows they should be merged into one.
  2. The platform needs to wait until all the worker trails finish processing items. This is achieved thanks to BlockingTrails set to work. (notice the dot at the end). The platform waits until all trails with names starting with work. finish their work, and only then proceeds with the new, deduplicated trail.

This way, we can achieve typical parallel collection processing.
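To make the mechanics above tangible, here is a small, purely sequential Python simulation of trails. The Trail fields mirror the pseudocode; run_trails, the deadlock guard, and the sequential scheduling are assumptions made for the sketch – a real platform would run trails in parallel and persist them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trail:
    name: str
    next_transition: str
    blocking_trails: tuple = ()

def run_trails(initial, run_transition):
    """Run trails until none remain, deduplicating joins by name."""
    active = [initial]
    finished = set()
    while active:
        # A trail is runnable when no other live trail matches its blocking prefixes.
        runnable = [
            t for t in active
            if all(
                not any(other.name.startswith(prefix)
                        for other in active if other is not t)
                for prefix in t.blocking_trails
            )
        ]
        if not runnable:
            raise RuntimeError("deadlocked trails")
        spawned = []
        for trail in runnable:
            spawned.extend(run_transition(trail))
            finished.add(trail.name)
            active.remove(trail)
        # Join: duplicates (same name) collapse into a single trail.
        merged = {t.name: t for t in active + spawned}
        active = list(merged.values())
    return finished

processed = []

def transition(trail):
    # Hypothetical transitions for "process 3 items in parallel".
    if trail.next_transition == "split":
        return [Trail(f"work.{i}", "item") for i in range(3)]
    if trail.next_transition == "item":
        processed.append(trail.name)
        # Every worker returns the same join trail, blocked on the prefix.
        return [Trail("work", "finish", blocking_trails=("work.",))]
    return []  # "finish": the machine completes

finished = run_trails(Trail("work", "split"), transition)
```

After the run, every `work.i` item has been processed exactly once and the deduplicated `work` trail ran the final transition only after all workers completed.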

Running child state machines

Running child state machines is quite straightforward. Let’s say that we want to continue execution and also start something completely independent on the side. We simply return:

return new TransitionResult {
    NextTrails = new[] {
        new Trail { Name = "work", ... },
        new Trail { Name = "some_side_work", ... }
    }
};

At some point, the side work completes. It simply indicates that it reached the end by returning an empty NextTrails collection.

Summary

Trails provide a flexible approach without the complex overhead of manual synchronization. They are more flexible than BSP, which is crucial when we want to run independent child state machines.

]]>
https://blog.adamfurmanek.pl/2025/11/21/state-machine-executor-part-6/feed/ 0
Non-atomic assignments in Python https://blog.adamfurmanek.pl/2025/11/20/non-atomic-assignments-in-python/ https://blog.adamfurmanek.pl/2025/11/20/non-atomic-assignments-in-python/#respond Thu, 20 Nov 2025 12:26:12 +0000 https://blog.adamfurmanek.pl/?p=5202 Continue reading Non-atomic assignments in Python]]> It’s not hidden knowledge that many assignments are not atomic and that we can face word tearing. Such issues are mostly related to CPU word length, synchronization, concurrency, etc.

However, things can be much worse when we’re dealing with interpreted languages or languages with no strict schema. In these languages, a “regular” assignment can also be non-atomic.

Let’s take Python. In Python, every object can be considered a dictionary of fields. This means that a single assignment may result in expanding the aforementioned dictionary, which may cause issues for some other thread. Let’s see an example:

import jsonpickle
from concurrent.futures import ThreadPoolExecutor
 
threads = 40
iterations = 1000
promises = []
 
class Sample:
	def __init__(self):
		self.big_property = [x for x in range(100000)]
 
 
def serializer(s):
	jsonpickle.dumps(s)
 
def result_setter(s):
	s.abc = "abc"
 
with ThreadPoolExecutor(max_workers=threads) as executor:
	for x in range(iterations):
		s = Sample()
		promises.append(executor.submit(result_setter, s))
		promises.append(executor.submit(serializer, s))
 
for promise in promises:
	promise.result()

We have a Sample class that has one field initially, namely big_property.

We have two different types of tasks: the first one, serializer, uses the jsonpickle library to serialize an object to a JSON string. The second one, result_setter, sets a field on the object. We then run sufficiently many tasks to observe the issue.

If we’re unlucky enough, we’ll hit the following race condition: the first task starts serializing the object, then the first task is paused and the second task kicks in. The second task sets a field on the object. Normally, we could think this assignment is “atomic”, as we only set a reference in a field. However, since a Python object is a dictionary of fields, we need to add a new entry to the dictionary. Once the first task is resumed, it throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<stdin>", line 2, in serializer
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 166, in encode
    context.flatten(value, reset=reset), indent=indent, separators=separators
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 366, in flatten
    return self._flatten(obj)
           ^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 326, in _flatten
    result = self._flatten_impl(obj)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 386, in _flatten_impl
    return self._pop(self._flatten_obj(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 419, in _flatten_obj
    raise e
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 413, in _flatten_obj
    return flatten_func(obj)
           ^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 716, in _ref_obj_instance
    return self._flatten_obj_instance(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 697, in _flatten_obj_instance
    return self._flatten_dict_obj(obj.__dict__, data, exclude=exclude)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 794, in _flatten_dict_obj
    for k, v in util.items(obj, exclude=exclude):
  File "/.venv/lib/python3.11/site-packages/jsonpickle/util.py", line 584, in items
    for k, v in obj.items():
RuntimeError: dictionary changed size during iteration

We can see that the dictionary of fields changed its size because of the assignment. This would not be the case if we initialized the field in the constructor (i.e., if we didn’t need to add a new field to the object but only to modify an existing one).
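A minimal sketch of that fix: pre-declaring the field in the constructor means the later assignment only rebinds an existing __dict__ entry instead of growing the dictionary mid-iteration.

```python
class Sample:
    def __init__(self):
        self.big_property = list(range(100000))
        self.abc = None  # pre-declared: result_setter now updates, not inserts

s = Sample()
keys_before = set(s.__dict__)
s.abc = "abc"            # rebinds an existing key; the dictionary size is unchanged
keys_after = set(s.__dict__)
```

With the key already present, concurrent serialization can no longer fail with “dictionary changed size during iteration”, although it may still observe either the old or the new value.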

]]>
https://blog.adamfurmanek.pl/2025/11/20/non-atomic-assignments-in-python/feed/ 0
State Machine Executor Part 5 — Streaming https://blog.adamfurmanek.pl/2025/10/24/state-machine-executor-part-5/ https://blog.adamfurmanek.pl/2025/10/24/state-machine-executor-part-5/#respond Fri, 24 Oct 2025 14:55:52 +0000 https://blog.adamfurmanek.pl/?p=5199 Continue reading State Machine Executor Part 5 — Streaming]]>

This is the fifth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Being able to describe side effects instead of executing them may sound great, but it has one significant drawback – the side effects need to be described completely before they are handed off to the executor. Building an object describing the action may cause significant memory usage. Let’s see how to fix that.

Streaming

We’d like to be able to stream the data. Let’s say that we have the following Action describing web request to execute:

class HttpAction {
	public string Url;
	public string Method;
	public byte[] Body;
}

See the Body field. It holds the entire payload to be sent. Creating such a payload and storing it in memory will increase the memory usage and decrease scalability. To avoid that, we should have something like this:

class HttpAction {
	public string Url;
	public string Method;
	public Stream<byte> Body;
}

Looks great, but it doesn’t solve any problem. Remember that the state machine must create the action object and hand it over to the executor. The state machine won’t be able to run any code until the action is executed. This means that the Stream must be filled with the data, so we still have the problem with high memory usage.

Instead of passing the stream, we could pass a stream generator. That could be a lambda or some other interface using the yield keyword:

class HttpAction {
	public string Url;
	public string Method;
	public IEnumerable<byte> Body;
}

Looks slightly better, but still has issues. If Body wraps any local variables into a closure, then the memory will not be released until the stream is read. Not to mention that it’s much harder to persist the HttpAction object to provide reliability.

Solution

To solve the problem, we need to effectively stream the data. However, since the actions are executed after the state machine is done, we need to stream the data somewhere else – to a local file.

The executor can provide the following abstraction:

class Env{
	public FileWrapper CreateFile();
	public FileWrapper ReadFile(string identifier);
}

class FileWrapper {
	public string Identifier;
	public File FileHandle;
	public void Commit();
}

Now, the state machine can call CreateFile to get a temporary file. Next, the state machine can stream the content to the file. Finally, the state machine calls Commit to indicate to the executor that the file is ready to be persisted. The executor can then upload the file to the persistent store.
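Here is a rough Python sketch of this Env/FileWrapper flow, using a plain directory to stand in for the persistent store. The names mirror the pseudocode above; the storage layout is an assumption made for the example.

```python
import os
import shutil
import tempfile
import uuid

class FileWrapper:
    def __init__(self, identifier, path, store_dir):
        self.identifier = identifier
        self.path = path              # temp file the state machine streams into
        self._store_dir = store_dir

    def commit(self):
        # Signal the executor that the file is complete and may be persisted.
        shutil.copy(self.path, os.path.join(self._store_dir, self.identifier))

class Env:
    def __init__(self, store_dir):
        self.store_dir = store_dir    # stand-in for the persistent store

    def create_file(self):
        identifier = uuid.uuid4().hex
        fd, path = tempfile.mkstemp()
        os.close(fd)
        return FileWrapper(identifier, path, self.store_dir)

    def read_file(self, identifier):
        # The action executor later streams the body back from here.
        return os.path.join(self.store_dir, identifier)

store_dir = tempfile.mkdtemp()
env = Env(store_dir)
f = env.create_file()
with open(f.path, "wb") as out:       # the state machine streams the body
    out.write(b"request body")
f.commit()                            # the executor can now persist/upload it
```

On retry, the executor only needs the identifier to re-read the committed body, which is exactly what the modified action definition below carries.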

Last but not least, we need to modify the action definition:

class HttpAction {
	public string Url;
	public string Method;
	public string BodyFileIdentifier;
}

The action executor can now stream the body from the file. If something fails, the file can be retrieved from the persistent storage and the action can be retried.

This solution is not perfect, though. The data is streamed twice, which slows everything down. That’s an obvious trade-off.

]]>
https://blog.adamfurmanek.pl/2025/10/24/state-machine-executor-part-5/feed/ 0
State Machine Executor Part 4 — Timeouts, exceptions, suspending https://blog.adamfurmanek.pl/2025/10/21/state-machine-executor-part-4/ https://blog.adamfurmanek.pl/2025/10/21/state-machine-executor-part-4/#respond Tue, 21 Oct 2025 14:31:14 +0000 https://blog.adamfurmanek.pl/?p=5195 Continue reading State Machine Executor Part 4 — Timeouts, exceptions, suspending]]>

This is the fourth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Let’s discuss how to improve reliability of our state machines.

How machines are executed

In part 1, we defined the contract for triggering a single transition. Each transition returns instructions on what actions to execute and what transition to call next. We then run in a loop until the state machine is completed.

We can modify this mechanism to deal with crashes, errors, and other undesired effects. Let’s revisit the loop that we defined in part 2:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		TransitionResult result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
	}while(!machine.IsCompleted(state));
}

We read the store before entering the loop. In each loop iteration, we pass the store to the transition, and then update the state and execute actions. We’re now going to modify this solution.

Suspending

The first thing to support is suspension of the state machine. If the machine decides that it needs to wait, it can indicate that in the TransitionResult:

class TransitionResult {
    ....
    bool Suspend;
}

We can now include that in the loop handling:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		TransitionResult result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
	}while(!machine.IsCompleted(state));
}

We can then resume the state machine when the time comes. We can obviously extend that to support sleeping or waiting for some condition.

Exceptions

We need to handle unexpected crashes as well. We simply catch the exception, and then we need to let the state machine know it happened. We can do that by redirecting the state machine to a well-known transition:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		TransitionResult result = null;
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		if(hadException){
			// discard the failed transition's result and redirect the machine
			currentTransition = "exception-handler-transition";
			continue;
		}
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
	}while(!machine.IsCompleted(state));
}

We can obviously extend that to give access to the exception or add any additional details.

Timeouts

We would also like to terminate the state machine if it runs for too long. There are two ways to do that: we can terminate it the hard way by interrupting the thread (in a preemptive way), or we can wait for it to complete the transition (in a cooperative way). No matter which we choose, we may want to redirect the state machine to a well-known transition for handling timeouts:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	bool wasTimedOut = false;
	do {
		if(machine.IsTimedOut(state)){
			wasTimedOut = true;
			currentTransition = "timeout-handler-transition";
		}
		TransitionResult result = null;
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		if(hadException){
			// discard the failed transition's result and redirect the machine
			currentTransition = "exception-handler-transition";
			continue;
		}
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
		if(wasTimedOut){
			break;
		}
	}while(!machine.IsCompleted(state));
}

Notice that we stop processing after the timeout transition. Had we not done that, we would run into an infinite loop. If you don’t want to terminate the processing, then make sure you don’t reroute the state machine constantly.

Summary

Next time, we’re going to see how to deal with data streaming and why it’s needed.

]]>
https://blog.adamfurmanek.pl/2025/10/21/state-machine-executor-part-4/feed/ 0
State Machine Executor Part 3 — Actions and history https://blog.adamfurmanek.pl/2025/10/14/state-machine-executor-part-3/ https://blog.adamfurmanek.pl/2025/10/14/state-machine-executor-part-3/#respond Tue, 14 Oct 2025 12:54:39 +0000 https://blog.adamfurmanek.pl/?p=5192 Continue reading State Machine Executor Part 3 — Actions and history]]>

This is the third part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Our state machines can execute side-effectful actions. But how do they read results?

One approach is to write the result back to the StoreHolder we designed last time. After executing an action, the executor would write the result back under a property specified by the state machine. This works, but quickly becomes more complex than just one property. What about retries? What about exceptions? What if the property is already there?

Another approach is to keep the list of all executed actions in some kind of event store. Executing an action would generate a new event indicating that the action has been executed. The state machine would then check the events and act accordingly. If we need to retry the action, we can simply model that as yet another event. If we have an exception, then it’s another event. And so on.

Effectively, we can model that in the following way:

class StoreHolder {
	...
	IList<Event> Events;
}

We can have an event indicating the result of an action:

class ActionExecuted<T> {
	Action ExecutedAction;
	T Result;
	Exception? Exception;
}

We can add many more properties to indicate what exactly happened. We may also consider adding unique identifiers to events, order them based on the timestamps, etc.

Finally, the state machine can simply traverse the list and find the events it needs.

There is more. Since this is a very generic mechanism, we can also use it for any sort of communication between the executor and the state machine. For instance, you can initialize the store with some events representing the initial input.
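A small Python sketch of this event-based result lookup. The event shape (action name, result, optional exception) follows the pseudocode above; the helper name find_result and the retry scenario are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ActionExecuted:
    action_name: str
    result: object = None
    exception: Exception = None

class StoreHolder:
    def __init__(self):
        self.events = []

def find_result(store, action_name):
    # The state machine scans the event list for the action it cares about.
    # Scanning from the end means the latest event wins, so a retry
    # naturally overrides an earlier failure.
    for event in reversed(store.events):
        if event.action_name == action_name:
            if event.exception is not None:
                raise event.exception
            return event.result
    return None  # the action has not been executed yet

store = StoreHolder()
store.events.append(ActionExecuted("http-call", exception=TimeoutError()))
store.events.append(ActionExecuted("http-call", result=200))  # retry succeeded
```

Because the store is just a list of events, the executor can also seed it with input events before the first transition runs.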

]]>
https://blog.adamfurmanek.pl/2025/10/14/state-machine-executor-part-3/feed/ 0
State Machine Executor Part 2 — Fault tolerance https://blog.adamfurmanek.pl/2025/10/13/state-machine-executor-part-2/ https://blog.adamfurmanek.pl/2025/10/13/state-machine-executor-part-2/#respond Mon, 13 Oct 2025 08:05:00 +0000 https://blog.adamfurmanek.pl/?p=5184 Continue reading State Machine Executor Part 2 — Fault tolerance]]>

This is the second part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

The code we implemented in the last part is unable to recover from machine crashes. If the process dies midway, we need to start it from scratch. Let’s fix that.

Before going into details, let’s think about how we could trigger the state machine. We could run it on an API call – someone calls the endpoint, we start processing the request, and we trigger the execution along the way. If something dies, the caller will probably retry their call. Another approach is to use a queue. We receive a message from the queue, we start the processing, and we trigger the state machine. If something breaks, the message will be retried. Other scenarios are similar.

In all of those scenarios, the retry comes from some other mechanism. Once we retry, we want to resume the state machine processing. This is conceptually very simple. We just need to recreate the state machine and retrigger the transition. Let’s do that.

State management

The hard part in retrying this way is recovering the state. The state machine is most likely stateful and calculates something as it goes through the states. We can tackle this in many ways: preserve the whole state machine, provide an interface to read and write data that the state machine would use, or provide a temporary object.

Preserving the state machine in its entirety may be possible, but has many drawbacks. First, we may be unable to serialize the object as we don’t even know what it consists of (it may be loaded dynamically and not owned by us). Second, some objects may not be serializable by definition (like locks or things tied to OS data such as threads). Third, this may impose technological limits (like the programming language you use, etc.).

Another approach is to have an interface for the state machine to read and write some pieces of information. For instance, the state machine executor could expose a simple key-value store for the data. Each read and write would be effectively handled by the state machine executor. While this is quite easy, it lacks transactions interleaved with other side effects.

Another approach is a simple dictionary that the state machine can use. This lets the state machine effectively couple the transaction with other side effects. The state machine executor can persist both the changes to the dictionary and the description of the actions in one transaction.

Let’s take this last approach and see how it works. We now would like to have the following object for keeping the changes:

class StoreHolder {
	Dictionary<string, object> Store;
}

Now, the state machine needs to describe modifications to this store:

class TransitionResult {
	...
	Dictionary<string, object> StoreChanges;
}

Also, the state machine executor needs to pass this object to the state machine:

class StateMachine {
	...
	TransitionResult RunTransition(string transitionName, StoreHolder store) {...}
}

Finally, this is how we execute the state machine now:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		TransitionResult result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
	}while(!machine.IsCompleted(state));
}

Looks nice. Let’s see what problems we may have with this approach.

Persisting the store

Let’s now see some pros and cons of this approach.

By persisting the store at once, we can easily identify if there are two state machines executing at the same time. This would result in concurrent writes, which we can detect by using versions or locks.
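A sketch of that version check in Python, with an in-memory dictionary standing in for the database. The names and the ConcurrentWriteError type are invented for the example; a real store would do the compare-and-bump inside a database transaction.

```python
class ConcurrentWriteError(Exception):
    pass

# Stand-in for the persistent store: one row holding the store and its version.
database = {"version": 0, "store": {}}

def merge_and_persist(expected_version, changes):
    # Optimistic concurrency: the write succeeds only if nobody else
    # persisted since we read the store.
    if database["version"] != expected_version:
        raise ConcurrentWriteError("another executor persisted in the meantime")
    database["store"].update(changes)
    database["version"] += 1
    return database["version"]

v = merge_and_persist(0, {"step": "done"})       # first executor wins

stale_write_detected = False
try:
    merge_and_persist(0, {"step": "duplicate"})  # stale version -> rejected
except ConcurrentWriteError:
    stale_write_detected = True
```

The rejected write is exactly the “two state machines executing at the same time” case: the loser can abort or reload the store and decide what to do.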

By saving the changes after the state machine finishes the transition, we get the outbox behavior. We persist the store changes together with the information about what actions to execute. This way, we can retry the actions in case of crashes. We’ll see that in detail in the next part.

This approach is also technology-independent. It’s easy to serialize the key-value dictionary in any technology. However, if the state machine decides to put some complex objects in the store, they need to be serializable and deserializable. Also, they need to be backwards compatible when the state machine code changes. Let’s explore that a little more.

Let’s say that the state machine preserves something like Store["property"] = someObject. If the state machine executor would like to serialize the dictionary now, the someObject value must be serializable. While this sounds trivial, this is often not the case. For instance, many types in Python are not serializable by the built-in solutions like json package. Similarly, objects in Java must implement the Serializable interface or adhere to the requirements of the serialization library. While this is not a big issue, this puts some requirements on the state machine.

Much bigger issues may happen when deserializing the value. First, it may be impossible to deserialize the someObject value due to lack of parameterless constructor or other library requirements. This is not a rare issue.

Worse, we now need to deal with backward and forward compatibility. Let’s say that the state machine is paused and then resumed on some other node. This can be due to a retry or rolling deployment. When the execution is retried, it may happen on either newer or older code version. This means that the store must be deserialized using a different code. If you use a binary serializer, this will most likely cause problems. The same issue may happen if the newer code would like to examine the store written by some older version of the code, like some other state machine execution.

The easiest solution to this problem is to avoid storing complex objects entirely. This simplifies the serialization and the deserialization process. However, it doesn’t solve the issue with schema changes and compatibility.

If you need to store complex objects and still want to access stores created by the older state machines, it may be beneficial to store two versions of the store. One version is serialized using a binary serializer that can serialize and deserialize objects of any kind. The other version is stored using some regular JSON serializer that can only serialize the data but can’t deserialize it into complex objects. You would then examine this JSON data as raw JSON objects.
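One possible shape of this dual-format store in Python, assuming pickle as the binary serializer and a lossy JSON rendering for inspection. OrderContext is a hypothetical complex object kept in the store.

```python
import json
import pickle

class OrderContext:
    """Hypothetical complex object that some state machine keeps in its store."""
    def __init__(self, order_id):
        self.order_id = order_id

store = {"context": OrderContext(42), "attempt": 1}

# Version 1: binary blob for exact restore by the same code version.
binary_blob = pickle.dumps(store)

# Version 2: JSON rendering that other/newer code can inspect as raw data
# without reconstructing the original classes.
json_blob = json.dumps(store, default=lambda o: o.__dict__)

restored = pickle.loads(binary_blob)  # full objects back
raw = json.loads(json_blob)           # plain dicts only
```

The JSON copy cannot be turned back into an OrderContext, but that is the point: it survives schema drift because readers treat it as raw data.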

]]>
https://blog.adamfurmanek.pl/2025/10/13/state-machine-executor-part-2/feed/ 0