Random IT Utensils

Availability Anywhere Part 29 — Using all remote solutions in parallel

afish — Fri, 12 Dec 2025 19:00:25 +0000

This is the twentieth ninth part of the Availability Anywhere series. For your convenience you can find other parts in the table of contents in Part 1 – Connecting to SSH tunnel automatically in Windows

There are so many protocols for remote access. Why not use all of them at the same time? Let’s see how.

Back to basics

“Remote access” is very misleading as it often means different things between users. We already covered that in the part 27 where I described session management, input and output, screen geometry, and more.

This results in no single solution that would fit all. In my case it’s the following:

RDP is great as it is the fastest, has great quality, supports keyboard and touch properly even in nested sessions, supports custom geometry, and handles incoming sound. However, it can’t be shared between many devices in parallel.
VNC is great as it’s fast, has good enough quality, can be shared between devices, and supports watching regions of the geometry. It also works in browser on nearly any device. But it doesn’t deal well with keyboard (especially in nested sessions), doesn’t support audio, and requires the session to exist in the machine.
RustDesk is cool in terms of keyboard handling and picture quality, but it’s too slow for the day to day work. It’s cool for ad-hoc connections and can be shared between devices. However, it requires the session and doesn’t deal with UAC that great.
NoMachine is similar to RustDesk but has worse keyboard support.
vSpatial is fast enough and supports VR, but can’t be shared between devices.

I could go on and on with listing pros and cons of each solution, but it should be clear that there is no single solution that would work for all my needs.

So what can we do about that? Let’s use all of them in parallel.

Requirements at a glance

Before figuring out what to do, let’s see what we’d like to achieve. I’d like to have the following:

Being able to connect remotely to a machine from multiple devices in parallel (laptops, smartphones, VR goggles, etc.)
Supporting 3+ monitors
Keeping the session alive even when I’m not connected
Supporting sound in both directions and camera feed going into the remote machine
Being able to copy and paste text easily. Similarly for files
Adapting the screen to my physical device (it should stretch if needed to support fullscreen)
Remote machine is Windows

It’s quite a lot and there is no single solution doing all of that. Let’s build it step by step.

Configuring the session

First, we have to create a session in the remote machine. There are generally two ways.

If the remote machine is a virtual machine that we can control, then we can connect to it via KVM (like basic session in Hyper-V or other mechanisms built into hypervisors). This creates the regular CONSOLE session which we can now adapt in any way we need. To create virtual screens, we can use any fork of IddSampleDriver that will support the resolution we need etc.

If the remote machine is not a VM that we can control, then let’s still do the same. Just create your own VM, configure it accordingly, and then RDP into the remote machine.

To keep the session alive even if you are not connected to it, just keep your VM somewhere where it doesn’t turn off, like Azure VM or any other VPS.

Connecting in parallel

Now we can connect to the remote machine using many solutions that connect to the existing session. The question is: how to run all of them together?

The trick is to make windows transparent. For instance, first connect to the remote machine over VNC and make the client full screen. Next, connect to the remote machine using RustDesk and make the client full screen again. Finally, use something like See Through Windows to make RustDesk fully transparent. This way you can use keyboard and clipboard via RustDesk and watch the screen via VNC.

Nothing stops you from connecting with more solutions like this. You can also fork the See Through Windows and automate it any way you wish.

What’s more, you can use your local machine to create multiple virtual desktops and have different set of clients on different desktops.

Summary

Making windows transparent is a nice hack that lets you use multiple applications in parallel. You could obviously fork RustDesk or VNC clients and adjust features as you need.

State Machine Executor Part 6 — Forking

afish — Fri, 21 Nov 2025 01:30:10 +0000

This is the sixth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Let’s revisit our execution function:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	bool wasTimedOut = false;
	do {
		if(machine.IsTimedOut(state)){
			wasTimedOut = true;
			currentTransition = "timeout-handler-transition";
		}
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		state = result.CurrentState;
		currentTransition = hadException ? "exception-handler-transition" : result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
		if(wasTimedOut){
			break;
		}
	}while(!machine.IsCompleted(state));
}

It’s quite quite already as we extended it with a support for timeouts, exception handling, state management, and actions. We’re going to make it even more complex now.

Specifically, we will focus on the following line:

result = machine.RunTransition(currentTransition, store);

This line triggers the transition and makes the state machine to execute one piece of code. In this post, we’re going to discuss how to run transitions in parallel.

Problem statement

When discussing state machines, we typically think in terms of the state machine having one “state” at a time. However, that’s not very realistic. We often need to be able to run multiple things in parallel. Some of them are contained within a single state/transition (e.g., reading multiple files in parallel instead of one by one), some are independent sub-workflows (e.g., one workflow sending emails, another one uploading files), and some are tightly connected with each other (e.g., one workflow validates something and can affect the other workflow which tries to process the query at the same time).

The problem we’re facing is: how to run multiple transitions in parallel, manage the state changes happening with those multiple transitions, and how to deal with the interconnections between transitions. Let’s explore some solutions.

Some programming models

Many programming models for workflows have been developed over the years. Let’s see some of them, without going into many details and formalities.

Bulk-synchronous-parallel / Superstep

Bulk-synchronous-parallel (BSP) model consists of a series of supersteps with synchronization and communication in between. Each superstep is a logical computation performed by a node/processor/unit. Once the computation is done, nodes exchange data and wait until all nodes reach the same step.

This model is quite easy to analyze, but is rather rigid and inflexible in its structure. It typically goes with batch-like approach in which we divide the amount of work between nodes, send it for processing, and then wait for results.

This model is very popular in state machines.

Fork & Join

In this approach, we fork the process into many copies, and each copy performs similar work on its won. It’s more flexible than BSP because we can use work stealing and sometimes we can avoid synchronization.

This model is often used in parallel processing of collections or in handling web requests.

Threading

Threading is a low-level approach to computation. Each thread is completely independent and can do whatever it wants. Threads synchronize only when needed, and there is no clear structure of their behavior.

This model is very powerful, but quite hard to analyze and reason about.

Trails as a middle ground

BSP is often used in the state machines because it can be represented easily in terms of states and transitions. We can think of it as one transition between two states that is executed many times in parallel. While the model is simple, it’s also quite inflexible as it requires that all the parallelization does the same work (but with different data).

Threads on the other hand are very flexible, but they require synchronization. Effectively, each thread must have some kind of a handle to the other threads it wants to synchronize with. Things get much more complex when those other threads fork on their own, as now the synchronization involves “group of threads” which are often represented as jobs or with a parent-child relationship.

To keep the flexibility of threads but without the rigidness of BSP, we can introduce something in between – namely a trail.

Trail structure

Trail is like a thread but it doesn’t “synchronize on its own”. It only states the “requirements” for it to continue and the platform takes care of making sure the requirements are met. We can think of trails as of threads and named mutexes managed by the platform.

A trail is an object with the following properties:

class Trail {
    string Name;
    string CurrentState;
    string NextTransition;
    string[] BlockingTrails;
}

Name is used to synchronized with other trails. CurrentState and NextTransition simply indicate where the trail is and what it’s going to do next. BlockingTrails is a collection of other trails that need to complete first before the current trail can move on.

When starting a state machine from scratch, we simply have one trail with the initial state and transition. It can have any name.

To implement trail spawning, we extend the result of a transition to have a list of trails to continue with:

class TransitionResult {
    Trail[] NextTrails;
}

This way, one trail can fork into many sub-trails. Each sub-trail is independent and can do whatever it wants.

To implement joins, we simply deduplicate trails based on their name. We also assume that if duplicates appear, they must be in the same state and transition.

Let’s see how to use them.

Parallel collection processing

Let’s say that we want to process a collection of elements in parallel.

We start with the initial state that splits the work:

Trail initial = new Trail(Name="work"...);

When called, the transition splits the work and returns one trail for each element:

return new TransitionResult{
   NextTrails = new {
       new Trail(Name="work.1", ...),
       new Trail(Name="work.2", ...),
       new Trail(Name="work.3", ...),
       ...
       new Trail(Name="work.n", ...),
   }
}

All these trails are executed by the platform, hopefully in a parallel manner. Now, they need to synchronize at the very end, so each of those worker trails returns the same result of the final transition;

return new TransitionResult {
    NextTrails = new {
        new Trail(Name="work", BlockingTrails= new { "work." }, ...)
    ]
}

Now, the platform needs to do the following:

The platform needs to deduplicate all the trails. Each worker trail returns trails with the same name (work), so the platform knows what to do.
The platform needs to wait until all the worker trails finish processing items. This is achieved thanks to the BlockingTrails set to work. (notice the dot at the end). The platform needs to wait until all trails with names starting with work. finish the work, and then it can proceed with the new deduplicated trail

This way, we can achieve typical parallel collection processing.

Running child state machines

Running child state machines is quite straightforward. Let’s say that we want to continue execution and also start something completely independent on the side. We simply return:

return new TransitionResult{
   NextTrails = new {
       new Trail(Name="work", ...),
       new Trail(Name="some_side_work", ...)
   }
}

At some point, the side work completes. It simply indicates that it reached the end by returning empty NextTrails collection.

Summary

Trails provide a flexible approach without the complex overhead of manual synchronization. They are more flexible than BSP which is crucial when we want to run independent child state machines.

Non-atomic assignments in Python

afish — Thu, 20 Nov 2025 12:26:12 +0000

It’s not a hidden knowledge that many assignments are not atomic and we can face the word tearing. They are mostly related to CPU word length, synchronization, concurrency, etc.

However, things can be much worse when we’re dealing with interpreted languages or languages with no strict schema. In these languages, a “regular” assignment can also be non-atomic.

Let’s take Python. In Python, every object can be considered a dictionary of fields. This means that a single assignment may result in expanding the forementioned dictionary which may cause issues for some other thread. Let’s see an example:

import jsonpickle
from concurrent.futures import ThreadPoolExecutor
 
threads = 40
iterations = 1000
promises = []
 
class Sample:
	def __init__(self):
		self.big_property = [x for x in range(100000)]
 
 
def serializer(s):
	jsonpickle.dumps(s)
 
def result_setter(s):
	s.abc = "abc"
 
with ThreadPoolExecutor(max_workers=threads) as executor:
	for x in range(iterations):
		s = Sample()
		promises.append(executor.submit(result_setter, s))
		promises.append(executor.submit(serializer, s))
 
for promise in promises:
	promise.result()

We have a Sample class that has one field initially, namely big_property.

We have two different types of tasks: the first one serializer uses the jsonpickle library to serialize an object to a JSON string. The second task result_setter sets a field on the object. We then run sufficiently many tasks to observe the issue.

If we’re unlucky enough, we’ll hit the following race condition: the first task starts serializing the object, then the first task is paused and the second task kicks in. The second tasks sets a field on the object. Normally, we could think this assignment is “atomic” as we only set a reference into a field. However, since the Python object is a dictionary of fields, we need to add new entry to the dictionary. Once the first task is resumed, it throws the following error:

Traceback (most recent call last):
  File "", line 2, in 
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "", line 2, in serializer
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 166, in encode
    context.flatten(value, reset=reset), indent=indent, separators=separators
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 366, in flatten
    return self._flatten(obj)
           ^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 326, in _flatten
    result = self._flatten_impl(obj)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 386, in _flatten_impl
    return self._pop(self._flatten_obj(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 419, in _flatten_obj
    raise e
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 413, in _flatten_obj
    return flatten_func(obj)
           ^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 716, in _ref_obj_instance
    return self._flatten_obj_instance(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 697, in _flatten_obj_instance
    return self._flatten_dict_obj(obj.__dict__, data, exclude=exclude)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.11/site-packages/jsonpickle/pickler.py", line 794, in _flatten_dict_obj
    for k, v in util.items(obj, exclude=exclude):
  File "/.venv/lib/python3.11/site-packages/jsonpickle/util.py", line 584, in items
    for k, v in obj.items():
RuntimeError: dictionary changed size during iteration

We can see the dictionary of the fields changed the size because of the assignment. This would not be the case if we initialized the field in the constructor (i.e., if we didn’t need to add a new field to the object but to modify an existing one).

State Machine Executor Part 5 — Streaming

afish — Fri, 24 Oct 2025 14:55:52 +0000

This is the fifth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Being able to describe side effects instead of executing them may sound great, but it has one significant drawback – the side effects need to be described completely before they are handed off to the executor. Building an object describing the action may cause significant memory usage. Let’s see how to fix that.

Streaming

We’d like to be able to stream the data. Let’s say that we have the following Action describing web request to execute:

class HttpAction {
	public string Url;
	public string Method;
	public byte[] Body;
}

See the Body field. It holds the entire payload to be sent. Creating such a payload and storing it in memory will increase the memory usage and decrease scalability. To avoid that, we should have something like this:

class HttpAction {
	public string Url;
	public string Method;
	public Stream Body;
}

Looks great, but it doesn’t solve any problem. Remember that the state machine must create the action object and hand it over to the executor. The state machine won’t be able to run any code until the action is executed. This means that the Stream must be filled with the data, so we still have the problem with high memory usage.

Instead of passing the stream, we could pass a stream generator. That could be a lambda or some other interface with yield keyword:

class HttpAction {
	public string Url;
	public string Method;
	public IEnumerable Body;
}

Looks slightly better, but still has issues. If Body wraps any local variables into a closure, then the memory will not be released until the stream is read. Not to mention that it’s much harder to persist the HttpAction object to provide reliability.

Solution

To solve the problem, we need to effectively stream the data. However, since the actions are executed after the state machine is done, we need to stream the data somewhere else – to a local file.

The executor can provide the following abstraction:

class Env{
	public FileWrapper CreateFile();
	public FileWrapper ReadFile(string identifier);
}

class FileWrapper {
	public string Identifier;
	public File FileHandle;
	public void Commit();
}

Now, the state machine can call CreateFile to get a temporary file. Next, the state machine can stream the content to the file. Finally, the state machine calls Commit to indicate to the executor that the file is ready to be persisted. The executor can then upload the file to the persistent store.

Last but not least, we need to modify the action definition:

class HttpAction {
	public string Url;
	public string Method;
	public string BodyFileIdentifier;
}

The action executor can now stream the body from the file. If something fails, the file can be retrieved from the persistent storage and the action can be retried.

This solution is not perfect, though. The data is streamed twice which slows everything down. That’s an obvious trade-off.

State Machine Executor Part 4 — Timeouts, exceptions, suspending

afish — Tue, 21 Oct 2025 14:31:14 +0000

This is the fourth part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Let’s discuss how to improve reliability of our state machines.

How machines are executed

In part 1, we defined the contract for triggering a single transition. Each transition returns instructions what actions to execute and what transition to call next. We then run in a loop until the state machine is completed.

We can modify this mechanism to deal with crashes, errors, and other undersired effects. Let’s revisit the loop that we defined in part 2:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
	}while(!machine.IsCompleted(state));
}

We read the store before entering the loop. In each loop iteration, we pass the store to the transition, and then update the state and execute actions. We’re now going to modify this solution.

Suspending

The first thing to support is suspension of the state machine. If the machine decides that it needs to wait, it can indicate that in the TransitionResult:

class TransitionResult {
    ....
    bool Suspend;
}

We can now include that in the loop handling:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
	}while(!machine.IsCompleted(state));
}

We can then proceed with the state machine when the time comes. We can obviously extend that to support sleep or waiting for some condition.

Exceptions

We need to handle unexpected crashes as well. We simply catch the exception and then we need to let the state machine know it happened. We can do that by redirecting the state machine to a well-known transition:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		state = result.CurrentState;
		currentTransition = hadException ? "exception-handler-transition" : result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
	}while(!machine.IsCompleted(state));
}

We can obviously extend that to give access to the exception or add any additional details.

Timeouts

We would also like to terminate the state machine if it runs for tool long. There are two ways to do that: we can terminate it the hard way by interrupting the thread (in a preemtive way), or we can wait for it to complete the transition (in a cooperative way). No matter what happens, we may want to redirect the state machine to a well-known transition for handling timeouts:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	bool wasTimedOut = false;
	do {
		if(machine.IsTimedOut(state)){
			wasTimedOut = true;
			currentTransition = "timeout-handler-transition";
		}
		bool hadException = false;
		try{
			result = machine.RunTransition(currentTransition, store);
		}catch(Exception e){
			hadException = true;
		}
		state = result.CurrentState;
		currentTransition = hadException ? "exception-handler-transition" : result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
		if(result.Suspend) {
			break;
		}
		if(wasTimedOut){
			break;
		}
	}while(!machine.IsCompleted(state));
}

Notice that we stop processing after the timeout transition. Had we not do that, we would run in an infinite loop. If you don’t want to terminate the processing, then make sure you don’t run into rerouting the state machine constantly.

Summary

Next time, we’re going to see how to deal with data streaming and why it’s needed.

State Machine Executor Part 3 — Actions and history

afish — Tue, 14 Oct 2025 12:54:39 +0000

This is the third part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

Our state machines can execute side-effectful actions. But how do they read results?

One approach is to write the result back to the StoreHolder we designed last time. After executing an action, the executor would write the result back as a property specified by the state machine. This works but is much more complex than just one property. What about retries? What about exceptions? What if the property is already there?

Another approach is to keep the list of all executed actions in some kind of even store. Executing an action would generate a new event indicating that the action has been executed. The state machine would then look check the events and act accordingly. If we need to retry the action, we can simply model that as a yet another event. If we have an exception, then it’s another event. And so on.

Effectively, we can model that in the following way:

class StoreHolder {
	...
	IList Events;
}

We can have an event indicating the result of an action:

class ActionExecuted {
	Action ExecutedAction;
	T Result;
	Exception? Exception;
}

We can add many more properties to indicate what exactly happened. We may also consider adding unique identifiers to events, order them based on the timestamps, etc.

Finaly, the state machine can simply traverse the list and find the events it needs.

There is more. Since this is a very generic mechanism, we can also add any sort of communication between the executor and the state machine. For instance, you can initialize the store with some events representing the initial input.

State Machine Executor Part 2 — Fault tolerance

afish — Mon, 13 Oct 2025 08:05:00 +0000

This is the second part of the State Machine Executor series. For your convenience you can find other parts in the table of contents in State Machine Executor Part 1 — Introduction

The code we implemented in the last part is unable to recover from machine crashes. If the process dies midway, we need to start it from scratch. Let’s fix that.

Before going into details, let’s think how we could be triggering the state machine. We could run it on an API call – someone calls the endpoint, we start processing the request, and we trigger the execution along the way. If something dies, the caller will probably retry their call. Another approach is to use a queue. We receive a message from the queue, we start the processing, and we trigger the state machine. If something breaks, the message will get retried. Other scenarios may be similar.

In all of those scenarios, we get a retry due to some other mechanisms. Once we retry, we want to resume the state machine processing. This is very simple conceptually. We just need to recreate the state machine and retrigger the transition. Let’s do that.

State management

The hard part in retrying this way is recovering of the state. The state machine is most likely stateful and calculates something as it goes through the states. We can tackle this in many ways: preserve the whole state machine, provide an interface to read and write data that the state machine would use, or provide a temporary object.

Preserving the state machine in its entirety may be possible, but has many drawbacks. First, we may be unable to serialize the object as we don’t even know what it consists of (it may be loaded dynamically and not owned by us). Second, some objects may be not serializable by definition (like locks, things tied to OS data like threads, etc.). Third, this may impose technological limits (like the programming language you use etc.).

Another approach is to have an interface for the state machine to read and write some pieces of information. For instance, the state machine executor could expose a simple key-value store for the data. Each read and write would be effectively handled by the state machine executor. While this is quite easy, it lacks transactions interleaved with other side effects.

Another approach is a simple dictionary that the state machine can use. This lets the state machine effectively couple the transaction with other side effects. The state machine executor can persist both the changes to the dictionary and the description of the actions in one transaction.

Let’s take this last approach and see how it works. We now would like to have the following object for keeping the changes:

class StoreHolder {
	Dictionary Store;
}

Now, the state machine needs to describe modifications to this store:

class TransitionResult {
	...
	Dictionary StoreChanges;
}

Also, the state machine executor needs to pass this object to the state machine:

class StateMachine {
	...
	TransitionResult RunTransition(string transitionName, StoreHolder store) {...}
}

Finally, this is how we execute the state machine now:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	StoreHolder store = ReadStore();
	do {
		result = machine.RunTransition(currentTransition, store);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		MergeAndPersist(store, result.StoreChanges);
		ExecuteActions(result.ActionsToExecute);
	}while(!machine.IsCompleted(state));
}

Looks nice. Let’s see what problems we may have with this approach.

Persisting the store

Let’s now see some pros and cons of this approach.

By persisting the store at once, we can easily identify if there are two state machines executing at the same time. This would result in concurrent writes which we can find by using the versions or locks.

By saving the changes after the state machine finishes the transition, we can have the outbox behavior. We persist the store changes and the information what actions to execute. This way, when we can retry the actions in case of crashes. We’ll see that in details in the next part.

This approach is also technology-independent. It’s easy to serialize the key-value dictionary in any technology. However, if the state machine decides to put some complex objects in the store, they need to be serializable and deserializable. Also, they need to be backwards compatible when the state machine code changes. Let’s explore that a little more.

Let’s say that the state machine preserves something like Store["property"] = someObject. If the state machine executor would like to serialize the dictionary now, the someObject value must be serializable. While this sounds trivial, this is often not the case. For instance, many types in Python are not serializable by the built-in solutions like json package. Similarly, objects in Java must implement the Serializable interface or adhere to the requirements of the serialization library. While this is not a big issue, this puts some requirements on the state machine.

Much bigger issues may happen when deserializing the value. First, it may be impossible to deserialize the someObject value due to lack of parameterless constructor or other library requirements. This is not a rare issue.

Worse, we now need to deal with backward and forward compatibility. Let’s say that the state machine is paused and then resumed on some other node. This can be due to a retry or rolling deployment. When the execution is retried, it may happen on either newer or older code version. This means that the store must be deserialized using a different code. If you use a binary serializer, this will most likely cause problems. The same issue may happen if the newer code would like to examine the store written by some older version of the code, like some other state machine execution.

The easiest solution to this problem is to avoid storing complex object entirely. This simplifies the serialization and the deserialization process. However, it doesn’t solve the issue with schema changes and compatibility.

If you need to store complex objects and still want to access stores created by the older state machines, it may be beneficial to store two versions of the store. One version is serialized using a binary serializer that can serialize and deserialize objects of any kind. The other version is stored using some regular JSON serializer that can only serialize the data but can’t deserialize it into complex objects. You would then examine this JSON data as raw JSON objects.

State Machine Executor Part 1 — Introduction

afish — Sun, 12 Oct 2025 16:59:20 +0000

This is the first part of the State Machine Executor series. For your convenience you can find other parts using the links below:
Part 1 — Introduction
Part 2 — Fault tolerance
Part 3 — Actions and history
Part 4 — Timeouts, exceptions, suspending
Part 5 — Streaming
Part 6 — Forking

In this series we explore how to decouple describing “what to do” from actually doing it. This means that instead of doing Console.WriteLine("Hello world!") we just describe that we want the Hello world! to be printed out to the standard output.

Introducing such an abstraction if very beneficial. If we only describe what to do, we get the following:

The business code doesn’t need to worry about actual implementation details but focus on the business part only
We can change implementation of “how things are done” without affecting the business code
It’s easier to add additional layers (e.g., monitoring, logging, scaling) without changing the business code
We can postpone the actual materialization of the side effects
We get history and auditing for free, just by checking the description
Testing is much easier as we don’t need to mock anything or deal with side effects being executed
It’s much easier to inspect what will happen. We can also change the description if needed

It’s actually yet another form of dependency inversion and introducing higher-level APIs for lower-level operations. However, with each generalization, there comes a price of having to adhere to a specific framework of thought.

Conceptual implementation

Let’s start with some pseudocode describing what we want to do. We would like to have a framework for executing finite state machines. The state machine consists of states and transitions between the state. Importantly, whenever the state machine wants to execute a side-effectful operation, it needs to describe them and ask the framework to get them done.

Conceptually, we have the following:

class StateMachine {
	TransitionResult RunTransition(string transitionName) {...}
	bool IsCompleted(string state) {...}
}

The state machine supports running a transition, and can report whether a given state is the terminal one or not. Each TransitionResult describes the following:

class TransitionResult {
	string CurrentState;
	List ActionsToExecute;
	string NextTransition;
}

We see that after running a transition, we get the new state that the machine is in, the list of actions to execute, and the name of the next transition that the state machine would like to run.

Finally, we have the following execution logic:

void Run(StateMachine machine, string initialTransition){
	string state = null;
	string currentTransition = initialTransition;
	do {
		result = machine.RunTransition(currentTransition);
		state = result.CurrentState;
		currentTransition = result.NextTransition;
		ExecuteActions(result.ActionsToExecute);
	}while(!machine.IsCompleted(state));
}

We take the state machine and the initial transition, and then we loop until completed. Nothing fancy here.

There are few missing blocks that we will need to provide based on our needs:

How are actions described? Ideally we’d like to have strongly typed objects and be able to change how the actions are executed. You may need a registery of action executors that can handle specific action types. We can also replace these executors dynamically and change them without touching the business code.
How is the state machine created? You may need a factory that will create the instance based on some input or whatever.
How are actions executed? Simple for loop will be a good start, but you may also run them in parallel or even scale-out. Again, we can change that without touching the business code.

At first glance, this approach may look great. It gives many benefits in terms of testing and code organization, we can optimize many aspects without touching the business code, and this can be adapted to various program flows.

However, this programming model is very limiting, which is not obvious initially. In the next parts, we’ll explore various aspects and see how to tackle them.

Bit Twiddling Part 7 — Change the order of the screens for RDP

afish — Thu, 28 Aug 2025 07:36:55 +0000

This is the seventh part of the Bit Twiddling series. For your convenience you can find other parts in the table of contents in Par 1 — Modifying Android application on a binary level

Today we’re going to tackle the problem of keeping windows in the same place when connecting over RDP. In the other blog post I said, that it’s not possible to control the order of the screens enumerated by mstsc /l programmatically, and that we have to move the screens by plugging them differently. Let’s actually change that with some low-level hacks.

What is the problem again

Let’s say, that I have the following monitors reported by mstsc /l:

We can see that I have 4 monitors:

Monitor 1 is in the center. It’s reported as id 0 in the MSTSC Setup
Monitor 2 is on the left. It’s reported as id 3 in the MSTSC Setup
Monitor 3 is on the right. It’s reported as id 4 in the MSTSC Setup
Monitor 4 is at the bottom. It’s reported as id 30 in the MSTSC Setup

Once I log in to the remote machine, this is what the Display in RDP shows:

Looks good. Most importantly, if I open a window on the Monitor 1 (the one in the center), then it is shown like this:

Important part in this picture is that the notepad is in the center. You don’t need to zoom in to see any other details.

Now, let’s say that I connect another display and make it duplicate the Monitor 1. Windows may decide that the order of the devices changed, and now mstsc /l shows this:

Notice that the Monitor 1 in the center changed its id to 5.

Let’s connect to the RDP server again. This is what Display in RDP shows:

Notice that the monitors changed their numbering. First monitor is now on the left. What’s worse, the windows have moved:

You can see that the notepad moved to the left. It didn’t move actually, it is still on the same first monitor. It’s that the monitor changed its position.

What’s worse, even if I add selectedmonitors:s:5,3,4,30 to the .rdp file, the problem is still there.

Why does it happen? You can read more in this article where I describe it in details. But now, let’s try to fix it.

How it works

We can use API Monitor or other strace-like application to figure out what happens. mstsc uses EnumDisplayDevicesW function to find all the devices.

The API accepts a parameter with the device id (that is the second parameter), and the pointer to the structure where the details will be stored (the third parameter). Most importantly, it returns true or false indicating whether a monitor with given device id exists.

mstsc simply runs in a loop like this:

for(int i=0;;++i){
    if(!EnumDisplayDevicesW(null, i, pointer, flags)){
         break;
    }
}

mstsc iterates over the devices starting from zero until there are no more devices reported by the API. This explains why numbering in mstsc /l is not continuous and why the numbers may change any time.

How can we fix that? We need to hijack the method, check the second parameter, and override it accordingly. With the screenshots above, the loop runs like this:

id = 0 => some virtual screen and returns true
id = 1 => some virtual screen and returns true
id = 2 => some virtual screen and returns true
id = 3 => Monitor 2 and returns true
id = 4 => Monitor 3 and returns true
id = 5 => Monitor 1 and returns true
...
id = 30 => Monitor 4 and returns true
...
id = something big => returns false

We would like it to effectively do something like this:

id = 0 => changes id to 5 => Monitor 1 and returns true
id = 1 => changes id to 3 => Monitor 2 and returns true
id = 2 => changes id to 4 => Monitor 3 and returns true
id = 3 => changes id to 30 => Monitor 4 and returns true
id = 4 => returns false

Let’s do it.

Implementation

As with other dirty hacks, we are going to use the debugger to inject the payload and do some memory operations.

The plan is as follows:

We allocate some memory on the side
We add a payload call_and_return_false that calls the just_call payload and returns false afterwards
We add a payload call_and_return_true that calls the just_call payload and returns true afterwards
We add a payload just_call that restores proper registers and then just calls the original EnumDisplayDevicesW
We find the regular EnumDisplayDevicesW implementation and do the long jump to our main payload
The main payload checks the second parameter (which is in the rdx register). If the parameter should be changed, it is modified and then we jump to the call_and_return_true or call_and_return_false payload accordingly. Otherwise, it jumps to the just_call payload.

Let’s see the payloads in action. First, let’s examine the original EnumDisplayDevicesW method:

u USER32!EnumDisplayDevicesW
0:000> u USER32!EnumDisplayDevicesW
USER32!EnumDisplayDevicesW:
00007ffa`0cd01240 4053            push    rbx
00007ffa`0cd01242 55              push    rbp
00007ffa`0cd01243 56              push    rsi
00007ffa`0cd01244 57              push    rdi
00007ffa`0cd01245 4156            push    r14
00007ffa`0cd01247 4157            push    r15
00007ffa`0cd01249 4881ec98030000  sub     rsp,398h
00007ffa`0cd01250 488b0529e60700  mov     rax,qword ptr [USER32!_security_cookie (00007ffa`0cd7f880)]

Nothing special here. It simply preserves the registers. We’ll need to override this with a long jump to our payload, like this:

mov rax, payload_address
push rax
ret
nop
nop
nop
nop

This way, we override the first 16 bytes.

Okay, let’s now see the main payload:

cmp rdx, monitor_id_1
jne 0x13
mov rdx, overriden_id_1
mov rax, call_and_return_false or call_and_return_true
push rax
ret
cmp rdx, monitor_id_2
jne 0x13
mov rdx, overriden_id_2
mov rax, call_and_return_false or call_and_return_true
push rax
ret
...
cmp rdx, monitor_id_n
jne 0x13
mov rdx, overriden_id_n
mov rax, call_and_return_false or call_and_return_true
push rax
ret
mov rax, just_call
push rax
ret

We have a series of instructions like this:

if(second_parameter == monitor_id_1){
    second_parameter = overriden_id_1
    jump call_and_return_false or call_and_return_true
}
if(second_parameter == monitor_id_2){
    second_parameter = overriden_id_2
    jump call_and_return_false or call_and_return_true
}
...
if(second_parameter == monitor_id_n){
    second_parameter = overriden_id_n
    jump call_and_return_false or call_and_return_true
}
jump just_call

We compare each monitor in a sequence, override the parameter if needed, and jump accordingly.

Now, let’s see the payload for call_and_return_false

movabs rax, call_and_return_false+0x13
push rax
movabs rax, just_call
push rax
ret
mov rax, 0
ret

This conceptually looks like this:

put after_call address on the stack
jump just_call

:after_call
return 0

We do the same for call_and_return_true and just return different value.

The payload for just_call looks like this:

push    rbx
push    rbp
push    rsi
push    rdi
push    r14
push    r15
sub     rsp,398h
movabs rax, EnumDisplayDevicesW+0x16
push rax
ret

We simply run the preamble of the WinAPI function (which we scratched by introducing long jump over there), and then we jump th the WinAPI function in the correct place.

Automation

Let’s now see some sample C# code that does all of that automatically:

public static void Patch(int id){
		Console.WriteLine(id);
		var addresses = RunCbd(id, @"
.sympath srv*C:\tmp*http://msdl.microsoft.com/download/symbols
.reload
.dvalloc 3000
.dvalloc 4000
.dvalloc 5000
.dvalloc 6000
u USER32!EnumDisplayDevicesW
qd
		");
		
		Func> splitInPairs = address => address.Where((c, i) => i % 2 == 0).Zip(address.Where((c, i) => i % 2 == 1), (first, second) => first.ToString() + second.ToString());			
		Func translateToBytes = address => string.Join(" ", splitInPairs(address.Replace(((char)96).ToString(), "").PadLeft(16, '0')).Reverse().Select(p => "0x" + p));
		
		var pattern = "0,0,1-1,3,1-2,5,1-3,30,1-4,0,0";
	
		var freeMemoryForEnumDisplayReturnFalse = addresses.Where(o => o.Contains("Allocated 3000 bytes starting at")).First().Split(' ').Last().Trim();
		var freeMemoryForEnumDisplayReturnTrue = addresses.Where(o => o.Contains("Allocated 4000 bytes starting at")).First().Split(' ').Last().Trim();
		var freeMemoryForEnumDisplayJustCall = addresses.Where(o => o.Contains("Allocated 5000 bytes starting at")).First().Split(' ').Last().Trim();
		var freeMemoryForEnumDisplay = addresses.Where(o => o.Contains("Allocated 6000 bytes starting at")).First().Split(' ').Last().Trim();
		var enumDisplayAddress = addresses.SkipWhile(o => !o.StartsWith("USER32!EnumDisplayDevicesW:"))
				.Skip(1)
				.First().Split(' ').First().Trim();
		
		var enumAfterPayloadReturnAddress = (Convert.ToUInt64(enumDisplayAddress.Replace(((char)96).ToString(),""), 16) + 0x10).ToString("X").Replace("0x", "");		
		var freeMemoryForEnumDisplayReturnFalseAfterCall = (Convert.ToUInt64(freeMemoryForEnumDisplayReturnFalse.Replace(((char)96).ToString(),""), 16) + 23).ToString("X").Replace("0x", "");
		var freeMemoryForEnumDisplayReturnTrueAfterCall = (Convert.ToUInt64(freeMemoryForEnumDisplayReturnTrue.Replace(((char)96).ToString(),""), 16) + 23).ToString("X").Replace("0x", "");
		
		var patternInstruction = "";
		
		if(pattern != ""){
			foreach(var part in pattern.Split('-')){
			var sourceMonitor = int.Parse(part.Split(',')[0]).ToString("X");
			var destinationMonitor = int.Parse(part.Split(',')[1]).ToString("X");
				var returnValue = part.Split(',')[2];

				patternInstruction += @" 0x48 0x83 0xFA " + sourceMonitor + @" ";
				patternInstruction += @" 0x75 13 "; // jump short 13 bytes
				patternInstruction += @" 0x48 0xC7 0xC2 " + destinationMonitor + @" 0x00 0x00 0x00 ";
				patternInstruction += @" 0x48 0xB8 " + translateToBytes(returnValue == "1" ? freeMemoryForEnumDisplayReturnTrue : freeMemoryForEnumDisplayReturnFalse) + @" 0x50 0xC3 ";
			}
		}
		
		var patchEnumDisplayScript = @"
.sympath srv*C:\tmp*http://msdl.microsoft.com/download/symbols
.reload
e	" + enumDisplayAddress + @"	0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplay) + @" 0x50 0xC3 0x90 0x90 0x90 0x90
e	" + freeMemoryForEnumDisplayReturnFalse + @" 0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayReturnFalseAfterCall) + @" 0x50 0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayJustCall) + @" 0x50 0xC3 0x48 0xC7 0xC0 0x00 0x00 0x00 0x00 0xC3
e	" + freeMemoryForEnumDisplayReturnTrue + @"	0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayReturnTrueAfterCall) + @" 0x50 0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayJustCall) + @" 0x50 0xC3 0x48 0xC7 0xC0 0x01 0x00 0x00 0x00 0xC3
e	" + freeMemoryForEnumDisplayJustCall + @" 0x53 0x55 0x56 0x57 0x41 0x56 0x41 0x57 0x48 0x81 0xEC 0x98 0x03 0x00 0x00 0x48 0xB8 " + translateToBytes(enumAfterPayloadReturnAddress) + @" 0x50 0xC3
e	" + freeMemoryForEnumDisplay + @" " + patternInstruction + @" 0x48 0xB8 " + translateToBytes(freeMemoryForEnumDisplayJustCall) + @" 0x50 0xC3

qd
		";
		RunCbd(id, patchEnumDisplayScript);
	}

Most of that should be rather straightforward, but let’s go through it line by line.

We start by allocating some memory in the process and dumping the machine code of the EnumDisplayDevicesW function (lines 3-12).

We then parse the output (lines 14-15 and 19-25). We then calculate some dependent labels (lines 27-29).

Important part is the pattern in line 17. We encode it in the following way (spaces added for clarity):

monitor_id_1,override_id_1,return_true_or_false - monitor_id_2,override_id_2,return_true_or_false - ...

We then parse this pattern and use it in lines 31-44.

Finally, we concatenate all of that together and create the instructions for the debugger (lines 46-57).

Availability Anywhere Part 28 — Remote work best practices

afish — Sun, 03 Aug 2025 09:05:28 +0000

This is the twentieth eighth part of the Availability Anywhere series. For your convenience you can find other parts in the table of contents in Part 1 – Connecting to SSH tunnel automatically in Windows

We can’t just “take what we do now and how we do it” and “send it over the Internet”. We have to change how we work to make our activities more efficient. This blog post describes my tips & tricks for remote work.

Definitions

This section defines necessary terms.

“Working remotely” means that “people don’t need to get together”:

There are at least 2 people involved in a work
People are in at least 2 different locations and do not need to meet together in one place to do the work

“Blocking work” means that “one person needs to wait for another”:

One person performs an action and needs to wait for another person to act in order to move forward

“Working remotely” means “we coordinate to work at the same time”:

Two people coordinate to work at the same time to do the “blocking work”

“Working asynchronously” means “we do not coordinate to work at the same time”:

Two people do not work at the same time to do the “blocking work”

“Work” typically consists of “anything that we produce”:

Producing artifacts (documents, code, emails, etc.)
Sharing screen
Sharing audio
Sharing video (e.g., web camera)

Working remotely and working asynchronously

It’s important to understand the distinction between the “remote work” and the “asynchronous” work. Specifically:

A work may be remote and not asynchronous – e.g., when we hop on a call to discuss something
A work may be non-remote and asynchronous – e.g., when we physically sit in the same place but we don’t get blocked by each other

Most (~90%) of the typical IT work (coding, posting updates, scheduling marketing campaigns, writing documents, deploying artifacts, etc.) can be done remotely and asynchronously. This doesn’t mean that it must be done remotely and asynchronously (e.g., people may prefer a different approach), but if for any reason (e.g., technical, procedural, policy-based) it cannot be done, then you are doing it wrong.

There are few cases in which I believe non-remote and/or synchronous approach might be beneficial (e.g., much easier to achieve):

Building personal relationships (e.g., getting to know each other better)
Teaching (e.g., how to use tools, how to do stuff)
Sharing secrets (e.g., configuring security channels for the first time)
Working on an urgent work (e.g., solving an ongoing issue in production)
Negotiating (e.g., when you can use non-verbal communication for your advantage)

Long story short is: most things in IT can be done remotely and asynchronously. If they can’t be in your case, then you are doing it wrong.

Being efficient when working remotely

This section describes how to be efficient when working remotely.

Work asynchronously

This could also be titled “prefer written forms” as this is what it comes down to most of the time. Two greatest inventions in this world are writing and a printing press. In the modern world, this means that we should prefer written form and make it accessible for others.

Replace your synchronous activities with asynchronous one:

Instead of “hoping on a call”, write down your updates and send them in a written form
Instead of “presenting the idea”, write down a design document and share it with others
Instead of “asking about something”, write down your questions and send them over
Instead of “showing my screen”, record a video, transcribe it, and send it over
Instead of “running a brown bag meeting”, write down your material and share it over, optionally record videos that help with understanding (and transcribe them)
Instead of “asking for updates”, introduce a process in which others post updates in a common form to a well-known place
Instead of “having a daily meeting to discuss ongoing matters”, introduce a process in which others produce updates many times a day

If you need to perform synchronous activities, capture them for later reproduction:

Record and transcribe all your meetings
Summarize meetings in writing (manually or automatically)
Write down notes of what you have done (e.g., links to all the artifacts you produced and pages you visited)

Long story short is: you should work asynchronously by producing a written form that can be easily accessed on demand. If something that you create cannot be consumed in the future in an abridged form (e.g., you prefer speaking in meetings instead of writing a design document, or you don’t record your meetings so they get lost), then you are doing it wrong.

Prepare for technical limitations and disruptions

Be aware that others may use different technical solutions and hardware which may change, limit, or make unusable some of the artifacts that you send to them. Examples include:

Others may not watch your camera feed (e.g., because their device is too slow, or they don’t have good enough Internet connectivity, may want to save battery, or their device doesn’t support the technology).
Others may explicitly disable your camera feed.
Others may watch your screen share on a much smaller device (e.g., on a mobile phone).
Others may receive your screen share in a much lower quality (e.g., because your communication platform limits the screen resolution).
Others may miss some elements (e.g., because they don’t see all the frames of your video, or because something popped up on their screen, or because they are on a train and have a laggy connection).
Others will get your feed delayed (i.e., you can’t beat physics and sending things to the other side of the world takes time) which you can’t reliably predict (e.g., network devices introduce lag mid-way due to routing changes).
Others will get your feed delayed even more because they may work over many layers of remote connections and remote desktops. It’s not uncommon to have 1-2 seconds of delay during a call.
Others may face different delays for things they receive and things they send. Some protocols are known for making received sound much more delayed than the sent one.
Others may not see what you typed on the chat (e.g., because the chat window is hidden, or they can’t access the chat from their mobile phone).
Others may not hear the beginning of your sentence (e.g., because your microphone may have cut that).

While we can fix some of these problems (e.g., we can chip in and buy everyone a powerful computer), we won’t solve all of them. There is no point in fighting with things we can’t solve, so let’s just avoid them. To achieve that, we have to change what we do and how we do it.

Long story short is: others receive different things from what you send.

Replace non-verbal communication

Verbal communication should be avoided and minimized as much as possible. This includes the following:

Replace your non-verbal signals with other forms

If you feel an emotion – say it.
If you want to speak – say it, type it on the chat, or use the “raise your hand” button
If you want to show something with “hand waving” – draw it on the screen
If you agree or disagree – say it, type it on the chat, or use a “reaction” button
When you are done speaking – say it, so others don’t need to wait for a long pause
If you want someone else to chime in – call them by their name explicitly

Confirm implicit signals explicitly

If you think someone feels “this or that” – ask them to describe their feelings
If you think someone agrees or disagrees – ask them to share their perspective explicitly
If someone asked a question like “are there any other questions” and you have nothing – wait few seconds, and then say “no”

Make sure others receive your signals

If you are speaking, make sure that they can hear you
If you are going to end something over chat or use reactions, make sure that others have access to these and follow them

Long story short is: you need to turn non-verbal signals into explicit verbal ones.

Lower the communication complexity

The communication language should be adjusted to all of the consumers (e.g., people you speak with) and as simple as possible. Since there may be more consumers in the future that are unknown at the time of production, this effectively requires us to always use the simplest possible communication techniques.

You could say that others should just learn the language better. That is correct, but it’s not realistic. We only have so much time, and people in IT generally get paid for their IT skills, not for their language skills (as long as the latter are decent enough). Given limited time, people can’t develop all of their skills, but they need to choose some over another. Therefore, just as we keep our code from being unnecessarily complex, we need to do the same with our communication.

This includes:

Use words as they are explained in dictionaries. If you think something is wrong, then do not say “it’s fine”. If you are not interested in how people are doing, then do not say “how are you”. I’m not saying that doing it is wrong (since this is how the language evolved), but I am saying that there is no harm in using words and phrases that are less confusing for foreign speakers who learned the language from books and courses instead of years of casual conversations.
Avoid poetic language and less-popular words. People may not know these words which will make it harder for them to understand your message.
Avoid techniques that may be hard to use for some people. For instance, nearly 5% of people in IT are color blind, so do not use colors that are hard to read for them.
Do not force others to change because of your culture’s specifics or history. People do not try to offend you with how they name branches or what words they use. If you think otherwise, then realize that your country is small when compared to the whole planet – it’s only a single percent of the population (and even the biggest country is ~18% of all people). This means that the remaining (which is typically above 95% of the planet) probably don’t feel offended by the things that offend you. I’m not saying that your culture or history is not important, but you are not the hub of the universe, and I bet you don’t know the history of a few hundred other countries which may be much longer and much more complex than yours. If you think otherwise, then work on this.
Understand that people may consciously use inaccurate words because they don’t know better ones. Do not attach to the words, but focus on what people mean instead.
Do not attach hidden agenda to what others say. Others do not want to offend you by using short and direct messages, or by communicating differently than you are used to. If you think otherwise, then work on this.
Do not think that your interpretation of words, emoticons, emojis, or memes is the only right one and others use these things “incorrectly”. These things were often invented decades ago and you are not the one to tell others what they mean. Understand that there were generations before you on this planet, and that the meaning of many things have changed. Do not derail discussions because others use words or emoticons differently than you would. Focus on what others want to say rather than getting offended because someone used a dot or reacted with “wrong” emoji.
Keep your sarcasm, irony, and funny memes clearly marked and separate from the serious conversations.
Explain your abbreviations and terms you use. Words have different meanings depending on the context. Make it easier for others to understand by explaining and defining all the terms.
Do not assume that others heard what you said or they understood it the way you intended. Help others understand you by repeating your message, sending it over multiple channels, and making sure they understand the seriousness, urgency, and meaning of what you want to communicate.

Long story short is: empirical studies show that most of the IT problems are related to human interactions. Try to understand others better and pick the most probable intention of what they say rather than picking the easiest one to attack. Make your communication simpler and easier to consume for others.

Lower the technical requirements

Anything that you produce should have as minimal technical requirements as possible.

Some examples:

Turn your camera off. As mentioned earlier, people may not watch it anyway. Most of the time, there is no need to use a camera at all. Turn it off.
Don’t share anything bigger than Full HD resolution (1920 x 1080). It’s cool that you have an enormous screen with 5k resolution, but don’t make others watch it on a 13-inch laptop or 7-inch mobile phone.
Do not share black screen with white text. Communication platforms compress the image which makes it unreadable. Do not make others use tools that invert colors. Sidenote: if you can’t stand the white background because it’s too bright, then your screen is most likely misconfigured – dim it and use night mode or similar settings.
Do not send enormous images, videos with crazy high FPS or resolution. For typical office work, Full HD with 20 FPS is way more than needed most of the time. Do not make others decode stuff needlessly when it doesn’t bring any value. Save the planet and reduce the carbon footprint by not congesting the network with things nobody needs.

Long story short is: turn your camera off, change background to white, limit resolution to Full HD and 20 FPS. Do not produce higher quality than needed. Text is always easier to consume than video.

Find your work-life blend

You can use many tools to add value to what you do and how you deliver to find your work-life blend. Examples include:

Draw diagrams

Use tools like Excalidraw to draw diagrams on the fly. Open Excalidraw in your browser, share your screen, and you can now prototype live with others.
Get yourself a touch screen and a pen to draw diagrams easier. This lets you draw faster.

Post short videos

Record short videos to show things to others. Use recording features of your communication platforms, or use https://screen-recorder.com/ to do that. Also, use the autotranscribe mechanism thanks to which you don’t need to type long messages and yet others can still read your things instead of watching videos.
It may sound like a very time consuming activity, but in reality it’s not. It takes one minute to record a video once you know your tooling.
When you submit a pull request, record a short (1-3 minutes) video in which you walk others through the code changes. This helps others review your changes much easier.
Record a separate video in which you show how the application changed. It’s basically demoing a feature. Keep it as a separate video because it may be shared with a different audience (end clients, product owners, etc.) and you don’t want to couple it with going through internals.
When reporting an issue or asking for help, just show what happens instead of describing it.

Make your private things public

Instead of having private chats with others, consider having a public channel where others may come and ask you questions. This makes it searchable and you can share it with others in the future.
Keep your working notes in a document that you can share easily. Do this especially for debugging notes, configuration notes, or helper scripts.

Jira is your best friend

Keep a history of what you did in your tracking system. Put comments over there many times a day and mark others involved in specific activities.
This way others can easily see the history. You will also use this many times in the future when you’ll need to figure out what you did.
Inform others what you do by pushing updates on them. Have a space where you can freely drop a line like “working on X” which doesn’t require any reaction or response from others.

Don’t repeat yourself

Avoid doing the same thing many times. Automate, record, transcribe, or write down. If you see that you explain something again, then write down or record your explanation and share it afterwards.
Do not lose your work. Keep the history of your terminal commands to not figure out the same things many times.

Make your camera feed consistent

Replace your background with some fake one to make it consistent no matter where you work from. Blurring background may not be enough as details may change when you go to a different room.
Avoid animated backgrounds or flickering screens. People may get distracted.

Use good microphone and mute yourself

Get yourself a microphone that can remove most of the surroundings. Instead of using earbuds, get a headset with a microphone on a stick.
Mute your microphone whenever you are not speaking.

Use many devices and relax while taking meetings

You don’t need to work from one computer over and over again. Work on a good remote desktop configuration, so you can work from your living room, balcony, city park, or from your car.
You don’t need to sit in your room when taking meetings. Go for a walk, stretch a little, or work from your bed. Just make sure you don’t leak confidential information this way.

Exercise while you work

Get yourself a bicycle, treadmill, or stepper and use it during meetings.
Make sure others don’t hear the noise and don’t get distracted by your head wobbling.

Find balance in your work-life balance or work-life blend

If you work asynchronously, then you most likely don’t need to stick to a 9-5 routine. It’s not wrong to do so, but many people find it refreshing to change their rigid schedule.
Take a break mid day to do groceries or go swimming.
Understand for which activities you need to have your computer and which ones you can perform using mobile phones or other devices. You probably don’t need a full blown workstation to reply to an email or post updates on your IM.

Long story short is: there are many ways to improve your remote work environment and what you do. Keep looking.

Prepare for the future

Things will not get “easier” in the future. You need to prepare for it and it’s best to start now.

Synchronous work is impossible in the space

A synchronous way of working is just not the way. You won’t be able to work remotely and synchronously with people on the Moon or Mars.

Instead of pretending like you can work remotely the same way you work non-remotely, just learn how to do the remote work better.

Technical limitations and disruptions will intensify in the near future

Netflix decreased their video quality globally during Covid era because the Internet just couldn’t keep up with the increased bandwidth. Your network connections improve much slower than the computing requirements of the things we use. Just do the math and see how much more bandwidth is needed for an 8k 60 FPS video stream over the 640×480 20 FPS stream we had in the past.

You might say that computers get faster and faster, so we don’t need to worry. That is simply wrong. Microsoft Teams page in Google Chrome will not run on a computer with 4 GBs of memory anymore. Many remote desktops have no more than 16 GBs of memory (and often less), and we expect users to run IDEs and Microsoft Teams, share the screen, send video with blurred backgrounds, and receive the camera feeds of everyone on the call at the same time. This just doesn’t work.

Communication will become more diversified

People travel around the world a lot. This makes languages evolve even more. New words come, phrases change their meaning, cultural context is much less known. Spend a few more seconds when preparing the message to avoid wasting hours/days on misunderstandings.

At the same time, new generations redefine the meaning of older things. The Internet is full of flamewars around the meaning of emojis, dots at the end of the sentence, or memes that were “misunderstood”. Pick your battles and don’t waste your life on things that don’t matter.

Summary

Remember the following:

Do things remotely and asynchronously.
Work asynchronously by producing a written form that can be easily accessed on demand. Make sure that what you create can be consumed in the future in an abridged form.
Others receive different things from what you send. Make sure that they can still consume your messages despite the technical issues.
Replace your non-verbal signals with explicit verbal ones.
Try to understand others better and pick the most probable intention of what they say rather than picking the easiest one to attack. Make your communication simpler and easier to consume for others.
Turn your camera off.
Change background to white.
Limit the resolution to Full HD and 20 FPS.
Do not produce higher quality than needed. Text is always easier to consume than video.
Record short videos and use the autotranscribe feature.
Draw diagrams with a touch screen.
Use many devices to improve your work-life blend.
Go for a walk and exercise during the day.

Instead of pretending like you can work remotely the same way you work non-remotely, just learn how to do the remote work better.

Appendix: On trust in communication

I’m a little more blunt in this paragraph to drive my point home. Feel free to disagree.

We sometimes shake our hands. I once heard this evolved from ages ago when people shaked their forearms to check if others hid knives in their sleeves. No matter if it’s true or not, we do not suspect others carrying a knife these days to attack us. Sure, the world is not always a safe place, but most of the time we don’t feel danger when meeting our coworker.

Similar things happen in our work environments. Most of the time, we have no logical reason to assume that others want to harm us, offend us, play dirty tricks, or undermine our work. We have no logical reason to not trust them, but we are very entitled to assume good intentions.

Many languages on this planet evolved with mechanisms to show respect and assure others that we mean no harm. For instance, we don’t call others by their first name but we say words like “mister” or “mistress”. By doing that, we convey a message “I know how you are, but we are not close friends yet, but I assure I mean no harm”. This makes perfect sense. We are not afraid of others attacking us physically, but we acknowledge that we may disagree with them and this still sparks emotions and may lead to conflicts. By calling someone “mister”, we still show respect, even though we disagree or suggest someone is wrong.

However, these things may be called redundant when you realize we have no reason to assume anything but good intentions. That’s why some languages evolved to drop these “official” titles and call others by their first name. This makes little sense to me. There is a distance between people and it’s wrong to pretend that it’s not there. My coworkers are not the same friends as some people I’ve known for decades. That’s why “colleagues” is a much better word to describe our relationship. If you pretend that we’re friends and you call me by my first name on our first encounter, then I’m okay with that, but I’m still aware that it doesn’t work like that.

Now, coming to the point. If you feel attacked because I asked a direct question like “why are you doing that?” instead of saying something like “I’m pretty sure I must have missed what you said and that’s on me, but could you please explain to me again why you are doing that?”, then it’s on you. You have no reason to assume bad intentions from my end. I’m not attacking you by not being super polite. There is no reason for me to speak like you are not an adult.

When I ask “why are you doing that”, I’m not suggesting that “you shouldn’t be doing that”, or that “you are doing it wrong”, or anything of that sort. Attaching hidden meaning is a problem that you developed in the English-speaking culture by saying “it’s fine” when you really think it’s not, or by dropping “mister” to pretend like we are all friends. I assure you that I mean no harm, and if you think otherwise, then you probably feel insecure and have some trust issues. Just work on that.

Appendix: On offensive language

I’m a little more blunt in this paragraph to drive my point home. Feel free to disagree.

I understand that bad things happened in the past. All countries have similar stories. That’s life. But if you now feel offended because someone says “blacklisting”, names their branch “master”, or “grooms” the items instead of refining them, then it’s on you. Most of the population of our planet (like 95+%) don’t feel offended. I’m not going to spend all my time and energy to avoid some words just because you find them offensive.

What you should do instead, is assume good intentions. And if some words really trigger you so much that you really can’t focus on the essence of the message instead of the wording, then work on it.

Appendix: On emojis

I’m a little more blunt in this paragraph to drive my point home. Feel free to disagree.

I remember the days with no Internet, no mobile phones, and no computers. I also remember when the Internet got widespread, emoticons became common, and instant messengers started to support more and more things like images, memes, reactions, and emojis. I find it funny when someone comes and starts educating me about the meaning of emojis or emoticons because I was around when these things were popularized. If you now want to redefine the meaning of these things, then go ahead, but I’m not going to pay attention to that.

Just assume good intentions. That’s all it takes.

I was taught to finish my sentences with a dot. If you think that I mean something else by putting a dot at the end of the sentence, then I assure you it’s not the case. Same goes for an emoji of a smiling face. A smiling face is widely considered a positive signal. If you think it’s sarcasm or some hidden agenda, then I assure you it’s not the case. And if you feel offended by “thumbs up” reaction or any other trend that emerges after I finish writing this blog post, then I assure you that it’s most likely not the case.

I’ve seen many flame wars about dots or emojis. I don’t want to spend my time and energy on that. You may call me selfish because I “just don’t want to learn what emojis really mean or that dots at the end of the message carry some other meaning”. Or you may just realize that this “meaning” you’re speaking about is different from what most of this planet thinks. If I feel angry, I’ll just let you know directly. I won’t be sending subtle signals by using emoji of a smiling face that you think may mean some sarcasm. I’ll be much more direct than that. So if you want me to learn yet another new “meaning” of an emoji invented way before Generation Alpha got their first mobile phone, then it’s on you.