This is the fifteenth part of the Async Wandering series. For your convenience you can find other parts in the table of contents in Part 1 – Why creating Form from WinForms in unit tests breaks async?
You probably heard that async
is all about not blocking the operating system level thread. This is the fundamental principle of asynchronous programming. If you block the operating system level thread, you lose all the benefits of the asynchronous code.
You also need to keep in mind how to write your C# code. You probably heard that you should keep async
all the way up. This is rather easy to keep because the compiler takes care of that. What’s slightly harder to remember is to keep ConfigureAwait(false)
all the way down. If you don’t do it this way, the compiler won’t help you and you may run into some nasty deadlocking issues, especially if you use some weird SynchronizationContext
.
Last but not least, you probably know that the asynchronous code is only useful if your code is IO-bound. You probably heard that many times. However, what might be very surprising is that C# actually does a lot to make your application work even if your code is CPU-bound and you still use async
. This is very misleading and may let you believe that you know async
, whereas you only know async
in C#. Let’s see why.
Table of Contents
There is plenty of no threads!
One of the best articles about async
is C# is titled There Is No Thread. Stephen Cleary shows that it’s all about continuations and juggling the lambdas to run your code when some IO-bound operation finishes. I even used this title in my Internals of Async talk in which I explain all the internals of synchronization contexts, continuations, and the machinery behind the scenes.
However, it’s only a figure of speech. At the very end of the day, we need to have some thread to run the code. Depending on your synchronization context, there may be some dedicated thread to run the continuations (like in desktop or Blazor applications), or we can use threads from the thread pool. If you think carefully about the asynchronous code, you should notice that this is the place where C# either bites you hard (and causes many deadlocks) or saves your application even if you are doing something very wrong. How? Because C# uses many threads.
By default, C# uses the thread pool to run continuations. The thread pool runs some not-so-trivial algorithm to spawn new threads when there is plenty of work to be done. This is not part of the asynchronous programming paradigm per se. This is just the implementation detail of C#’s asynchronous code which heavily impacts how your applications scale. Other languages don’t do it in the same way and what works well in C# may fail badly somewhere else. For instance, Python’s asyncio uses just one thread even though Python supports multithreading. While this is just an implementation detail, it have tremendous performance implications. Let’s see why.
One thread can kill you
Let’s take a typical message processing flow. We take a message from the service bus, refresh the lease periodically, and process the message in the background. Let’s say that our flow is IO-bound and we use async
to benefit from non-blocking thread instead of spawning multiple threads. Let’s simulate the system. You can find the whole code in this gist.
We start with a message that will store when we received it, when we refreshed the lease for the last time, the identifier of the message, and the final status (if we lost the message or finished successfully):
1 2 3 4 5 6 7 8 |
public class Message { public DateTime LastRefreshTime { get; set; } public DateTime ReceiveTime { get; set; } public bool WasLost { get; set; } public bool WasFinished { get; set; } public int Id { get; set; } } |
Now, we want to configure timings in our application. We can specify how long it takes to receive the message from the bus, how many operations we need to perform on each message, and how long they all take:
1 2 3 4 5 6 7 8 9 10 11 |
public class Settings { public int MessagesCount { get; set; } public TimeSpan ReceivingDuration { get; set; } public TimeSpan ProcessingCPUDuration { get; set; } public TimeSpan ProcessingIODuration { get; set; } public int ProcessingLoops { get; set; } public TimeSpan RefreshDelay { get; set; } public TimeSpan RefreshDuration { get; set; } public TimeSpan MessageLeaseTimeout { get; set; } } |
Finally, we have some statistics showing how we did:
1 2 3 4 5 6 |
public class Stats { public int GeneratedMessages { get; set; } public int ProcessedSuccessfully { get; set; } public int Lost { get; set; } } |
Let’s now see the scaffodling code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
public static async Task Simulate() { Initialize(); StartMonitoringThread(); await RunLoop(); } private static void Initialize() { CurrentStats = new Stats(); CurrentSettings = new Settings { ReceivingDuration = TimeSpan.FromMilliseconds(250), RefreshDelay = TimeSpan.FromSeconds(2), RefreshDuration = TimeSpan.FromMilliseconds(250), MessageLeaseTimeout = TimeSpan.FromSeconds(5), ProcessingCPUDuration = TimeSpan.FromMilliseconds(100), ProcessingIODuration = TimeSpan.FromMilliseconds(1000), ProcessingLoops = 20, MessagesCount = 100 }; } private static void StartMonitoringThread() { Thread monitoringThread = new Thread(() => { while (true) { Thread.Sleep(TimeSpan.FromSeconds(3)); Log($"Received messages {CurrentStats.GeneratedMessages}, " + $"success {CurrentStats.ProcessedSuccessfully}, " + $"failed {CurrentStats.Lost}, " + $"still running {CurrentStats.GeneratedMessages - CurrentStats.ProcessedSuccessfully - CurrentStats.Lost}"); } }); monitoringThread.IsBackground = true; monitoringThread.Start(); } |
We have the Simulate
method that runs the magic. It starts by initializing the timings and setting up some monitoring thread to print statistics every three seconds.
When it comes to the timings: we will run 20 loops for each message. In each loop’s iteration, we will do some CPU-bound operation (taking 100 milliseconds), and then some IO-bound operation (taking 1000 milliseconds). We can see that the CPU operation is 10 times shorter than the IO-bound one.
Finally, we have the heart of our system:
1 2 3 4 5 6 7 8 9 10 11 |
private static async Task RunLoop() { while (true) { var message = await ReceiveMessage(); if (message == null) continue; KeepLease(message); ProcessMessage(message); } } |
We receive the message, keep refreshing the lease, and process the message. Some error-handling code is omitted for brevity.
Receiving the message is rather straightforward – we check if we have more messages in the queue, then take one, otherwise we simply return:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
public static async Task<Message?> ReceiveMessage() { await Task.Yield(); await Task.Delay(CurrentSettings.ReceivingDuration); if (CurrentSettings.MessagesCount-- > 0) { CurrentStats.GeneratedMessages++; Message message = new Message { LastRefreshTime = DateTime.Now, WasLost = false, Id = CurrentStats.GeneratedMessages, ReceiveTime = DateTime.Now }; Log($"New message received with id {message.Id}"); return message; } else { return null; } } |
Keeping a lease is also clear – we wait for some time, then refresh the lease and check if we made it on time:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
public static async Task KeepLease(Message message) { await Task.Yield(); while (message.WasFinished == false) // This is unsafe according to memory model { await Task.Delay(CurrentSettings.RefreshDelay); if (DateTime.Now > message.LastRefreshTime + CurrentSettings.MessageLeaseTimeout) { message.WasLost = true; CurrentStats.Lost++; Log($"Lost lease for message {message.Id}"); return; } else { await Task.Delay(CurrentSettings.RefreshDuration); Log($"Refreshed lease for message {message.Id}"); message.LastRefreshTime = DateTime.Now; } } CurrentStats.ProcessedSuccessfully++; } |
Finally, the heart of our message processing. We simply run a loop and do the work:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
public static async Task ProcessMessage(Message message) { await Task.Yield(); for (int part = 0; part < CurrentSettings.ProcessingLoops && message.WasLost == false; ++part) { Thread.Sleep(CurrentSettings.ProcessingCPUDuration); // CPU-bound part await Task.Delay(CurrentSettings.ProcessingIODuration); // IO-bound part } message.WasFinished = true; if (!message.WasLost) { Log($"Finished message with id {message.Id} in {DateTime.Now - message.ReceiveTime}"); } } |
Notice that we block the thread for the CPU-bound operation and use await
for the IO-bound one.
We also have this logging method that prints the timestamp, thread ID, and the message:
1 2 3 4 |
public static void Log(string message) { Console.WriteLine($"{DateTime.Now}\t{Thread.CurrentThread.ManagedThreadId}\t{message}"); } |
Let’s run the code, let it go for a while, and then see what happens:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 |
11/9/2024 12:05:06 PM 4 New message received with id 1 11/9/2024 12:05:06 PM 6 New message received with id 2 11/9/2024 12:05:06 PM 6 New message received with id 3 11/9/2024 12:05:07 PM 8 New message received with id 4 11/9/2024 12:05:07 PM 8 New message received with id 5 11/9/2024 12:05:07 PM 4 New message received with id 6 11/9/2024 12:05:08 PM 4 New message received with id 7 11/9/2024 12:05:08 PM 4 New message received with id 8 11/9/2024 12:05:08 PM 11 New message received with id 9 11/9/2024 12:05:08 PM 8 Refreshed lease for message 1 11/9/2024 12:05:08 PM 12 New message received with id 10 11/9/2024 12:05:08 PM 11 Refreshed lease for message 2 11/9/2024 12:05:09 PM 12 New message received with id 11 11/9/2024 12:05:09 PM 7 Received messages 11, success 0, failed 0, still running 11 11/9/2024 12:05:09 PM 11 Refreshed lease for message 3 11/9/2024 12:05:09 PM 11 New message received with id 12 11/9/2024 12:05:09 PM 12 Refreshed lease for message 4 11/9/2024 12:05:09 PM 6 New message received with id 13 11/9/2024 12:05:09 PM 8 Refreshed lease for message 5 11/9/2024 12:05:09 PM 12 New message received with id 14 11/9/2024 12:05:10 PM 12 Refreshed lease for message 6 11/9/2024 12:05:10 PM 12 New message received with id 15 11/9/2024 12:05:10 PM 11 Refreshed lease for message 7 11/9/2024 12:05:10 PM 11 New message received with id 16 11/9/2024 12:05:10 PM 6 Refreshed lease for message 8 11/9/2024 12:05:10 PM 12 New message received with id 17 11/9/2024 12:05:10 PM 11 Refreshed lease for message 9 11/9/2024 12:05:10 PM 8 New message received with id 18 11/9/2024 12:05:10 PM 11 Refreshed lease for message 1 11/9/2024 12:05:11 PM 4 Refreshed lease for message 10 11/9/2024 12:05:11 PM 13 New message received with id 19 11/9/2024 12:05:11 PM 12 Refreshed lease for message 2 11/9/2024 12:05:11 PM 13 Refreshed lease for message 11 11/9/2024 12:05:11 PM 11 New message received with id 20 11/9/2024 12:05:11 PM 6 Refreshed lease for message 3 11/9/2024 12:05:11 PM 4 Refreshed lease for message 12 11/9/2024 12:05:11 PM 13 New message received with id 21 11/9/2024 12:05:11 PM 13 Refreshed lease for message 4 11/9/2024 12:05:11 PM 4 Refreshed lease for message 13 11/9/2024 12:05:11 PM 4 New message received with id 22 11/9/2024 12:05:12 PM 4 Refreshed lease for message 5 11/9/2024 12:05:12 PM 6 Refreshed lease for message 14 11/9/2024 12:05:12 PM 13 New message received with id 23 11/9/2024 12:05:12 PM 7 Received messages 23, success 0, failed 0, still running 23 11/9/2024 12:05:12 PM 4 Refreshed lease for message 6 11/9/2024 12:05:12 PM 4 Refreshed lease for message 15 ... 11/9/2024 12:05:50 PM 15 Finished message with id 84 in 00:00:22.3550821 11/9/2024 12:05:50 PM 15 Refreshed lease for message 83 11/9/2024 12:05:50 PM 4 Refreshed lease for message 100 11/9/2024 12:05:50 PM 8 Finished message with id 85 in 00:00:22.3554494 11/9/2024 12:05:50 PM 15 Refreshed lease for message 84 11/9/2024 12:05:50 PM 20 Refreshed lease for message 92 11/9/2024 12:05:50 PM 20 Finished message with id 86 in 00:00:22.3882717 11/9/2024 12:05:50 PM 8 Refreshed lease for message 93 11/9/2024 12:05:51 PM 4 Refreshed lease for message 85 11/9/2024 12:05:51 PM 20 Finished message with id 87 in 00:00:22.3452990 11/9/2024 12:05:51 PM 4 Refreshed lease for message 94 11/9/2024 12:05:51 PM 20 Refreshed lease for message 86 11/9/2024 12:05:51 PM 7 Received messages 100, success 86, failed 0, still running 14 11/9/2024 12:05:51 PM 14 Finished message with id 88 in 00:00:22.3968974 11/9/2024 12:05:51 PM 13 Refreshed lease for message 87 11/9/2024 12:05:51 PM 4 Refreshed lease for message 95 11/9/2024 12:05:51 PM 4 Refreshed lease for message 88 11/9/2024 12:05:51 PM 14 Finished message with id 89 in 00:00:22.3782384 11/9/2024 12:05:51 PM 4 Refreshed lease for message 96 11/9/2024 12:05:51 PM 15 Finished message with id 90 in 00:00:22.3557212 11/9/2024 12:05:51 PM 15 Refreshed lease for message 89 11/9/2024 12:05:52 PM 13 Refreshed lease for message 97 11/9/2024 12:05:52 PM 20 Refreshed lease for message 90 11/9/2024 12:05:52 PM 15 Finished message with id 91 in 00:00:22.4805351 11/9/2024 12:05:52 PM 15 Refreshed lease for message 98 11/9/2024 12:05:52 PM 20 Refreshed lease for message 91 11/9/2024 12:05:52 PM 15 Refreshed lease for message 99 11/9/2024 12:05:52 PM 15 Finished message with id 92 in 00:00:22.3979587 11/9/2024 12:05:52 PM 4 Refreshed lease for message 100 11/9/2024 12:05:52 PM 20 Finished message with id 93 in 00:00:22.3374987 11/9/2024 12:05:53 PM 4 Refreshed lease for message 92 11/9/2024 12:05:53 PM 4 Finished message with id 94 in 00:00:22.3451488 11/9/2024 12:05:53 PM 4 Refreshed lease for message 93 11/9/2024 12:05:53 PM 13 Refreshed lease for message 94 11/9/2024 12:05:53 PM 13 Finished message with id 95 in 00:00:22.3784563 11/9/2024 12:05:53 PM 13 Finished message with id 96 in 00:00:22.3800325 11/9/2024 12:05:53 PM 13 Refreshed lease for message 95 11/9/2024 12:05:54 PM 4 Finished message with id 97 in 00:00:22.3312738 11/9/2024 12:05:54 PM 20 Refreshed lease for message 96 11/9/2024 12:05:54 PM 7 Received messages 100, success 96, failed 0, still running 4 11/9/2024 12:05:54 PM 13 Finished message with id 98 in 00:00:22.3502617 11/9/2024 12:05:54 PM 4 Refreshed lease for message 97 11/9/2024 12:05:54 PM 13 Finished message with id 99 in 00:00:22.3527442 11/9/2024 12:05:54 PM 4 Refreshed lease for message 98 11/9/2024 12:05:54 PM 4 Finished message with id 100 in 00:00:22.3675039 11/9/2024 12:05:54 PM 13 Refreshed lease for message 99 11/9/2024 12:05:55 PM 13 Refreshed lease for message 100 11/9/2024 12:05:57 PM 7 Received messages 100, success 100, failed 0, still running 0 |
We can see that all messaged were processed successfully in around 50 seconds. Processing a message was taking around 22 seconds which makes perfect sense since we had 20 iterations taking around 1100 milliseconds each. No failures, all was good.
Let’s now increase the CPU-bound operation time to 1 second (to match the IO-bound part). This is what happens:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
11/9/2024 12:06:57 PM 8 New message received with id 1 11/9/2024 12:06:57 PM 11 New message received with id 2 11/9/2024 12:06:58 PM 11 New message received with id 3 11/9/2024 12:06:58 PM 4 New message received with id 4 11/9/2024 12:06:58 PM 13 New message received with id 5 11/9/2024 12:06:58 PM 13 New message received with id 6 11/9/2024 12:06:59 PM 14 New message received with id 7 11/9/2024 12:06:59 PM 11 New message received with id 8 11/9/2024 12:06:59 PM 4 New message received with id 9 11/9/2024 12:06:59 PM 15 Refreshed lease for message 1 11/9/2024 12:06:59 PM 4 New message received with id 10 11/9/2024 12:07:00 PM 12 Refreshed lease for message 2 11/9/2024 12:07:00 PM 12 New message received with id 11 11/9/2024 12:07:00 PM 7 Received messages 11, success 0, failed 0, still running 11 11/9/2024 12:07:00 PM 6 Refreshed lease for message 3 11/9/2024 12:07:00 PM 16 New message received with id 12 11/9/2024 12:07:00 PM 14 Refreshed lease for message 4 11/9/2024 12:07:00 PM 14 New message received with id 13 11/9/2024 12:07:00 PM 15 Refreshed lease for message 5 11/9/2024 12:07:00 PM 4 New message received with id 14 11/9/2024 12:07:01 PM 13 Refreshed lease for message 6 11/9/2024 12:07:01 PM 12 New message received with id 15 11/9/2024 12:07:01 PM 6 Refreshed lease for message 7 11/9/2024 12:07:01 PM 16 New message received with id 16 11/9/2024 12:07:01 PM 14 Refreshed lease for message 8 11/9/2024 12:07:01 PM 4 Refreshed lease for message 9 11/9/2024 12:07:02 PM 13 Refreshed lease for message 1 11/9/2024 12:07:02 PM 12 Refreshed lease for message 10 11/9/2024 12:07:02 PM 12 New message received with id 17 11/9/2024 12:07:02 PM 6 Refreshed lease for message 2 11/9/2024 12:07:02 PM 16 Refreshed lease for message 11 11/9/2024 12:07:02 PM 14 Refreshed lease for message 3 11/9/2024 12:07:02 PM 14 Refreshed lease for message 12 11/9/2024 12:07:02 PM 4 Refreshed lease for message 4 11/9/2024 12:07:03 PM 13 Refreshed lease for message 13 11/9/2024 12:07:03 PM 13 Refreshed lease for message 5 11/9/2024 12:07:03 PM 12 Refreshed lease for message 14 11/9/2024 12:07:03 PM 12 New message received with id 18 11/9/2024 12:07:03 PM 7 Received messages 18, success 0, failed 0, still running 18 11/9/2024 12:07:03 PM 16 Refreshed lease for message 6 11/9/2024 12:07:03 PM 16 Refreshed lease for message 15 ... 11/9/2024 12:08:38 PM 21 Finished message with id 88 in 00:00:42.4181405 11/9/2024 12:08:38 PM 42 Finished message with id 90 in 00:00:41.9138898 11/9/2024 12:08:38 PM 8 Refreshed lease for message 91 11/9/2024 12:08:38 PM 28 Refreshed lease for message 85 11/9/2024 12:08:38 PM 28 Refreshed lease for message 88 11/9/2024 12:08:39 PM 24 Finished message with id 85 in 00:00:43.9296927 11/9/2024 12:08:39 PM 16 Finished message with id 93 in 00:00:40.6588087 11/9/2024 12:08:39 PM 7 Received messages 100, success 88, failed 0, still running 12 11/9/2024 12:08:39 PM 21 Refreshed lease for message 93 11/9/2024 12:08:39 PM 23 Refreshed lease for message 94 11/9/2024 12:08:39 PM 16 Refreshed lease for message 92 11/9/2024 12:08:40 PM 23 Refreshed lease for message 97 11/9/2024 12:08:40 PM 16 Refreshed lease for message 99 11/9/2024 12:08:40 PM 21 Refreshed lease for message 96 11/9/2024 12:08:40 PM 42 Refreshed lease for message 98 11/9/2024 12:08:40 PM 16 Refreshed lease for message 90 11/9/2024 12:08:40 PM 42 Refreshed lease for message 95 11/9/2024 12:08:40 PM 42 Refreshed lease for message 100 11/9/2024 12:08:40 PM 24 Finished message with id 94 in 00:00:41.3801588 11/9/2024 12:08:40 PM 8 Finished message with id 92 in 00:00:41.9121719 11/9/2024 12:08:40 PM 8 Finished message with id 95 in 00:00:41.1802881 11/9/2024 12:08:40 PM 24 Finished message with id 96 in 00:00:40.9180345 11/9/2024 12:08:40 PM 24 Refreshed lease for message 91 11/9/2024 12:08:40 PM 52 Refreshed lease for message 85 11/9/2024 12:08:41 PM 24 Finished message with id 98 in 00:00:41.3325727 11/9/2024 12:08:41 PM 21 Finished message with id 91 in 00:00:43.1595798 11/9/2024 12:08:42 PM 24 Refreshed lease for message 94 11/9/2024 12:08:42 PM 42 Refreshed lease for message 92 11/9/2024 12:08:42 PM 24 Refreshed lease for message 99 11/9/2024 12:08:42 PM 8 Refreshed lease for message 97 11/9/2024 12:08:42 PM 42 Refreshed lease for message 96 11/9/2024 12:08:42 PM 24 Finished message with id 97 in 00:00:42.6127819 11/9/2024 12:08:42 PM 42 Refreshed lease for message 98 11/9/2024 12:08:42 PM 42 Finished message with id 99 in 00:00:40.4341505 11/9/2024 12:08:42 PM 7 Received messages 100, success 95, failed 0, still running 5 11/9/2024 12:08:42 PM 21 Refreshed lease for message 95 11/9/2024 12:08:42 PM 42 Refreshed lease for message 100 11/9/2024 12:08:43 PM 42 Refreshed lease for message 91 11/9/2024 12:08:43 PM 8 Finished message with id 100 in 00:00:41.1363357 11/9/2024 12:08:44 PM 21 Refreshed lease for message 97 11/9/2024 12:08:44 PM 52 Refreshed lease for message 99 11/9/2024 12:08:44 PM 52 Refreshed lease for message 100 11/9/2024 12:08:45 PM 7 Received messages 100, success 100, failed 0, still running 0 |
This time it took nearly 2 minutes to process all the messages. Each message is now taking around 40 seconds. Still, all worked.
Let’s now talk about threads. You can see that the examples use multiple threads to handle the messages. In the second execution, there were around 60 active messages at one time, so this created many threads (we can see that at least 50 threads were created based on the log above). Our application scales well and we can’t complain. Seems like async
is doing a really good job!
However, what would happen if we moved this code to some other asynchronous platform? For instance, to Python’s asyncio that uses only single thread? We can emulate that in C# by running the code above in a WinForms context that forces continuations to go through one thread. Let’s change the CPU-bound operation duration to 100 milliseconds (to the original value) and let’s run this from the WinForms app now:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 |
11/9/2024 12:19:20 PM 1 New message received with id 1 11/9/2024 12:19:20 PM 1 New message received with id 2 11/9/2024 12:19:20 PM 1 New message received with id 3 11/9/2024 12:19:21 PM 1 New message received with id 4 11/9/2024 12:19:21 PM 1 New message received with id 5 11/9/2024 12:19:22 PM 1 New message received with id 6 11/9/2024 12:19:22 PM 1 New message received with id 7 11/9/2024 12:19:22 PM 1 Refreshed lease for message 1 11/9/2024 12:19:22 PM 1 New message received with id 8 11/9/2024 12:19:22 PM 11 Received messages 8, success 0, failed 0, still running 8 11/9/2024 12:19:23 PM 1 Refreshed lease for message 2 11/9/2024 12:19:23 PM 1 New message received with id 9 11/9/2024 12:19:23 PM 1 Refreshed lease for message 3 11/9/2024 12:19:23 PM 1 New message received with id 10 11/9/2024 12:19:23 PM 1 Refreshed lease for message 4 11/9/2024 12:19:24 PM 1 Refreshed lease for message 5 11/9/2024 12:19:24 PM 1 New message received with id 11 11/9/2024 12:19:24 PM 1 Refreshed lease for message 6 11/9/2024 12:19:24 PM 1 New message received with id 12 11/9/2024 12:19:25 PM 1 Refreshed lease for message 7 11/9/2024 12:19:25 PM 1 Refreshed lease for message 1 11/9/2024 12:19:25 PM 1 Refreshed lease for message 8 11/9/2024 12:19:25 PM 1 New message received with id 13 11/9/2024 12:19:25 PM 11 Received messages 13, success 0, failed 0, still running 13 11/9/2024 12:19:26 PM 1 Refreshed lease for message 2 11/9/2024 12:19:26 PM 1 Refreshed lease for message 9 ... 11/9/2024 12:21:15 PM 1 Refreshed lease for message 58 11/9/2024 12:21:15 PM 1 Refreshed lease for message 62 11/9/2024 12:21:15 PM 1 Finished message with id 46 in 00:00:42.5373955 11/9/2024 12:21:16 PM 1 Refreshed lease for message 49 11/9/2024 12:21:16 PM 1 Refreshed lease for message 51 11/9/2024 12:21:16 PM 1 Refreshed lease for message 53 11/9/2024 12:21:17 PM 1 New message received with id 65 11/9/2024 12:21:17 PM 1 Refreshed lease for message 55 11/9/2024 12:21:17 PM 1 Refreshed lease for message 46 11/9/2024 12:21:17 PM 1 Refreshed lease for message 57 11/9/2024 12:21:17 PM 1 Refreshed lease for message 59 11/9/2024 12:21:17 PM 11 Received messages 65, success 43, failed 3, still running 19 11/9/2024 12:21:17 PM 1 Refreshed lease for message 61 11/9/2024 12:21:17 PM 1 Refreshed lease for message 48 ... 11/9/2024 12:22:53 PM 11 Received messages 100, success 90, failed 3, still running 7 11/9/2024 12:22:53 PM 1 Refreshed lease for message 97 11/9/2024 12:22:53 PM 1 Refreshed lease for message 99 11/9/2024 12:22:54 PM 1 Refreshed lease for message 94 11/9/2024 12:22:54 PM 1 Refreshed lease for message 96 11/9/2024 12:22:54 PM 1 Finished message with id 95 in 00:00:32.1189187 11/9/2024 12:22:54 PM 1 Refreshed lease for message 98 11/9/2024 12:22:54 PM 1 Refreshed lease for message 100 11/9/2024 12:22:55 PM 1 Refreshed lease for message 95 11/9/2024 12:22:56 PM 1 Finished message with id 96 in 00:00:31.0536654 11/9/2024 12:22:56 PM 1 Refreshed lease for message 99 11/9/2024 12:22:56 PM 1 Refreshed lease for message 97 11/9/2024 12:22:56 PM 11 Received messages 100, success 92, failed 3, still running 5 11/9/2024 12:22:56 PM 1 Refreshed lease for message 96 11/9/2024 12:22:57 PM 1 Refreshed lease for message 98 11/9/2024 12:22:57 PM 1 Refreshed lease for message 100 11/9/2024 12:22:57 PM 1 Finished message with id 97 in 00:00:30.1211740 11/9/2024 12:22:58 PM 1 Refreshed lease for message 99 11/9/2024 12:22:58 PM 1 Refreshed lease for message 97 11/9/2024 12:22:58 PM 1 Finished message with id 98 in 00:00:29.1238261 11/9/2024 12:22:59 PM 1 Refreshed lease for message 98 11/9/2024 12:22:59 PM 1 Refreshed lease for message 100 11/9/2024 12:22:59 PM 11 Received messages 100, success 95, failed 3, still running 2 11/9/2024 12:22:59 PM 1 Finished message with id 99 in 00:00:28.1643467 11/9/2024 12:23:00 PM 1 Refreshed lease for message 99 11/9/2024 12:23:00 PM 1 Finished message with id 100 in 00:00:27.2119750 11/9/2024 12:23:01 PM 1 Refreshed lease for message 100 11/9/2024 12:23:02 PM 11 Received messages 100, success 97, failed 3, still running 0 |
It wasn’t that bad and we can see that we indeed ran on a single thread. First, notice that now it took nearly 4 minutes to complete. That’s understandable as we now run things on a single thread. Also, notice that each message was taking around 30-40 seconds to complete. That is much longer than before. This is because messages compete for the CPU time and we don’t have any parallelism. It’s also worth noting that we lost 3 messages. That’s not that bad. The system overscaled just a bit and couldn’t deal with the load but the stabilized and finished processing.
Let’s now increase the CPU-bound duration to 1 second and try again:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
11/9/2024 12:24:26 PM 1 New message received with id 1 11/9/2024 12:24:27 PM 1 New message received with id 2 11/9/2024 12:24:29 PM 12 Received messages 2, success 0, failed 0, still running 2 11/9/2024 12:24:29 PM 1 New message received with id 3 11/9/2024 12:24:29 PM 1 Refreshed lease for message 1 11/9/2024 12:24:32 PM 12 Received messages 3, success 0, failed 0, still running 3 11/9/2024 12:24:32 PM 1 Refreshed lease for message 2 11/9/2024 12:24:33 PM 1 New message received with id 4 11/9/2024 12:24:34 PM 1 Lost lease for message 3 11/9/2024 12:24:34 PM 1 Refreshed lease for message 1 11/9/2024 12:24:35 PM 12 Received messages 4, success 0, failed 1, still running 3 11/9/2024 12:24:38 PM 12 Received messages 4, success 0, failed 1, still running 3 11/9/2024 12:24:39 PM 1 Refreshed lease for message 2 11/9/2024 12:24:39 PM 1 New message received with id 5 11/9/2024 12:24:39 PM 1 Lost lease for message 4 11/9/2024 12:24:39 PM 1 Refreshed lease for message 1 11/9/2024 12:24:41 PM 12 Received messages 5, success 0, failed 2, still running 3 11/9/2024 12:24:43 PM 1 New message received with id 6 11/9/2024 12:24:44 PM 12 Received messages 6, success 0, failed 2, still running 4 11/9/2024 12:24:46 PM 1 Refreshed lease for message 5 ... 11/9/2024 12:31:48 PM 1 Lost lease for message 98 11/9/2024 12:31:51 PM 12 Received messages 99, success 7, failed 88, still running 4 11/9/2024 12:31:52 PM 1 Refreshed lease for message 97 11/9/2024 12:31:52 PM 1 Refreshed lease for message 92 11/9/2024 12:31:52 PM 1 Refreshed lease for message 86 11/9/2024 12:31:54 PM 12 Received messages 99, success 7, failed 88, still running 4 11/9/2024 12:31:54 PM 1 New message received with id 100 11/9/2024 12:31:54 PM 1 Lost lease for message 99 11/9/2024 12:31:54 PM 1 Finished message with id 86 in 00:01:04.5795457 11/9/2024 12:31:57 PM 12 Received messages 100, success 7, failed 89, still running 4 11/9/2024 12:31:57 PM 1 Refreshed lease for message 86 11/9/2024 12:31:57 PM 1 Refreshed lease for message 97 11/9/2024 12:31:58 PM 1 Refreshed lease for message 92 11/9/2024 12:31:59 PM 1 Lost lease for message 100 11/9/2024 12:32:00 PM 12 Received messages 100, success 8, failed 90, still running 2 11/9/2024 12:32:01 PM 1 Refreshed lease for message 97 11/9/2024 12:32:02 PM 1 Refreshed lease for message 92 11/9/2024 12:32:03 PM 12 Received messages 100, success 8, failed 90, still running 2 11/9/2024 12:32:04 PM 1 Refreshed lease for message 97 11/9/2024 12:32:05 PM 1 Refreshed lease for message 92 11/9/2024 12:32:06 PM 12 Received messages 100, success 8, failed 90, still running 2 11/9/2024 12:32:08 PM 1 Refreshed lease for message 97 11/9/2024 12:32:09 PM 12 Received messages 100, success 8, failed 90, still running 2 11/9/2024 12:32:09 PM 1 Refreshed lease for message 92 11/9/2024 12:32:11 PM 1 Refreshed lease for message 97 11/9/2024 12:32:12 PM 12 Received messages 100, success 8, failed 90, still running 2 11/9/2024 12:32:12 PM 1 Finished message with id 92 in 00:00:55.4921671 11/9/2024 12:32:13 PM 1 Refreshed lease for message 92 11/9/2024 12:32:14 PM 1 Refreshed lease for message 97 11/9/2024 12:32:15 PM 12 Received messages 100, success 9, failed 90, still running 1 11/9/2024 12:32:17 PM 1 Refreshed lease for message 97 11/9/2024 12:32:18 PM 12 Received messages 100, success 9, failed 90, still running 1 11/9/2024 12:32:19 PM 1 Refreshed lease for message 97 11/9/2024 12:32:21 PM 12 Received messages 100, success 9, failed 90, still running 1 11/9/2024 12:32:21 PM 1 Refreshed lease for message 97 11/9/2024 12:32:24 PM 1 Refreshed lease for message 97 11/9/2024 12:32:24 PM 12 Received messages 100, success 9, failed 90, still running 1 11/9/2024 12:32:27 PM 1 Refreshed lease for message 97 11/9/2024 12:32:27 PM 12 Received messages 100, success 9, failed 90, still running 1 11/9/2024 12:32:28 PM 1 Finished message with id 97 in 00:00:50.4062632 11/9/2024 12:32:29 PM 1 Refreshed lease for message 97 11/9/2024 12:32:30 PM 12 Received messages 100, success 10, failed 90, still running 0 |
And here things start to collapse. It took us 8 minutes to process all messages, each of them was taking around 1 minute, and we failed to process 90 out of one hundred. We lost 90% of all of the messages. Our system became unreliable just because we increased the CPU-bound part of the message processing. But why did it break the application exactly? What happened?
You don’t control the priority of continuations
Our application runs three distinct operations in total:
- Take the message from the queue
- Refresh the lease of the message
- Do some processing of the message
Every single time we await the task, we release the thread and let it do something else. Once the IO-bound operation finishes, we schedule it to run on the same thread. However, the order of continuations doesn’t reflect the importance of what we should do.
In order to keep the system stable, we need to refresh the leases. Therefore, if there is any continuation that wants to refresh the lease (the continuation in KeepLease
method), it should run before everything else.
Once we don’t have any continuations for refreshing the leases, we should run continuations for message processing. Obviously, if some KeepLease
continuation gets scheduled, it should preempt other continuations.
Finally, when we have no continuations for refreshing the leases or processing the messages, we should run the continuation for getting new message from the queue. In other words, we receive a new message only when we have some idle CPU time that we can use to process something more.
Unfortunately, the async
in C# doesn’t let you easily prioritize the continuations. However, this is not a problem most of the times because C# uses multiple threads! Once a continuation is free to run, the thread pool will grow to run the continuation earlier if possible. This is not part of the async
programming paradigm and you can’t take it for granted. However, when we run things on a single thread, then continuations have no priorities and message processing continuations may stop lease refreshing continuations from running. Even worse, we may run continuation that receives new message from the bus even though we are already overloaded.
Depending on the nature of your platform (be it C# with different synchronization context, Python with single-threaded asyncio, or JavaScript with one and only one thread), you may get different results. Your application may scale well or may fail badly.
Let’s fix it
We can fix this issue in many ways. Conceptually, we need three different queues: the first one represents the lease refreshments, the second is for message processing, and the third is for getting new message from the bus. We would then have one processor that would check each of the queues in order and execute the operations accordingly. Unfortunately, rewriting the application from async
paradigm to a consumer with multiple queues is not straightforward.
Instead, we can reorder the continuations. The trick is to introduce a priority for each continuation. We do the following:
- We store a desired priority of continuations running on a thread
- When a continuation wants to run, it checks if the desired priority is equal to the priority of the continuation
- If it’s equal, then the continuation resets the desired priority to some invalid value and continues
- Otherwise, the continuation bumps the priority if possible and lets other continuations run
The protocol is based on the following idea: some continuation sets the desired priority to be at least the priority of the continuation and then lets other continuations to run. If there are other continuations of lower priority, they will simply release the CPU and reschedule themselves. If there are continuations of some higher priority, they will bump the desired priority. And if there are no continuations, then th original continuation will finally get the CPU, run the code, and reset the priority, so other continuations can run the same dance over and over. Here is the code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
public static async Task Prioritize(int priority) { var cookie = Guid.NewGuid(); while (true) { if (CurrentSettings.DesiredPriority == priority && CurrentSettings.Cookie == cookie) { CurrentSettings.DesiredPriority = -1; CurrentSettings.Cookie = Guid.Empty; return; } else { if (CurrentSettings.DesiredPriority < priority) { CurrentSettings.DesiredPriority = priority; CurrentSettings.Cookie = cookie; await Task.Yield(); continue; } else { await Task.Yield(); continue; } } } } |
We need to run this method every single time when we run await
. For instance, KeepLease
becomes this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
public static async Task KeepLease(Message message) { await Task.Yield(); await Prioritize(3); while (message.WasFinished == false) // This is unsafe according to memory model { await Task.Delay(CurrentSettings.RefreshDelay); await Prioritize(3); if (DateTime.Now > message.LastRefreshTime + CurrentSettings.MessageLeaseTimeout) { message.WasLost = true; CurrentStats.Lost++; Log($"Lost lease for message {message.Id}"); return; } else { await Task.Delay(CurrentSettings.RefreshDuration); await Prioritize(3); Log($"Refreshed lease for message {message.Id}"); message.LastRefreshTime = DateTime.Now; } } CurrentStats.ProcessedSuccessfully++; } |
You can find the full snippet here. Let’s see it in action:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
11/9/2024 12:35:42 PM 1 New message received with id 1 11/9/2024 12:35:43 PM 1 New message received with id 2 11/9/2024 12:35:45 PM 12 Received messages 2, success 0, failed 0, still running 2 11/9/2024 12:35:45 PM 1 Refreshed lease for message 1 11/9/2024 12:35:46 PM 1 Refreshed lease for message 2 11/9/2024 12:35:48 PM 12 Received messages 2, success 0, failed 0, still running 2 11/9/2024 12:35:48 PM 1 Refreshed lease for message 1 11/9/2024 12:35:49 PM 1 Refreshed lease for message 2 11/9/2024 12:35:51 PM 12 Received messages 2, success 0, failed 0, still running 2 11/9/2024 12:35:51 PM 1 Refreshed lease for message 1 11/9/2024 12:35:52 PM 1 Refreshed lease for message 2 11/9/2024 12:35:53 PM 1 New message received with id 3 11/9/2024 12:35:54 PM 12 Received messages 3, success 0, failed 0, still running 3 11/9/2024 12:35:55 PM 1 Refreshed lease for message 1 11/9/2024 12:35:56 PM 1 Refreshed lease for message 3 11/9/2024 12:35:57 PM 12 Received messages 3, success 0, failed 0, still running 3 11/9/2024 12:35:58 PM 1 Refreshed lease for message 1 11/9/2024 12:35:58 PM 1 Refreshed lease for message 2 11/9/2024 12:35:59 PM 1 Refreshed lease for message 3 11/9/2024 12:36:00 PM 12 Received messages 3, success 0, failed 0, still running 3 11/9/2024 12:36:01 PM 1 Refreshed lease for message 1 11/9/2024 12:36:02 PM 1 Refreshed lease for message 3 11/9/2024 12:36:03 PM 12 Received messages 3, success 0, failed 0, still running 3 ... 11/9/2024 1:09:00 PM 1 Refreshed lease for message 100 11/9/2024 1:09:00 PM 1 Refreshed lease for message 98 11/9/2024 1:09:02 PM 12 Received messages 100, success 97, failed 0, still running 3 11/9/2024 1:09:03 PM 1 Refreshed lease for message 99 11/9/2024 1:09:03 PM 1 Refreshed lease for message 100 11/9/2024 1:09:04 PM 1 Refreshed lease for message 98 11/9/2024 1:09:05 PM 1 Finished message with id 98 in 00:00:46.5274062 11/9/2024 1:09:05 PM 12 Received messages 100, success 97, failed 0, still running 3 11/9/2024 1:09:06 PM 1 Refreshed lease for message 100 11/9/2024 1:09:06 PM 1 Refreshed lease for message 99 11/9/2024 1:09:07 PM 1 Refreshed lease for message 98 11/9/2024 1:09:08 PM 12 Received messages 100, success 98, failed 0, still running 2 11/9/2024 1:09:09 PM 1 Refreshed lease for message 100 11/9/2024 1:09:09 PM 1 Refreshed lease for message 99 11/9/2024 1:09:11 PM 12 Received messages 100, success 98, failed 0, still running 2 11/9/2024 1:09:12 PM 1 Refreshed lease for message 99 11/9/2024 1:09:12 PM 1 Refreshed lease for message 100 11/9/2024 1:09:14 PM 12 Received messages 100, success 98, failed 0, still running 2 11/9/2024 1:09:15 PM 1 Refreshed lease for message 99 11/9/2024 1:09:15 PM 1 Refreshed lease for message 100 11/9/2024 1:09:17 PM 1 Finished message with id 99 in 00:00:51.5793525 11/9/2024 1:09:17 PM 1 Refreshed lease for message 99 11/9/2024 1:09:17 PM 1 Refreshed lease for message 100 11/9/2024 1:09:17 PM 12 Received messages 100, success 99, failed 0, still running 1 11/9/2024 1:09:19 PM 1 Refreshed lease for message 100 11/9/2024 1:09:20 PM 12 Received messages 100, success 99, failed 0, still running 1 11/9/2024 1:09:22 PM 1 Refreshed lease for message 100 11/9/2024 1:09:23 PM 12 Received messages 100, success 99, failed 0, still running 1 11/9/2024 1:09:25 PM 1 Refreshed lease for message 100 11/9/2024 1:09:26 PM 12 Received messages 100, success 99, failed 0, still running 1 11/9/2024 1:09:27 PM 1 Refreshed lease for message 100 11/9/2024 1:09:29 PM 12 Received messages 100, success 99, failed 0, still running 1 11/9/2024 1:09:29 PM 1 Refreshed lease for message 100 11/9/2024 1:09:30 PM 1 Finished message with id 100 in 00:00:54.5227973 11/9/2024 1:09:32 PM 1 Refreshed lease for message 100 11/9/2024 1:09:32 PM 12 Received messages 100, success 100, failed 0, still running 0 |
We can see that the code runs much slower and it takes 35 minutes to complete. However, all messages are processed successfully and the code scales automatically. We don’t need to manually control the thread pool size, but the application simply processes fewer or more messages depending on the actual CPU-bound processing time.
Summary
async
programming is very hard. We were told many times that it’s as simple as putting async
here and await
there. C# did a lot to get rid of deadlocks as nearly every platform now uses no synchronization context (as compared with old ASP.NET which had its own context or all the desktop apps that were running with a single thread). Also, C# uses a thread pool and can fix many programmer’s mistakes that can limit the scalability.
However, asynchronous programming can be implemented in many other ways. You can’t assume that it will use many threads or that the coroutine will be triggered immediately. Many quirks can decrease the performance. Python’s asyncio is a great example of how asynchronous programming can work much differently, especially if you take the Python’s performance into consideration. What’s IO-bound in C#, can easily become CPU-bound in Python because Python is way slower.