Bit Twiddling Part 5 — Fixing audio latency in mstsc.exe (RDP)

This is the fifth part of the Bit Twiddling series. For your convenience you can find other parts in the table of contents in Par 1 — Modifying Android application on a binary level

Today we’re going to fix the audio latency in mstsc.exe. Something that people really ask about on the Internet and there is no definite solution. I’ll show how to hack mstsc.exe to fix the latency. First, I’ll explain why existing solutions do not work and what needs to be done, and at the end of this post I provide an automated PowerShell script that does the magic.

What is the issue

First, a disclaimer. I’ll describe what I suspect is happening. I’m not an expert of the RDP and I didn’t see the source code of mstsc.exe. I’m just guessing what happens based on what I see.

RDP supports multiple channels. There is a separate channel for video and another one for audio. Unfortunately, these two channels are not synchronized which means that they are synchronized on the client on a best effort basis (or rather sheer luck). To understand why it’s hard to synchronize two independent streams, we need to understand how much different they are.

Video is something that doesn’t need to be presented “for some time”. We can simply take the last video frame and show it to the user. If frames are buffered, we just process them as fast as possible to present the last state. You can see that mstsc.exe does exactly that by connecting to a remote host, playing some video, and then suspending the mstsc.exe process with ProcessExplorer. When you resume it after few seconds, you’ll see the video moving much faster until the client catches up with the latest state.

When it comes to audio, things are much different. You can’t just jump to the latest audio because you’d loose the content as it would be unintelligible. Each audio piece has its desired length. When you get delayed, you could just skip the packets (and lose some audio), play it faster (which would make it less intelligible for some time), or just play it as it goes. However, to decide what to do, you would need to understand whether the piece you want to play is delayed or not. You can’t tell that without timestamps or time markers, and I believe RDP doesn’t send those.

As long as you’re getting audio packets “on time”, there is no issue. You just play them. The first part is if you get them “in time”. This depends on your network quality and on the server that sends you the sound. From my experience, Windows Server is better when it comes to speed of sending updates. I can see my mouse moving faster and audio delayed less often when I connect to the Windows Server than Windows 10 (or other client edition). Therefore, use Windows Server where possible. Just keep in mind that you’ll need CAL licenses to send microphone and camera to the server which is a bummer (client edition lets you do that for free). Making the packets to be sent as fast as possible is the problem number one.

However, there is another part to that. If your client gets delayed for whatever reason (CPU spike, overload, or preemption), your sound will effectively slow down. You will just play it “later” even though you received it “on time”. Unfortunately, RDP cannot detect whether this happened because there are no timestamps in the stream. As far as I can tell, this is the true reason why your sound is often delayed. You can get “no latency audio” over the Internet and I had it many times. However, the longer you run the client, the higher the chance that you’ll go out of sync. This is the problem number two.

Therefore, we need to “resync” the client. Let’s see how to do it.

Why existing solutions don’t work and what fix we need

First, let me explain why existing solutions won’t work. Many articles on the Internet tell you to change the audio quality to High in Group Policy and enforce the quality in your rdp.file by setting audioqualitymode:i:2. This fixes the problem number one (although I don’t see much difference to be honest), but it doesn’t address the problem number two.

Some other articles suggest other fixes on the remote side. All these fixes have one thing in common – they don’t fix the client. If the mstsc.exe client cannot catch up when it gets delayed, then the only thing you can do is to reset the audio stream. Actually, this is how you can fix the delay easily – just restart the audio service:

I add the timeout to give some time to clear the buffer on the client. Once the buffer is empty, we restart the service and then the audio should be in sync. Try this to verify if it’s physically possible to deliver the audio “on time” in your networking conditions.

Unfortunately, restarting the audio service have many issues. First, it resets the devices on the remote end, so your audio streams may break and you’ll need to restart them. Second, this simply takes time. You probably don’t want to lose a couple of seconds of audio and microphone when you’re presenting (and unfortunately, you’ll get delayed exactly during that time).

What we need is to fix the client to catch up when there is a delay. However, how can we do that when we don’t have any timestamps? Well, the solution is simple – just drop some audio packages (like 10%) periodically to sync over time. This will decrease the audio quality to some extent and won’t fix the delay immediately, but after few seconds we’ll get back on track. Obviously, you could implement some better heuristics and solutions. There is one problem, though – we need to do that in the mstsc.exe itself. And here comes the low level magic. Let’s implement the solution that drops the audio frames to effectively “resync” the audio.

How to identify

We need to figure out how the sound is played. Let’s take ApiMonitor to trace the application. Luckily enough, it seems that the waveOutPrepareHeader is used, as we can see in the screenshot:

Let’s break there win WinDBG:

We can see a method named mstscax!CRdpWinAudioWaveoutPlayback::vcwaveOutWrite. Let’s see it:

Okay, we can see that this method passes the audio packet to the API. When we look at WAVEHDR structure, we can see that it has the following fields:

This is exactly what we see in ApiMonitor. Seems like the dwBufferLength is what we might want to change. When we shorten this buffer, we’ll effectively make the audio last shorter. We can do that for some of the packets to not break the quality much, and then all should be good.

We can verify that this works with this breakpoint:

Unfortunately, this makes the client terribly slow. We need to patch the code in place. Effectively, we need to inject a shellcode.

First, we need to allocate some meory with VirtualAllocEx via .dvalloc.

The debugger allocates the memory. In my case the address is 25fb8960000.

The address of the WinAPI function is in the memory, so we need to remember to extract the pointer from the address:

Now we need to do two things: first, we need to patch the call site to call our shellcode instead of [mstscax!_imp_waveOutPrepareHeader (00007ffb89f6d900)]. Second, we need to construct a shell code that fixes the audio packet for some of the packets, and then calls [mstscax!_imp_waveOutPrepareHeader (00007ffb89f6d900)] correctly.

To do the first thing, we can do the absolute jump trick. We put the address in the rax register, push it on the stack, and then return. This is the code:

We can compile the code with Online assembler and we should get a shell code. We can then put it in place with this line:

Unfortunately, this patch is long. We need to break few lines and then restore them in the shellcode. So our shell code starts with the lines that we broke:

Next, we need to preserve our working registers:

Okay, now we can do the logic. We want to modify the buffer length. However, we don’t want to do it for all of the packets. We need some source of randomness, like time, random values, or something else. Fortunately, the WAVEHDR structure has the field dwUser which may be “random enough” for our needs. Let’s take that value modulo some constant, and then change the packet length only for some cases.

First, let’s preserve the buffer length for the sake of what we do later:

Now, let’s load dwUser and divide it by some constant like 23:

Now, we can restore the buffer length to rax:

At this point we have rax with the buffer length, and rdx with the remainder. We can now compare the reminder and skip the code modifying the pucket length if needed:

We can see that we avoid the buffer length modification if the remainder is at most 20. Effectively, we have 20/22 = 0.909% chance that we won’t modify the package. This means that we modify something like 9% of the packages, assuming the dwUser has a good distribution. The code is written in this way so you can tune the odds of changing the packet.

Now, let’s modify the package. We want to divide the buffer length by 2, however, we want to keep it generic to be able to experiment with other values:

You can play with other values, obviously. From my experiments, halving the value works the best.

Now it’s rather easy. rax has the new buffer length or the original one if we decided not to modify it. Let’s restore other registers:

Let’s update the buffer length:

Now, we need to prepare the jump addresses. First, we want to put the original return address of the method mstscax!CRdpWinAudioWaveoutPlayback::vcwaveOutWrite which is mstscax!CRdpWinAudioWaveoutPlayback::vcwaveOutWrite+0x6d:

Now, we can jump to the WinAPI method:

That’s it. The final shellcode looks like this:

We can implant it with this:

Automated fix

We can now fix the code automatically. We need to do the following:

  • Start mstsc.exe
  • Find it’s process ID
  • Attach the debugger and find all the addresses: free memory, mstsc.exe method, WinAPI method
  • Construct the shellcode
  • Attach the debugger and patch the code
  • Detach the debugger

We can do all of that with PowerShell. Here is the code:

We compile some C# code on the fly. First, we find existing mstsc.exe instances (line 16), then run the new instance (line 17), wait a bit for the mstsc.exe to spawn a child process, and then find the id (lines 19-20). We can then patch the existing id.

First, we look for addresses. We do all the manual steps we did above to find the memory address, and two function addresses. The script is in lines 27-32. Notice that I load symbols as we need them and CDB may not have them configured on the box.

We can now parse the output. We first extract the allocated memory in lines 35-36.

Next, we look for the call site. We dump the whole method, and then find the first occurrence of mov 8d,30h. That’s our call site. This is in lines 38-41.

Next, we calculate the return address which is 16 bytes further. This is in lines 42-43.

Finally, I calculate the WinAPI method address. I extract the location of the pointer for the method (lines 45-47).

Next, we need to construct the shell code. This is exactly what we did above. We just need to format addresses properly (this is in helper methods in lines 49-50), and then build the script (lines 52-558). We can run it and that’s it. The last thing is customization of the values. You can see in line 108 that I made two parameters to change numerator and denominator for the odds of modifying the package. This way you can easily change how many packets are broken. The more you break, the faster you resynchronize, however, the worse the sound is.

That’s it. I verified that on Windows 10 22H2 x64, Windows 11 22H2 x64, Windows Server 2016 1607 x64, and Windows Server 2019 1809 x64, and it worked well. Your mileage may vary, however, the approach should work anywhere. Generally, to make this script work somewhere else, you just need to adjust how we find the call site, the return address, and the address of the WinAPI function. Assuming that the WinAPI is still called via the pointer stored in memory, then you won’t need to touch the machine code payload.

Below is the script for x86 bit (worked on Windows 10 10240 x86). Main differences are in how we access the data structure as the pointer is on the stack (and not in the register).

Some keyword to make this easier to find on the Internet

Her some keywords to make this article easier to find on the Internet.

audio latency in mstsc.exe
audio latency in rdp
audio delay in mstsc.exe
audio delay in rdp
laggy sound in rdp
sound desynchronized in rdp
sound latency in rdp
slow audio
how to fix audio in rdp

Enjoy!