chatterbox – Random IT Utensils

Chatterbox Part 15 — Make Messenger call you

afish — Sat, 06 Aug 2022 08:00:32 +0000

This is the fifteenth part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

We know how to initiate audio conferences with other parties as we covered that in Chatterbox Part 9 — Calling Facebook using GSM. Now we want to do the opposite — we want Messenger (or any other IM) to call our phone on an incoming audio call.

It’s actually very simple, once we have the previous solution. This time we need to use the bridge mechanism to call us, for instance Zoom “invite by phone” or Chime “call me”. Other bridges should support it as well. You need to set up those bridges and pick up your phone. Some of them may want you to accept the call by pressing a key.

Another idea is to use Google Voice and do a conference call.

Setting up a bridge may take around a minute (to open browsers, route voice lines, dial you in etc). To make it faster, you can keep the bridge online 24/7, and only call your phone once you accept the incoming call.

Keep in mind voice calls are insecure! They can be captured and overheard, so don’t use them for confidential communication.

Chatterbox Part 9 — Calling Facebook using GSM

afish — Sat, 28 Aug 2021 08:00:25 +0000

This is the ninth part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

Until now we focused on how to integrate text communication. We can share some media the same way, namely via providing links to images or videos. We can even embed them in some webview to simplify previewing. However, to integrate interactive calls we need to do a little more.

How do we call from Facebook to Skype? Or from WhatsApp to Google Hangouts? We could provide a server re-encoding all media streams, but that would probably require a lot of coding and protocol reversing. Adding mobile calls (over a GSM provider, not over the Internet) would increase the complexity even more. However, we can do this much more easily by capturing screen + voice + webcam.

First, we need to be able to “dial in” from mobile phone to the call. To do that we can use some existing services, e.g. Chime or Zoom. So we create a bridge BridgeA, we dial in from our mobile phone and we’re done.

Next, we need to have some server which would route things between networks. We open two browsers: BrowserA which dials into BridgeA, and BrowserB which calls someone on Facebook, Hangouts, Meet, whatever else.

Now, we need to have two fake audio lines. On Windows you can use Virtual Audio Cable and VB-Cable which are free and can do the trick.

We also need to be able to route an application to selected audio line. Sometimes it can be selected in the application, sometimes it cannot. For the latter case you can use Audio Router.

Now, we configure BrowserA to emit output to Line1 and get input from Line2. Similarly, we configure BrowserB to emit output to Line2 and get input from Line1. Effectively, anything going out of BrowserA will go into BrowserB using Line1. In the same way, everything going out of BrowserB will enter BrowserA using Line2.

That’s it. Now you can route voice between two bridges. Obviously, you can mix audio in any way to create multi-way bridges if needed. Nothing can stop you now.

Okay, what about video? For that we can use BridgeA the same way, just dial in using computer/mobile phone with video. Next, you need to configure virtual webcam source. You can use Chrome extension or OBS Virtual Cam. Just capture screen from BrowserA into virtual webcam used by BrowserB, and the other way round. Depending on the capture quality and configuration you can mimic multiple screens etc.

What about the delay? From my experience it is close to 1 second. This is mediocre but I believe people are now used to the latency so it should be fine, at least it’s okay for my purposes.

Chatterbox Part 8 — Integrations

afish — Sat, 05 Dec 2020 09:00:40 +0000

This is the eighth part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

When working with software we typically need to integrate with other components, whether some external software like a database or libraries we incorporate into our code base. Below are a couple of stories of weird integration issues I had to solve.

Exceptions

Threading is hard. It’s even harder when we start integrating components.

I was using a Java library to handle a protocol and running it in a .NET process via IKVM. There is one big difference between unhandled exception behavior in Java and in C#: in Java they don’t kill the process (only the thread), in C# they take the whole process down. So what happens if you take Java code with no try/catch in the thread’s main function and you get an exception? The whole process dies.

How do you fix that? Either you hack around it or you move the code to an external process. The former works but is risky, the latter makes your infrastructure more complex (on the other hand, you should have a watchdog and multiple nodes anyway).

Deadlocks

A similar situation: an external library was not handling threading correctly and was deadlocking when sending a message. It was running in an external process so it wasn’t blocking the whole system, but detecting the lock wasn’t easy. How do you recognize whether a thread is working hard or waiting indefinitely? Your ping thread won’t help (because ping still works), so you need other checks in place. It makes things much messier: you need to check whether the action trigger (like a queued message) is processed within a given time. You could go with Wait Chain Traversal, but detecting deadlocks is not simple. Not to mention that it doesn’t need to be technically a “deadlock”; it could be a lost message etc.
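The "is the queued message processed in a given time" check can be sketched as a small tracker that remembers when each action was queued and reports the ones that are overdue. This is only an illustration: `StallDetector` and its method names are hypothetical, and the clock is injectable so the check can be exercised without waiting.

```python
import time


class StallDetector:
    """Tracks queued actions and reports those not completed within a deadline."""

    def __init__(self, deadline_seconds, clock=time.monotonic):
        self.deadline_seconds = deadline_seconds
        self.clock = clock
        self.pending = {}  # action id -> time it was queued

    def queued(self, action_id):
        self.pending[action_id] = self.clock()

    def completed(self, action_id):
        self.pending.pop(action_id, None)

    def stalled(self):
        """Returns ids of actions pending for longer than the deadline."""
        now = self.clock()
        return [a for a, t in self.pending.items()
                if now - t > self.deadline_seconds]
```

A watchdog could call `stalled()` periodically and treat a non-empty result the same way as a failed ping, regardless of whether the cause is a real deadlock or a lost message.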

Indefinite waiting

Never wait indefinitely. Never. It’s a recipe for a failure. This is pretty clear when you take locks explicitly but what happens if you use await? Having timeouts in place is harder but you still need to have them. Otherwise you end up with a bug when puppeteer doesn’t open new page and your await never finishes. You either fix it in the source or add more checks and timeouts around.
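As a rough illustration of putting a timeout around an await, here is a sketch in Python's asyncio (the post itself talks about Puppeteer, so treat this as an analogy rather than the actual code). `with_timeout` and `never_finishes` are hypothetical names.

```python
import asyncio


async def with_timeout(awaitable, seconds, default=None):
    """Awaits `awaitable`, but gives up after `seconds` and returns `default`."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        return default


async def never_finishes():
    # Stands in for a call that hangs, like a page that never opens.
    await asyncio.Event().wait()
```

Calling `asyncio.run(with_timeout(never_finishes(), 5))` returns `None` after five seconds instead of hanging forever; the caller can then retry or report the failure.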

Memory failures

Whenever you incorporate an external library into your process, you need to take care of segfaults and memory errors. While they are rare in managed code, they still happen. What do you do when your process segfaults? You need to restart it, but you also need to make sure it doesn’t happen again (so you take a memory dump or logs). Always run external code sandboxed and in isolation; you just can’t let your code fail because some other library is buggy.

Time handling

Time management is hard. There are time zones, there are leap seconds, so many other things. And what happens if other library handles time differently?

Whenever you handle time always be consistent. Going with UTC is not a silver bullet but if your whole system does that then translate from local to UTC as early as possible. Never adhere to other library conventions because it’ll make your code messy.

Persistence

You can never lose user data, especially in stateful situations which do not retry. Persist data as soon as you get it (either entered by the user or via some callback from a library). Also, make sure you have audit mechanisms in place, retries, deadlettering and other solutions to get some insight into the system performance.

Metrics

Always have some metrics in place. You’re not following your logs closely (especially when you travel) but you need to have some mechanism to notify you about failures happening too often. Whether it’s a p99 performance metric or just a token bucket for too many exceptions in 15 minutes, keep it in place and get some notification. There will be some false positives and false alarms, but it’s better to know something doesn’t work than to be disappointed. Especially when you travel and you just can’t log in remotely to see logs or make sure things work. Once you start using the system “for real”, you need to make sure it lets you know something is wrong.
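A minimal sketch of the "too many exceptions in 15 minutes" alarm. Note that it uses a sliding window counter rather than a real token bucket, simply because it is shorter; `FailureAlarm` is a hypothetical name and the limits are made up.

```python
import time
from collections import deque


class FailureAlarm:
    """Fires when more than `limit` failures happen within `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.events = deque()  # timestamps of recent failures

    def record_failure(self):
        """Records one failure; returns True if the alarm should fire."""
        now = self.clock()
        self.events.append(now)
        # Drop failures that fell out of the window.
        while self.events and now - self.events[0] > self.window_seconds:
            self.events.popleft()
        return len(self.events) > self.limit
```

With `FailureAlarm(limit=20, window_seconds=900)` the twenty-first exception within 15 minutes would trigger a notification (a text, an email, whatever channel is available).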

Logging

Keep your logs clean and tidy. Log enough but don’t log too much. This includes both logging request contents but also not logging every single line. Just make sure you log all side effects so you can reproduce them from logs.

Also, make sure you log important contextual things. Timestamp, thread id, process id, binary name, request context: these things are very important. You may need to trace bugs using memory dumps, where things are much harder and you’ll need as many details as possible. The same goes for resource leasing or even mutex locking. If you log that “mutex was abandoned” then it’s helpful, but how do you know which component held it? You need to log uniformly and as much as possible. See logging in distributed system.

Disk space

If you log and take memory dumps, make sure you clean up periodically. The system may not fail “clearly” when you run out of disk space, but it won’t work and you’ll get weird exceptions. Just schedule your cron jobs. Make sure you archive artifacts so you don’t lose them.

Documentation

Finally, document decisions. You won’t remember why things work the way they do. The same goes for features: you may implement something and then forget it’s there or how to use it. Just keep the readme up to date, it is helpful. Especially if you travel and cannot log in to see the code.

Never deploy on Friday

People go with CD and it’s cool but never deploy in risky times. If you leave for the weekend — do not deploy on Friday evening. You may have rollbacks in place and capture all issues right away but you don’t want to log in remotely from some airport (been there done that). This “super cool feature I need today” really can be implemented when you get back. It’s cool you have a feature but it needs to be reliable as well.

Trust and check

Whenever you perform an action with side effects, make sure it succeeded. Checking the HTTP code may not be enough; you don’t actually know if the external system processed your action correctly. Check somehow: if you send a message to your friend, ask the server later on if it was delivered. Have another instance of your client and see if it gets a ping from the server. Download message history periodically and make sure your messages are there. And if they are not, just let the user know. It’s bad when the system tells you “something failed and I don’t know what” but it’s even worse when it fails quietly.

Don’t exit deep inside your code

Never use System.exit in a library. Don’t exit deep inside your code. And never use System.exit directly; go with a thin wrapper like ProcessHelper.exit. Log where it’s called from (don’t forget to log the thread etc) because one day your system may be “failing” just because it exits in a place where you didn’t want it to.
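The idea of a thin exit wrapper translates to other languages too. Here is a sketch of a Python analogue of ProcessHelper.exit; the function name `exit_process` and the `_exit` parameter (there only so the behavior can be checked without killing the interpreter) are assumptions for illustration.

```python
import logging
import sys
import threading
import traceback

log = logging.getLogger("process_helper")


def exit_process(code, reason, _exit=sys.exit):
    """Single place to leave the process: logs who asked for it, then exits."""
    # The last two stack entries are our caller and this function itself.
    caller = traceback.extract_stack(limit=2)[0]
    log.critical(
        "exit(%d) requested: %s (at %s:%d in %s, thread %s)",
        code, reason, caller.filename, caller.lineno, caller.name,
        threading.current_thread().name,
    )
    _exit(code)
```

Once every exit goes through one function, a single log line answers "who stopped the process and from which thread".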

Thread names

Name your threads. This gives you one crucial piece of information — you not only know what the thread does (by examining memory dump etc) but also what it is supposed to do. This is especially important with asynchronous code which can migrate between threads, maybe your callstack is okay but is not on the right thread.

Deduplication

Always assign a unique id to events if possible. If you cannot do it easily (it’s not provided by the protocol), then derive it from some hash function (based on sender, content, timestamp etc). You’ll be able to remove duplicates.
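Deriving an id from the event fields could look roughly like this. The field names (`sender`, `content`, `timestamp`) and the choice of SHA-256 are assumptions; any stable hash over the identifying fields would do.

```python
import hashlib


def event_id(sender, content, timestamp):
    """Derives a stable id from fields that identify the event."""
    # Length-prefix each field so ("ab", "c") and ("a", "bc") do not collide.
    data = b""
    for part in (sender, content, timestamp):
        raw = part.encode("utf-8")
        data += len(raw).to_bytes(4, "big") + raw
    return hashlib.sha256(data).hexdigest()


def deduplicate(events):
    """Keeps the first occurrence of each (sender, content, timestamp) event."""
    seen = set()
    unique = []
    for e in events:
        key = event_id(e["sender"], e["content"], e["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique
```

Two genuinely different "OK" messages from the same sender stay distinct only if the timestamp is precise enough, so this works best when the protocol gives you a reliable timestamp.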

Deadletters because of deadletters

What happens if you have a deadletter? You probably want to notify yourself about it. But what happens if you fail when notifying about the deadletter? If you generate another one then your system can easily collapse. You get one deadletter, it then causes another two deadletters, and a couple of minutes later you have thousands of messages which you cannot process but which consume resources. Make sure you can break this cycle somehow, or at least add a delay so the system survives the load.
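One possible way to cut the cycle is to count how many deadletters led to a given message and stop re-queueing past a limit. A sketch under that assumption; the `deadletter_hops` field and the `handle_deadletter` function are hypothetical.

```python
def handle_deadletter(message, notify, max_hops=1):
    """Notifies about a deadletter, but never lets notifications cascade.

    `message` is a dict; `deadletter_hops` counts how many deadletters led to it.
    Returns a new deadletter to enqueue, or None if nothing should be queued.
    """
    hops = message.get("deadletter_hops", 0)
    try:
        notify(message)
        return None
    except Exception as error:
        if hops >= max_hops:
            # Cut the cycle: drop (or only log locally) instead of re-queueing.
            return None
        return {
            "body": f"failed to notify: {error}",
            "deadletter_hops": hops + 1,
        }
```

With `max_hops=1`, a failed notification produces one follow-up deadletter, and a failure on that one is swallowed instead of multiplying.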

Generate new messages based on context

Whenever one message causes another one (like you send email and then you get email sent message) always have common code in place to maintain the continuity. This could be as simple as copying one correlation id field but could be much more sophisticated (storing context, deadletters, senders, ids etc). Also, one day you’ll need to copy new property between messages, you don’t want to go through your whole codebase and update every single place where you use constructor.
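A sketch of keeping that continuity in one place: follow-up messages are always created through a single `derive` method, so a context field added later is carried over automatically. The `Message` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field, replace
import uuid


@dataclass(frozen=True)
class Message:
    kind: str
    body: str
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    sender: str = ""

    def derive(self, kind, body):
        """Creates a follow-up message that inherits all context fields."""
        # replace() copies every field, so a context field added later
        # is carried over without touching any call site.
        return replace(self, kind=kind, body=body)
```

For example, `sent = request.derive("email_sent", "done")` keeps the correlation id of `request` without the caller ever naming it.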

Have sanity checks in place

We already know waiting indefinitely is a bad idea. What about other things? If you send one message per second is it okay? Ten of them? One hundred? Sure, you’ll come up with some ridiculous limits which should never be met. Put alarms on them so you know when your system sends a thousand messages in 5 minutes. It may not save you from spamming someone else but at least it will let you stop it early.

Formats

You’ll need to change message schema, encryption keys, content format. Always make sure these are versioned and you can maintain compatibility.

Locale

Whenever you push a text to a storage, always control the format. Don’t rely on “default locale” or “the current format”, enforce it manually so it doesn’t break when you migrate the software to different machine/country/continent.
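As an illustration of enforcing the format manually, the sketch below writes numbers with a fixed decimal point and timestamps as ISO 8601 in UTC, independent of machine settings. The record layout (`amount;timestamp`) and the function names are made up for the example.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def serialize_record(amount, when):
    """Formats values for storage with an explicit, locale-independent format."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    stamp = when.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)
    # ".2f" always uses a dot as the decimal separator, whatever the locale.
    return f"{amount:.2f};{stamp}"


def parse_record(text):
    """Reads back a record written by serialize_record."""
    amount, stamp = text.split(";")
    when = datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return float(amount), when
```

Rejecting naive timestamps at the boundary also forces the conversion to UTC to happen as early as possible.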

Chatterbox Part 7 — File writing

afish — Sat, 28 Nov 2020 09:00:24 +0000

This is the seventh part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

Sometimes you may decide to go with a simple file instead of a full-blown SQL/NoSQL database. However, writing a file is not as simple as it sounds. APIs are not atomic, and writing a file takes time (especially if it’s more than just a couple of bytes). What happens if your machine gets unplugged? Your process dies? An exception is thrown? The file gets corrupted and things go wrong. That’s not a problem when you write a new file, but it is a big issue when you overwrite an already existing file (like when reading data, recalculating in the app, and writing back to “the same” file). You may think that writing on the side and then replacing the existing file is a solution, but as this SO discussion shows it’s not trivial (especially when you don’t know the underlying file system or even whether it’s a physical drive or some sort of S3 emulation). How do we protect from data corruption?

There are a couple of ideas how to solve this. I assume there is just one user of the file; if you have many of them then you’ll need to extend the protocols with some system-wide lock or something similar.

Two files and pointer

If your filesystem supports atomic deletion of the file, you can use this protocol:

Assumptions:

There are two files with data: A and B
There may be a pointer file P with whatever content (ideally empty)

Reading:

Check if P exists
If it does — read and return A
Otherwise read and return B

Writing:

Check if P exists
If it does — write B and delete P
Otherwise write A and create empty P

How does it work? P tells you if you should go with file A or B. If P exists then file A contains latest data, otherwise it’s file B.
Now, what if you’re writing to file and it fails? You’re writing to backup file so it doesn’t matter, your main file is still intact. Since P modification is atomic (create or delete) then it’s safe.
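The pointer protocol above can be sketched as follows. This assumes, like the text, that creating and deleting P is atomic on the file system in use; the file names and the helper names (`pointer_read`, `pointer_write`) are hypothetical, and reading before the first write is not handled.

```python
import os


def _paths(directory):
    return (os.path.join(directory, "A"),
            os.path.join(directory, "B"),
            os.path.join(directory, "P"))


def _write_durably(path, data):
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before flipping P


def pointer_read(directory):
    a, b, p = _paths(directory)
    current = a if os.path.exists(p) else b  # P exists -> A is current
    with open(current, "rb") as f:
        return f.read()


def pointer_write(directory, data):
    a, b, p = _paths(directory)
    if os.path.exists(p):
        _write_durably(b, data)
        os.remove(p)             # B becomes the current file
    else:
        _write_durably(a, data)
        open(p, "xb").close()    # A becomes the current file
```

A crash while writing only ever damages the file that P does not point to, so the next read still sees the previous version.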

What if you cannot go with pointer (or you don’t want to)?

Two files and more writes and structured data

This solution requires you to either use structured data (like JSON) which allows you to tell whether the file is correct or to write checksum next to the file content.

Assumptions:

There are two files with data: A and B
Content is either structured or there is a checksum before the actual content

Reading:

Read A and check if it’s correct (via structure or checksum)
If it is — write A to B. Return A.
Otherwise — write B to A. Return B

Writing:

Write to A.
Write to B.

How does it work? Let’s say that writing to A fails. This means that when reading A you’ll discover it’s broken so you’ll restore A from replica B which is a rollback.
If writing to A succeeds but writing to B fails, you’ll read A correctly and overwrite B (roll forward).

This algorithm works but requires more writes and some special structure. What if you have to store unstructured text and cannot determine it’s broken just by reading its content?

Three files and even more writes

Assumptions:

There are three files with data: A, B, and C

Reading:

Read A, read B, compare
If they are the same — write A to C. Return A.
Otherwise write C to A, write C to B, return C.

Writing:

Write A. Write B. Write C.

If write to A fails then reader will detect data mismatch by comparing A and B, and then perform data restore by overwriting A and B from C.
If write to B fails then reader will detect data mismatch the same way and will restore A and B from C.
If write to C fails then reader will see A and B are equal. In that case reader will overwrite C using A.

Obviously, there are multiple writes here and higher data consumption.

Keep in mind that if you read from C (because A and B are not the same) then streaming file C and updating A+B cannot be done straightforwardly. Let’s say that you do something like:

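// Naive restore: stream C to the caller while copying each line to A and B.
// Needs System.IO and System.Collections.Generic.
IEnumerable<string> RestoreFromC(TextReader c, TextWriter a, TextWriter b){
  string line;
  while((line = c.ReadLine()) != null){
    a.WriteLine(line);
    b.WriteLine(line);
    yield return line;
  }
}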

What if something happens after writing to B but before whole C file is processed? You’ll end up with A and B being the same but with wrong content. This can be somehow detected by checking file sizes or you can do:

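// Variant: B is written only after the caller has consumed the line.
IEnumerable<string> RestoreFromC(TextReader c, TextWriter a, TextWriter b){
  string line;
  while((line = c.ReadLine()) != null){
    a.WriteLine(line);
    yield return line;
    b.WriteLine(line);
  }
}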

This may result in some really hard to track edge cases (when B happens to have a matching prefix and you start overwriting A, and then A and B are the same but they are just a prefix of C). In practice, you may take an even crazier approach:

Reading:

Read A, read B, compare
If they are the same — write A to C. Return A.
Otherwise take first character of C and write something different to B. Write C to A. Write C to B. Return C.

Writing:

Take first character of new A and old A, write something different to B
Write A.
Write B.
Write C.

Why do we need this? Let’s say we don’t and your files are A = 123, B = 123, C = 123. You now want to write 14 as new data. You come and write A = 14 and then fail when writing B after flushing first character.
So when reading you realize A = 14 is not equal to B = 1. So you go and start restoring C = 123 to A and B. You write first character to A and then fail.
Next reader comes and reads A = 1 and B = 1 so it looks okay. But C = 123!
You also cannot use file sizes to figure out if full write was successful (because you don’t know if C is still a safe backup or broken new version).

Extended version will fail if you accidentally truncate files (which is a typical behavior). Let’s say writing to A failed early and left A empty. You then go, read first character of C and want to write something else to B. You fail but leave B being empty, so now A and B are equal.

You may think that you can compare file timestamps and you don’t need three files, only two. Just compare their content, if they differ then check if A was written after B — if it was then B is the correct content (backup), otherwise A is the correct content (as you failed when writing B). Keep in mind not all file systems provide sufficient precision, also you generally cannot trust timestamps as time may go backwards etc.

Also, you cannot just read A, B, and C, and compare all of them because you don’t know if C was written correctly (you cannot determine which file is the safe backup).

So it may be that your full protocol needs to be:

Reading:

Read A, read B, compare
If they are the same — write A to C. Return A.
Otherwise you need to write some garbage to A and B, but if any of these is empty then you need to start with it first. Write C to A. Write C to B. Return C.

Writing:

Take first character of new A and old A, write something different to B
Write A.
Write B.
Write C.

Chatterbox Part 6 — Protocols

afish — Sat, 21 Nov 2020 09:00:17 +0000

This is the sixth part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

We covered a lot about general system design and features. Now it’s time to talk about the actual core of the system for which it was created — handling various IM protocols.

It was all supposed to be working as an XMPP protocol with gateways (transports) to other networks. And my current recommendation is: do not use these transports. They don’t support many network features (like group chats, media etc), they are generally hard to maintain (you need to have a separate server next to the regular XMPP one), and they use a lot of resources. Thankfully, there are other approaches:

Native support in your beloved language — if there is a library for doing that, you’re all set!
Native support in some other language which you can run on your platform — for instance Java library used in .NET via IKVM. However, this may be failing in edge conditions due to some differences (like thread dying on unhandled exception etc).
Native support in some other language — just write some simple infrastructure pushing messages over named pipes to similar component written in your beloved language and effectively you end up with same solution

Unfortunately, some protocols cannot be handled like this. It’s great when you have library maintaining stateful connection but for some protocols it won’t work. You can try:

Polling — depending on the API, you may be able to poll messages via REST API or something similar. This obviously needs to be done carefully because of throttling, permissions, banning etc.
CLI — there are command line tools which you can just run each minute to receive messages. It may be harder to send/receive media, there may be issues with encoding and national characters.
Transports via other means — XMPP, Bitlbee, Matrix, IRC, some other solutions allowing you to connect indirectly. They probably won’t support all protocol features.
Web scraping — this may not be actually that bad

There are multiple things to keep in mind when maintaining a protocol. First, they are stateful — a connection must be there and it must be maintained. Sometimes it is very hard to say if it’s still alive (because of bugs in libraries etc). Hence, implementing a watchdog is tricky as just a dumb ping won’t work. You need to do something meaningful for the protocol and see if it worked — like change your status from Online to Away and back, send message to someone/something etc.

Also, restoring connection may be tricky. Some messages may get lost, some status updates may be received so it’s easy to lose order of events. Also, if there is a hiccup in the connection then you may actually miss some notifications. It’s worth asking for messages history periodically.

You should also check if your side effects are really there. If you send a message to someone else, just ask your server for a message history and see if it’s there. Sometimes it’s easier to have effectively two connections with the same account to immediately see if the message you send on connection A is received on connection B.
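Checking that a sent message really shows up in the history could be sketched like this. The `send` and `fetch_history` callables stand in for whatever the protocol library offers; their shapes (send returns an id, history is a list of dicts with an `id` key) are assumptions.

```python
import time


def send_and_verify(send, fetch_history, text, attempts=5, delay=1.0,
                    sleep=time.sleep):
    """Sends a message, then polls the history until the message shows up."""
    message_id = send(text)
    for _ in range(attempts):
        if any(m["id"] == message_id for m in fetch_history()):
            return True
        sleep(delay)
    return False  # the caller should alert the user: the send was not confirmed
```

The same function works with a second connection on the same account: pass a `fetch_history` that reads what connection B has received.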

Also, reuse server identifiers as much as possible. If server sends you an ID with the message, store it to remove duplicates easily.

Web scraping

While it sounds terrible, there are actually solutions to do it effectively.

First, you need good tools. I’m going with Puppeteer as it has nice API and pages generally work. It also allows you to install extensions, click with mouse and type with keyboard the “regular” way (not JS).

Second, some pages use some decent API under the hood. You may just observe network traffic (directly or with some extensions) and dump it from the browser. This is the best way as you’ll have full metadata for messages (timestamps, senders, ids etc).

Some pages keep a lot of data in their view model. Depending on the library, it may be super easy to extract (like with Knockout.js) or moderately hard (like with React). However, if metadata is there, you can easily parse it (from JSON etc).

Unfortunately, some pages are terrible and you literally have to scrape the DOM. You’ll definitely hit issues with timestamps (especially that for some protocols they change when you refresh the page), message identities (is this brand new “OK” message or something we’ve already scraped), content, media etc. Not to mention that DOM may change pretty often and your scripts will fail.

When it comes to replying via web, sometimes it can be done entirely in JS, sometimes you need to go with keyboard typing. Puppeteer handles that easily.

If you control the browser window size etc, instead of interacting with DOM via query selectors you may go with clicking on well known pixels by their coordinates. This may break after UI changes, obviously.

Some platforms have multiple interfaces with different implementations. For instance Skype is embedded in Outlook UI, it may be easier to find other implementation if it exposes some metadata via different library or DOM.

Chatterbox Part 5 — Self healing

afish — Sat, 14 Nov 2020 09:00:36 +0000

This is the fifth part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

One of the most important parts for reliable systems is their ability to heal themselves. While it may sound easy, there are many things to consider here. Let’s say that you want to monitor if your process (or subprocess) died. But then:

How do you know it died? Maybe it’s just slow? Maybe it’s waiting for mutex? Maybe it’s working hard on some loong (infinite?) loop? Super simple — just add ping!
How do you handle ping? If you run it on a separate thread then how do you know if it isn’t the only running thread in your app? You need to maintain daemon threads properly
What if ping is slow? Maybe you should just give it a couple of retries before killing?
How do you call ping? File? Socket? Named pipe?

Okay, let’s say that we think process isn’t responding. Let’s kill it. How?

Stopping it “the right way” probably won’t work because it is not responding
But if we kill it then it won’t release the resources
What if it’s being debugged? We won’t just kill it because operating system will stop us
What if it is reporting errors (WER)? We can’t kill it either
What if it’s some zombie process and it cannot be killed at all?
What if it’s running remotely and we lost the connection?

Okay, let’s say we killed it. Let’s now restart:

What if resources are still locked?
What if it had mutex? We can take ownership but how do we know if data is correct?
What if the process dies deterministically because of some poison message? How many times do we restart it? Do we do exponential backoff? Something else?
What if our watchdog dies and there is nothing to restart the process?

And so on…

It’s not easy to implement proper watchdog but you need to realize one thing — your process WILL die. Sooner or later. Your machine will restart as well. You cannot catch all exceptions, you cannot handle all issues, sometimes you just need to restart.

My solution currently works like this:

Watchdog observes processes via named pipe
Each process has deep ping which does something meaningful (we’ll cover that in next part)
If ping failed 3 times, watchdog restarts the process
Watchdog can take memory dump to simplify debugging later on
Processes observe watchdog and kill themselves if watchdog dies
Instead of using system mutexes I’m using my custom ones to track ownership (PID and TID)
Each mutex is locked with timeout (this is crucial, never wait indefinitely)
If process A detects that B holds mutex for too long, it takes B’s memory dump and kills it to retake the lock
I had to override a lot of system settings for WER and others, to not end up with zombie processes which cannot be killed at all
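The "restart after 3 failed pings" rule from the list above can be sketched as a tiny state machine. This is not the actual Chatterbox watchdog: `PingWatchdog`, `ping` and `restart` are hypothetical, and the named pipe transport, memory dumps and mutex handling are left out.

```python
class PingWatchdog:
    """Restarts a process after `max_failures` consecutive failed pings."""

    def __init__(self, ping, restart, max_failures=3):
        self.ping = ping
        self.restart = restart
        self.max_failures = max_failures
        self.failures = 0

    def check(self):
        """Runs one ping; returns True if the process was restarted."""
        try:
            healthy = bool(self.ping())
        except Exception:
            healthy = False   # an exception from ping counts as a failure
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.max_failures:
            return False
        self.failures = 0
        self.restart()
        return True
```

Counting only consecutive failures gives a slow ping a couple of retries before the process is killed.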

It works. While I never can say that it is bulletproof, I haven’t seen issues for months now and it survived many severe conditions (CPU 100% consumed for hours, no memory, no disk space etc).

Chatterbox Part 4 — Other channels

afish — Sat, 07 Nov 2020 09:00:11 +0000

This is the fourth part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

Apart from web, desktop, and mobile you may want to have other channels.

Texts

Why

Why would you need text messages for your service at all? Can’t you just turn on mobile data? Well, there are many reasons why texts are better:

Mobile data doesn’t work everywhere — imagine hiking or just going to the woods
Mobile data may be expensive — what if you’re travelling internationally?
It consumes battery much faster
You may just want to turn it off

While it may sound surprising, I’m actually using texts a lot to communicate. That’s partly because I travel often but also because I just don’t like having mobile data turned on all the time.

How to send

Now the question is, how to reliably send text messages. Let me start with this one statement — texts do get lost. Often.

If you’re just looking for “notification” text, not necessarily text where you must control the content, you can go with one time passwords from services like LinkedIn, GitHub etc. Just emulate logging in and the service will send you OTP as text. You won’t know what’s exactly happening, but you’ll have the notification (so you can turn your mobile data on).

If you need to send messages with controlled content, Google Voice sounds like a good option. You can send text via email. Similarly, multiple service providers offer email to text feature, you just mail some specific address and a text is delivered.

There are also some other platforms for mass texting etc. They typically charge for messages.

What to send

We mentioned encryption last time. You probably want to go with Base64 (or even better with Base58) for encrypted content. Keep in mind some countries may ban texting “encoded” messages.

Also, Base64 will make your messages much longer (regular text is allowed to be 160 chars long). While some platforms let you send one big message which will be assembled back on the mobile phone, I find it unreliable. I’ve seen too many broken messages due to that. Just go with assembling on your side: split messages into chunks and send them separately. This will probably lead to some form of packetizing with message numbering etc.
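A sketch of one possible packet format: `id:n/total:chunk`, with the payload Base64-encoded and every packet kept within 160 characters. The header layout is invented for this example, and `message_id` must not contain a colon.

```python
import base64


def packetize(ciphertext, message_id, max_len=160):
    """Splits Base64-encoded bytes into numbered texts of at most max_len chars."""
    encoded = base64.b64encode(ciphertext).decode("ascii")
    # Reserve room for the longest header we allow: "id:999/999:".
    size = max_len - len(f"{message_id}:999/999:")
    if size <= 0:
        raise ValueError("max_len too small for the header")
    chunks = [encoded[i:i + size] for i in range(0, len(encoded), size)] or [""]
    if len(chunks) > 999:
        raise ValueError("message too long")
    total = len(chunks)
    return [f"{message_id}:{n}/{total}:{chunk}"
            for n, chunk in enumerate(chunks, 1)]


def depacketize(packets):
    """Reassembles packets (in any order); returns None if some are missing."""
    parts, total = {}, None
    for packet in packets:
        _, counter, chunk = packet.split(":", 2)
        n, total_text = counter.split("/")
        parts[int(n)] = chunk
        total = int(total_text)
    if total is None or set(parts) != set(range(1, total + 1)):
        return None
    return base64.b64decode("".join(parts[n] for n in range(1, total + 1)))
```

Since texts get lost and reordered, the receiver should keep incomplete sets around and only decrypt once `depacketize` returns something other than `None`.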

Keep in mind some characters may get replaced over the air. For instance, one number I was texting to was getting § (paragraph sign) instead of _ (underscore). On some other network I was receiving spaces instead of underscores. Go with Base58: it uses only letters and digits and leaves four alphanumeric characters (0, O, I, l) unused, which you can use to encode signaling information.
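For illustration, here is a small Base58 encoder and decoder using the commonly used Bitcoin-style alphabet (an assumption; the post does not say which alphabet was used). The four alphanumeric characters missing from that alphabet are listed in `SIGNAL_CHARS`.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
SIGNAL_CHARS = "0OIl"  # alphanumerics that this alphabet never produces


def b58encode(data):
    number = int.from_bytes(data, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = ALPHABET[remainder] + digits
    # Leading zero bytes vanish in the integer; encode each one as "1".
    padding = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * padding + digits


def b58decode(text):
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)
    padding = len(text) - len(text.lstrip(ALPHABET[0]))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * padding + body
```

A packet header could then use, say, `0` as a field separator, since that character can never appear inside the encoded payload.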

How to receive

Once you start encrypting and packetizing your texts, you need to have a proper app on your phone. I don’t know how hard it is with iOS but for Android there are at least couple open source apps.

I was using QKSMS and I don’t think I recommend it. First, MMS messages do not work on my phone (actually here is a big warning — check your regular app if it handles MMS properly when you have data turned off and dual SIM, it is apparently super hard to implement). Second, code quality is slightly less than ideal so it may be hard to plug in your extensions.

I can recommend Silence. It’s a fork of Signal and while it has fewer features than QKSMS, its code quality is much better IMO and it works well with MMS (just had to go with some specific branch).

How to reply

You can go with sms to email feature. Use Google Voice (it has it built in), or ask your service provider. You may be for instance texting yourself (sic!) and then your application will be reading emails.

Security, encryption, packetizing etc. are probably the same as before. You may want to enhance your text messaging app to split one contact into multiple ones (as you’ll effectively be chatting with one number all the time, which will multiplex many contacts from various networks).

Emails

Sending and receiving is basically the same as for texts. You need to handle encryption etc.

Your service may either register for push notifications or just poll via IMAP each minute. This seems to be working fine for me.
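The polling variant can be sketched with Python’s standard imaplib. The host, credentials and the handle callback are placeholders you would supply yourself, and the one minute interval just mirrors the text above.

```python
import email
import imaplib
import time
from email.message import Message


def fetch_unseen(conn) -> list[Message]:
    """Return all unseen messages from the currently selected mailbox."""
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        return []
    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        # Each fetched message arrives as a (header, raw_bytes) tuple.
        for part in parts:
            if isinstance(part, tuple):
                messages.append(email.message_from_bytes(part[1]))
    return messages


def poll(host: str, user: str, password: str, handle, interval: int = 60) -> None:
    """Check the inbox every `interval` seconds and pass new mail to `handle`."""
    while True:
        with imaplib.IMAP4_SSL(host) as conn:
            conn.login(user, password)
            conn.select("INBOX")
            for message in fetch_unseen(conn):
                handle(message)
        time.sleep(interval)
```

Fetching the full message marks it as seen on the server, so the next round picks up only newer mail. Reconnecting on every round is wasteful but keeps the sketch free of reconnect logic.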

Voice

Again, you may want to go with some service which just calls you and passes OTP (if you just want to get a “notification”). You may also go with things like IFTTT if you want to control the message content. You may want to translate it to English beforehand; for that you may go with some free service like Yandex.

On a plane

This all works nice but what if you are on a plane and the only connection you have is some “chat only” wifi allowing you to use Whatsapp or Facebook?

You can multiplex messages the same way. Just have another number which will be sending messages to you and which you’ll reply to the same way as with texts.

Notes on decrypting messages

I mentioned that you can modify some open source applications but you can actually go with some other approach. Just prepare a simple HTML file with inlined JS for decrypting and depacketizing. When you receive a message, just copy it to the page on your phone and read over there. It’ll probably be much less convenient but also way easier to do than modifying some closed source app.

Chatterbox Part 3 — Security and mobile devices

afish — Sat, 31 Oct 2020 09:00:53 +0000

This is the third part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

What do we do about mobile devices? Depending on which platform we chose in the previous part, we may have very limited options. If we go with our custom UI, we may need to implement something. For emails it’s relatively simple, as there are plenty of clients out there. The same goes for an IRC server. And for Slack-like chats there is typically some mobile app provided.

However, there is a big question here — if we decided to go with Slack, how do we make sure our messages are safe and encrypted?

Problem

If we go with a Slack-like solution, we can easily push messages using CLI, REST API, webhooks etc. But if we just push plaintext then we have a very serious issue to solve. Apart from not trusting the underlying network, we now have another component which can leak data, spy on us, or simply expose something publicly. It may be okay if we generally don’t care, but once we take it seriously we need to have some good solution in place.

We could go with something built on top of the IM. Take Keybase or any other GPG based solution, generate keys, and send encrypted messages. But this effectively turns the whole communication upside down, and the other side must be aware of it. That’s something we don’t necessarily want to do.

Solution

So we need to implement encryption. I’m not going to tell you how to do it (there are plenty of tutorials out there), just keep these things in mind:

You WILL HAVE TO be able to rotate keys sooner or later
You probably want to have a backwards compatible solution (for changing algorithm, keys etc)
Once you start encrypting messages, it makes it much more error prone — think of unicode characters etc. You’ll need to go with BASE64 or similar, but then your messages will be much longer
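One way to keep these three points manageable is to wrap every message in a small envelope that names the algorithm version and the key used. The sketch below is only an assumption about how this could look: the envelope layout and the Envelope class are made up for illustration, and the cipher itself is deliberately left out (encrypt and decrypt are functions you supply, since Python’s standard library has no suitable cipher).

```python
import base64
from typing import Callable

Cipher = Callable[[bytes, bytes], bytes]  # (key, data) -> data


class Envelope:
    """Wraps ciphertext as '<version>.<key id>.<base64>' so old messages stay readable.

    Versions and key ids must not contain a dot, as it is used as the separator.
    """

    def __init__(self) -> None:
        self.keys: dict[str, bytes] = {}
        self.ciphers: dict[str, tuple[Cipher, Cipher]] = {}
        self.current_key = ""
        self.current_version = ""

    def add_cipher(self, version: str, encrypt: Cipher, decrypt: Cipher) -> None:
        """Register an algorithm; the most recently added one is used for new messages."""
        self.ciphers[version] = (encrypt, decrypt)
        self.current_version = version

    def rotate(self, key_id: str, key: bytes) -> None:
        """Start encrypting with a new key; older keys remain available for decryption."""
        self.keys[key_id] = key
        self.current_key = key_id

    def seal(self, text: str) -> str:
        encrypt, _ = self.ciphers[self.current_version]
        # Encoding to UTF-8 first keeps unicode characters intact.
        blob = encrypt(self.keys[self.current_key], text.encode("utf-8"))
        encoded = base64.b64encode(blob).decode("ascii")
        return f"{self.current_version}.{self.current_key}.{encoded}"

    def open(self, envelope: str) -> str:
        version, key_id, encoded = envelope.split(".", 2)
        _, decrypt = self.ciphers[version]
        blob = decrypt(self.keys[key_id], base64.b64decode(encoded))
        return blob.decode("utf-8")
```

With this shape, rotating a key or switching the algorithm is a single call, and messages sealed earlier still open as long as the old key and cipher stay registered.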

Web interfaces

Now, let’s say that we push encrypted messages to Slack-like platform. What do we do next? We need to decrypt them on our side when reading.

For web interfaces this is relatively simple. Just inject some JS decrypting things on the fly. You need to figure out how to maintain keys. Also, inlining JS may be disallowed depending on the page’s Content Security Policy. However, it is relatively straightforward.

For desktop applications with webview, you may need to use custom proxy to drop some HSTS-like headers etc. Nothing big but this may decrease your performance.

Also, keep in mind some messages may still need to go in plaintext. Links are one example: the link preview feature may be implemented on the server side of the chat platform, so if you want previews, you’ll need to push links unencrypted.

Mobile apps

Things here get even trickier. Some platforms will not have open source clients. You may still modify them on a binary level (with smali for Android for instance) but then you have to kind of maintain them. It may be much harder for iOS vs Android vs other platforms. Still, this is worth doing because it typically gives much better UI than web browser on mobile phone.

Security is also important here. You need to keep mobile apps up to date, so you need to be able to reapply your changes easily. Forking the repository works but how often do you want to go and solve merge conflicts?

Chatterbox Part 2 — Desktop interfaces

afish — Sat, 24 Oct 2020 08:00:52 +0000

This is the second part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

So we have our system for IM and we’d like to have a nice UI for chatting with people. Obviously, we could use regular web interfaces (like for Facebook) but the whole point is to have everything in one place, no matter what network we use. Let’s go and see a couple of solutions.

Self maintained UI

Obviously, we can just implement something. This doesn’t sound like a big deal, right? Just a webpage, contact list, chat window, text box and button to send message. However, there are many things we take for granted and implementing them just takes time:

Just a nice UI — reading black text on white background works well for 5 minutes. It’s good to have colors, shapes, etc.
Timestamp formatting — sounds simple but if you travel then it’s important to show “human friendly” time in the local timezone, not something in UTC. This most likely requires some front-end logic in JS
Link previews — it’s cool to see the page without opening the browser, not to mention playing YT videos or embedding images
Message markers — was the message sent? Delivered? Displayed? How many people read it in the group chat?
Highlighting new messages, showing popups, managing sounds
This should be fast, work on all your devices (imagine HTML support on older tablets, scale to width etc)

There are just so many things to implement. Obviously, having custom UI is good but it’d be great if we could “just have” the UI.

No UI at all

Maybe don’t use UI at all, maybe go with emails, flat files or something similar? Well, this works pretty well for some time but is much harder to maintain. It’s easy to lose track of who you’re talking to (because emails look the same, only subjects are different).

I was using this approach for some time and it worked well for a secondary communicator. Once I switched entirely to Chatterbox, it was a no go. Depending how much you talk to people, it’s just good to have some decent interface.

Also, conversation statuses are slightly harder. How do you indicate that some message was read by the other side? You send another email?

Reusing webclient

You could go with some online IRC or XMPP client, just wire your communicator into server and you’re good. There are even platforms like that — for instance Bitlbee. You expose “regular” server and connect to it using any IRC client you want. What’s more, you can change your client any time, just go with different IRC application. You have it for any OS, any device, desktop-based, web-based, etc.

You may have some hard times with sharing media, though. You need to upload files somewhere, either you host them by yourself, or you reuse some public hostings. This is not as simple as it can be, not to mention, that it still needs to go through IRC protocol (or whatever protocol you use). Can be tricky for bigger files, unrecognized extensions etc.

Showing conversation updates may be hard, too. How do you show that someone read the message? Do you send a regular IRC status update? It can get messy.

A big drawback is that this may be effectively stateless. Once you push a message, you cannot modify it, which means that any edits, status updates etc. get hard.

Reusing platform

Obviously, reusing sounds like a good idea. And there are multiple platforms out there which you can use and wire through your IM. Most popular are Slack, Discord, Mattermost, Rocketchat, Fleep.

These platforms are pretty powerful. They have their dedicated mobile clients and web interfaces, and they support file uploads, permission management etc.

How do you show conversation statuses? Just use reactions. You can show a “chains” icon to indicate the message is being processed or “thumbs up” for a delivered one. You can also inject CSS to change the background for delivered messages, etc. It may be slightly harder for a mobile client but generally works pretty well. Also, since this is just a web interface, if you need to tune it more (like plug in your proxy etc), you can always write a simple desktop webview app and do the magic.

I’ve been using a chat like this for a couple of years now and it works really well. I have status updates, chat rooms for each contact, link previews, file uploads etc. What’s more, it’s hosted by the owner so I don’t need to maintain it (we’ll get to security later in this series). And on top of that, it’s free.

How hard is it to script a platform like this? It took me 3 hours for the first one. You just need to create a new chat room for the contact, invite two users (your main one which you’ll be logging in with, and another one for “the other side”, unless you use webhooks), configure notifications etc. Everything else is given to you out of the box, depending on the platform you go with.

There are drawbacks obviously. You are not free to implement anything you like (modifying messages may be harder). You may get throttled. You need to implement security (because it is hosted beyond your control). It may just die one day, stop being free, or block your account for suspicious activity. Sure, it may not work at all one day. Also, since you push messages to the platform, they’ll have effectively two timestamps — one of the time when it was pushed to Slack-like place, the other one of the “actual” send time (and these timestamps may be heavily out of sync).

Chatterbox Part 1 — Origins

afish — Sat, 17 Oct 2020 08:00:31 +0000

This is the first part of the Chatterbox series. For your convenience you can find other parts using the links below (or by guessing the address):
Part 1 — Origins
Part 2 — Desktop interfaces
Part 3 — Security and mobile devices
Part 4 — Other channels
Part 5 — Self healing
Part 6 — Protocols
Part 7 — File writing
Part 8 — Integrations
Part 9 — Calling Facebook using GSM
Part 10 — Poor man’s voice-based paging system
Part 11 — Scraping memory dump in Chrome with Chrome Debugging Protocol
Part 12 — Scraping page’s model with JavaScript or extensions
Part 13 — Capturing model with Fiddler
Part 14 — SMS application for android the hacky way
Part 15 — Make Messenger call you

One of the nicest aspects of IT is one problem can be solved in multiple ways. Some solutions are clearly “wrong”, some of them are clever and tricky, some of them are the de facto or de jure standards. This generalizes to patterns, designs, and ultimately — whole systems. On the other hand, multiple standards lead to incompatibilities and issues, just like in this famous XKCD.

This is a big issue in the instant messaging world. There are many solutions in the market and they are very rarely compatible. There were multiple “standards” which seemed like they could solve the issue once and for all (XMPP for instance) but they failed miserably (abandoned by Facebook or Google) and now it looks like things are siloing again. Maybe the trend will reverse in a couple of years but currently there are more and more platforms for communication, with both text and voice solutions, which are by design independent and fight for market share.

I’m using dozens of protocols, literally. These include more popular ones like Facebook or Hangouts, some typical “group chats” like Slack or IRC, and some local solutions as well like Gadu Gadu in Poland. In some networks I have a couple of accounts (for instance for the multiple SIM cards I have). Also, I’m using multiple devices (phones, tablets, desktops) and generally don’t like typing on mobile ones. Configuring all these devices is super painful, especially since they all have different operating systems etc.

I was using IM+ for some time and it was cool. It supported many of my networks (back in 2012 I didn’t have so many of them) but it had this one killer feature – if I didn’t get the message (because I was out) it was sending it to email and I could reply to it using regular email client. That was really convenient as I didn’t need to configure separate apps for each network, I just had to configure email client and stay in touch.

Unfortunately, a couple of years back the email feature stopped working. After using it for a couple of years I had realized it’s very useful, so I decided to reimplement it on my own in a very narrow scope. How hard could it be after all?

This is how Chatterbox started.

So there were some “plans” around it:

Keep it “simple” – I didn’t want to host complex “infrastructure” (like databases, queues etc)
Just support one protocol — XMPP — and use other networks via gateways
Send email for each message and monitor some inbox so I can respond
It’s just a “secondary” application, not something replacing all my communicators (especially Miranda NG I was using at that time)
Not spend much time on implementation — just get it up and running in hours and probably never develop

This didn’t seem to be a hard thing to do so it took me like two evenings to implement all of that. Now, over 3 years later, Chatterbox is way bigger than it was supposed to be:

It is my main IM and “web monitoring” tool (for forums, books, promotions etc)
Supports web, desktop, and mobile UI
Can mail me, text me, call me, read messages out loud, transform speech to text
I can mail it, text it, use mobile, desktop, or web interfaces to send messages
I can use it on an airplane with messaging wifi only (and still can contact any network, not just Fb or Whatsapp)
It can share media between networks so I can “just send” image to any network, no matter if it supports attachments or not (like IRC)
Recognizes when I’m around or unavailable and then sends notifications
Can schedule messages for later so I can compose now and get it delivered at a specific time
Supports recalling messages so I can stop them from getting delivered after hitting enter
Notifies about deadletters and failures so I’m paged when there is an issue
Encrypts messages so it’s safe in transit over public channels
Supports 30+ networks (yes, I’m using that many!)
Runs on a single box with relatively low resource usage but can also scale on other machines if needed
Heals itself and can survive pretty significant outages (as long as the machine doesn’t die, obviously)
Is free and doesn’t use paid components
And the most important — it works for years now and I know I can trust it

It is a long and beautiful journey which let me learn a lot about operating systems, distributed applications, “enterprise” approaches. I forked plenty of libraries, fixed bugs in components I never wanted to touch, reimplemented OS primitives, or just learned multiple tricks.

Over the next parts I’ll describe how I implemented a couple of things which may be super simple once you know them but surprising if you never tried doing them. It won’t be technical, more a bunch of notes showing how things can go wrong and what mistakes I made on the way.