This is the sixth part of the Chatterbox series. For your convenience, you can find the other parts in the table of contents in Part 1 – Origins.

We have covered a lot about general system design and features. Now it’s time to talk about the actual core of the system, the thing it was created for: handling various IM protocols.

It was all supposed to work over the XMPP protocol, with gateways (transports) to other networks. My current recommendation is: do not use these transports. They don’t support many network features (like group chats, media etc.), they are generally hard to maintain (you need a separate server next to the regular XMPP one), and they use a lot of resources. Thankfully, there are other approaches:

  • Native support in your beloved language: if there is a library for doing that, you’re all set!
  • Native support in some other language which you can run on your platform: for instance, a Java library used in .NET via IKVM. However, this may fail in edge cases because of runtime differences (like a thread dying on an unhandled exception).
  • Native support in some other language: just write some simple infrastructure pushing messages over named pipes to a similar component written in your beloved language, and you effectively end up with the same solution.
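For the named-pipe approach in the last bullet, the two processes need to agree on how messages are framed. Here is a minimal sketch of one possible framing, one JSON document per line. The `BridgeMessage` shape and the names `encodeFrame` and `FrameDecoder` are made up for illustration and are not part of Chatterbox.

```typescript
// Hypothetical envelope exchanged between the two processes.
interface BridgeMessage {
  protocol: string;
  from: string;
  text: string;
}

// One message per line. JSON.stringify escapes newlines inside strings,
// so "\n" can safely act as the frame delimiter.
function encodeFrame(message: BridgeMessage): string {
  return JSON.stringify(message) + "\n";
}

// Pipes deliver arbitrary chunks, so the unfinished tail is kept between calls.
class FrameDecoder {
  private buffer = "";

  push(chunk: string): BridgeMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is the (possibly empty) incomplete line.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as BridgeMessage);
  }
}
```

The writer calls `encodeFrame` and writes the result to the pipe. The reader feeds every chunk it receives to `FrameDecoder.push` and gets back only complete messages.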

Unfortunately, some protocols cannot be handled like this. It’s great when you have a library maintaining a stateful connection, but for some protocols it won’t work. You can try:

  • Polling: depending on the API, you may be able to poll messages via a REST API or something similar. This obviously needs to be done carefully because of throttling, permissions, banning etc.
  • CLI: there are command line tools which you can just run each minute to receive messages. It may be harder to send and receive media, and there may be issues with encoding and national characters.
  • Transports via other means: XMPP, Bitlbee, Matrix, IRC, or some other solution allowing you to connect indirectly. They probably won’t support all protocol features.
  • Web scraping: this may actually not be that bad.
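To make the "poll carefully" point concrete, here is a small sketch of a polling loop that backs off when the API signals throttling. The `PollResult` shape, the `throttled` flag and the function names are assumptions. A real API will report throttling in its own way (for example with an HTTP 429 status).

```typescript
// Hypothetical result of a single poll; real APIs will differ.
interface PollResult<T> {
  messages: T[];
  throttled: boolean;
}

interface BackoffOptions {
  baseDelayMs: number; // delay used while the API is happy
  maxDelayMs: number; // upper bound after repeated throttling
}

// Double the delay after a throttled response, reset it after a normal one.
function nextDelayMs(currentMs: number, throttled: boolean, opts: BackoffOptions): number {
  if (!throttled) return opts.baseDelayMs;
  return Math.min(currentMs * 2, opts.maxDelayMs);
}

// Poll `rounds` times. `fetchNew` and `sleep` are injected so the loop
// can be exercised without a network or real timers.
async function pollLoop<T>(
  fetchNew: () => Promise<PollResult<T>>,
  onMessage: (message: T) => void,
  sleep: (ms: number) => Promise<void>,
  opts: BackoffOptions,
  rounds: number,
): Promise<void> {
  let delay = opts.baseDelayMs;
  for (let i = 0; i < rounds; i++) {
    const result = await fetchNew();
    result.messages.forEach((message) => onMessage(message));
    delay = nextDelayMs(delay, result.throttled, opts);
    await sleep(delay);
  }
}
```

In production `sleep` would be a thin wrapper around `setTimeout`, and `rounds` would likely be replaced by a cancellation signal.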

There are multiple things to keep in mind when maintaining a protocol. First, they are stateful: a connection must be there and it must be maintained. Sometimes it is very hard to tell whether it’s still alive (because of bugs in libraries etc.). Hence, implementing a watchdog is tricky, as just a dumb ping won’t work. You need to do something meaningful for the protocol and see if it worked, like changing your status from Online to Away and back, or sending a message to someone or something.
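A watchdog along these lines could look roughly like the sketch below. The `probe` callback stands for whatever "meaningful" action fits the protocol (toggle the status, send a message to yourself) and `reconnect` for the recovery step. Both are placeholders, as are the class and function names.

```typescript
type Probe = () => Promise<boolean>;

// Race the probe against a timer so that a hung library call counts as a failure.
async function probeWithTimeout(probe: Probe, timeoutMs: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    // A probe that throws is treated the same as a probe that returns false.
    const attempt = Promise.resolve().then(probe).catch(() => false);
    return await Promise.race([attempt, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

class Watchdog {
  private failures = 0;
  private probe: Probe;
  private reconnect: () => Promise<void>;
  private maxFailures: number;
  private timeoutMs: number;

  constructor(probe: Probe, reconnect: () => Promise<void>, maxFailures: number, timeoutMs: number) {
    this.probe = probe;
    this.reconnect = reconnect;
    this.maxFailures = maxFailures;
    this.timeoutMs = timeoutMs;
  }

  // Run one check; returns true when the connection looks healthy.
  async check(): Promise<boolean> {
    if (await probeWithTimeout(this.probe, this.timeoutMs)) {
      this.failures = 0;
      return true;
    }
    this.failures += 1;
    if (this.failures >= this.maxFailures) {
      this.failures = 0;
      await this.reconnect();
    }
    return false;
  }
}
```

Calling `check()` from a timer every minute or so is enough. Requiring several consecutive failures before reconnecting avoids tearing down a connection because of a single slow response.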

Restoring a connection may also be tricky. Some messages may get lost while some status updates may still be received, so it’s easy to lose the order of events. And if there is a hiccup in the connection, you may miss some notifications. It’s worth asking for the message history periodically.

You should also check that your side effects really happened. If you send a message to someone, ask the server for the message history and see if it’s there. Sometimes it’s easier to keep two connections open on the same account, so you can immediately see whether a message sent on connection A is received on connection B.
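As an illustration of the "send, then look for it in the history" check, here is a minimal sketch. It assumes that sending yields a server-assigned ID and that the history can be fetched as a list. `send`, `fetchHistory` and `HistoryEntry` are hypothetical stand-ins for whatever the protocol library offers.

```typescript
interface HistoryEntry {
  id: string;
  text: string;
}

// Send a message, then re-read the history until the message shows up
// or the attempts run out.
async function sendAndConfirm(
  send: (text: string) => Promise<string>, // resolves to the server-assigned ID
  fetchHistory: () => Promise<HistoryEntry[]>,
  text: string,
  attempts: number,
  sleep: (ms: number) => Promise<void>,
  delayMs: number,
): Promise<boolean> {
  const id = await send(text);
  for (let i = 0; i < attempts; i++) {
    const history = await fetchHistory();
    if (history.some((entry) => entry.id === id)) return true;
    await sleep(delayMs);
  }
  return false;
}
```

A `false` result does not prove the message was lost, only that it was not visible within the given number of attempts, so the caller still has to decide whether to retry or raise an alert.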

Also, reuse server identifiers as much as possible. If the server sends you an ID with a message, store it so you can remove duplicates easily.
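Deduplication by server ID can be as simple as a bounded set of recently seen identifiers. The sketch below is one way to do it. The class name and the capacity-based eviction are my own choices for the example, not something any protocol prescribes.

```typescript
class SeenIds {
  private ids = new Set<string>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns true the first time an ID is seen and false for duplicates.
  markIfNew(id: string): boolean {
    if (this.ids.has(id)) return false;
    this.ids.add(id);
    if (this.ids.size > this.capacity) {
      // A Set iterates in insertion order, so the first value is the oldest.
      const oldest = this.ids.values().next().value;
      if (oldest !== undefined) this.ids.delete(oldest);
    }
    return true;
  }
}

// Usage: process a message only when its server ID has not been seen yet.
const seen = new SeenIds(10000);
const incoming = ["m1", "m2", "m1"];
const fresh = incoming.filter((id) => seen.markIfNew(id));
```

The capacity should comfortably exceed the number of messages that can reappear after a reconnect or a history re-read. Otherwise evicted IDs will be treated as new again.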

Web scraping

While it sounds terrible, there are actually ways to do it effectively.

First, you need good tools. I’m going with Puppeteer, as it has a nice API and pages generally work in it. It also allows you to install extensions, and to click with the mouse and type with the keyboard the “regular” way (not via JS).

Second, some pages use a decent API under the hood. You can just observe the network traffic (directly or with some extensions) and dump it from the browser. This is the best way, as you’ll have full metadata for the messages (timestamps, senders, IDs etc.).
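Observing the traffic from inside the automation script can be sketched as follows. Puppeteer pages emit a `response` event, and responses offer `url()` and `json()`. The `PageLike` and `ResponseLike` interfaces below only mirror that small part, so the example runs without a browser. The `captureJson` helper and the URL fragment are hypothetical.

```typescript
// Minimal structural stand-ins for the parts of Puppeteer's page and
// response objects used here.
interface ResponseLike {
  url(): string;
  json(): Promise<unknown>;
}

interface PageLike {
  on(event: "response", handler: (response: ResponseLike) => void): unknown;
}

// Forward the parsed body of every response whose URL contains `urlFragment`.
function captureJson(
  page: PageLike,
  urlFragment: string,
  onPayload: (payload: unknown) => void,
): void {
  page.on("response", (response) => {
    if (!response.url().includes(urlFragment)) return;
    response
      .json()
      .then(onPayload)
      .catch(() => {
        // Non-JSON bodies and handler errors are ignored in this sketch.
      });
  });
}
```

With a real browser you would pass the Puppeteer page in and register the listener before navigating, so the initial message load is not missed. Which URL fragment identifies the message endpoint has to be found by watching the traffic first.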

Some pages keep a lot of data in their view model. Depending on the library, it may be super easy to extract (like with Knockout.js) or moderately hard (like with React). However, if the metadata is there, you can easily parse it (from JSON etc.).

Unfortunately, some pages are terrible and you literally have to scrape the DOM. You’ll definitely hit issues with timestamps (especially since for some protocols they change when you refresh the page), message identity (is this a brand new “OK” message or one we’ve already scraped?), content, media etc. Not to mention that the DOM may change pretty often, and then your scripts will fail.
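One possible way to approach the message identity problem when there are no IDs is to align each freshly scraped window with the tail of what is already stored, and only append what comes after the overlap. This is a heuristic sketch, not a complete solution. It assumes messages are scraped in chronological order and that the stored history reaches at least as far back as the start of the window. All names are made up.

```typescript
interface ScrapedMessage {
  sender: string;
  text: string;
}

function sameMessage(a: ScrapedMessage, b: ScrapedMessage): boolean {
  return a.sender === b.sender && a.text === b.text;
}

// Find the longest suffix of `known` that equals a prefix of `scraped`,
// then append only the part of `scraped` after that overlap.
function mergeScraped(known: ScrapedMessage[], scraped: ScrapedMessage[]): ScrapedMessage[] {
  const maxOverlap = Math.min(known.length, scraped.length);
  for (let k = maxOverlap; k > 0; k--) {
    const tail = known.slice(known.length - k);
    if (tail.every((message, i) => sameMessage(message, scraped[i]))) {
      return known.concat(scraped.slice(k));
    }
  }
  // No overlap found: treat the whole window as new.
  return known.concat(scraped);
}
```

Repeated identical messages remain ambiguous. If the stored history ends with "OK" and the window is "OK", "OK", this sketch assumes the first one is already known and appends a single new "OK".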

When it comes to replying via the web, sometimes it can be done entirely in JS, and sometimes you need to go with keyboard typing. Puppeteer handles that easily.
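The keyboard route might look like the sketch below. Puppeteer offers `page.type(selector, text, options)` and `page.keyboard.press(key)`. The `TypingPageLike` interface only mirrors those two calls so the snippet is self-contained. The `replyByTyping` helper, the selector you pass in and the 20 ms delay are assumptions for the example.

```typescript
// Structural stand-in for the parts of a Puppeteer page used here.
interface TypingPageLike {
  type(selector: string, text: string, options?: { delay?: number }): Promise<void>;
  keyboard: { press(key: string): Promise<void> };
}

// Type the reply into the message box key by key, then submit with Enter.
async function replyByTyping(
  page: TypingPageLike,
  inputSelector: string,
  text: string,
): Promise<void> {
  // A small per-key delay makes the input look like regular typing.
  await page.type(inputSelector, text, { delay: 20 });
  await page.keyboard.press("Enter");
}
```

Whether Enter actually sends the message depends on the page. Some chat UIs need a click on a send button instead.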

If you control the browser window size etc., then instead of interacting with the DOM via query selectors you can click on well-known pixels by their coordinates. This may break after UI changes, obviously.
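Clicking by coordinates only makes sense with a fixed viewport, so a sketch of it pins the size first. Puppeteer provides `page.setViewport({ width, height })` and `page.mouse.click(x, y)`. The `MousePageLike` interface mirrors just those, and `ClickTarget`, `clickTarget` and the coordinates you would pass in are hypothetical.

```typescript
// Structural stand-in for the parts of a Puppeteer page used here.
interface MousePageLike {
  setViewport(viewport: { width: number; height: number }): Promise<void>;
  mouse: { click(x: number, y: number): Promise<void> };
}

// A named, well-known pixel position, valid for one specific viewport size.
interface ClickTarget {
  name: string;
  x: number;
  y: number;
}

async function clickTarget(
  page: MousePageLike,
  viewport: { width: number; height: number },
  target: ClickTarget,
): Promise<void> {
  const inside =
    target.x >= 0 && target.x < viewport.width &&
    target.y >= 0 && target.y < viewport.height;
  if (!inside) {
    throw new Error(`Target "${target.name}" lies outside the viewport`);
  }
  // Pin the viewport so the coordinates keep pointing at the same element.
  await page.setViewport(viewport);
  await page.mouse.click(target.x, target.y);
}
```

Keeping the targets in one named table makes it easier to fix them all in one place when the UI changes.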

Some platforms have multiple interfaces with different implementations. For instance, Skype is embedded in the Outlook UI. It may be easier to use another implementation if it exposes some metadata via a different library or DOM.