Chatterbox Part 7 — File writing

This is the seventh part of the Chatterbox series. For your convenience you can find other parts in the table of contents in Part 1 – Origins

Sometimes you may decide to go with simple file instead of a full blown SQL/NoSQL database. However, writing a file is not as simple as it sounds. APIs are not atomic, writing a file takes time (especially if it’s more than just couple of bytes). What happens if your machine gets unplugged? Your process dies? Exception is thrown? File gets corrupted and things go wrong. That’s not a problem when you write new file but it is a big issue when you overwrite an already existing file (like when reading data, recalculating in app, and writing back to “the same” file). And you may think that writing on the side and then replacing existing file is a solution but as this SO discussion shows it’s not trivial (especially when you don’t know underlying file system or even if it’s a physical drive or some sort of S3 emulation). How do we protect from data corruption?

There are couple ideas how to solve this. I assume there is just one user of the file, if you have many of them then you’ll need to extend protocols with some system-wide lock or something similar.

Two files and pointer

If your filesystem supports atomic deletion of the file, you can use this protocol:

Assumptions:

  • There are two files with data: A and B
  • There may be a pointer file P with whatever content (ideally empty)

Reading:

  • Check if P extists
  • If it does — read and return A
  • Otherwise read and return B

Writing:

  • Check if P exists
  • If it does — write B and delete P
  • Otherwise write A and create empty P

How does it work? P tells you if you should go with file A or B. If P exists then file A contains latest data, otherwise it’s file B.
Now, what if you’re writing to file and it fails? You’re writing to backup file so it doesn’t matter, your main file is still intact. Since P modification is atomic (create or delete) then it’s safe.

What if you cannot go with pointer (or you don’t want to)?

Two files and more writes and structured data

This solution requires you to either use structured data (like JSON) which allows you to tell whether the file is correct or to write checksum next to the file content.

Assumptions:

  • There are two files with data: A and B
  • Content is either structured or there is a checksum before the actual content

Reading:

  • Read A and check if it’s correct (via structure or checksum)
  • If it is — write A to B. Return A.
  • Otherwise — write B to A. Return B

Writing:

  • Write to A.
  • Write to B.

How does it work? Let’s say that writing to A fails. This means that when reading A you’ll discover it’s broken so you’ll restore A from replica B which is a rollback.
If writing to A succeeds but writing to B fails, you’ll read A correctly and overwrite B (roll forward).

This algorithm works but requires more writes and some special structure. What if you have to store unstructured text and cannot determine it’s broken just by reading its content?

Three files and even more writes

Assumptions:

  • There are three files with data: A, B, and C

Reading:

  • Read A, read B, compare
  • If they are the same — write A to C. Return A.
  • Otherwise write C to A, write C to B, return C.

Writing:

  • Write A. Write B. Write C.

If write to A fails then reader will detect data mismatch by comparing A and B, and then perform data restore by overwriting A and B.
If write to B fails then reader will detect data mismatch the same way and will restore A and B from C.
If write to C fails then reader will see A and B are equal. In that case reader will overwrite C using A.

Obviously, there are multiple writes here and higher data consumption.

Keep in mind that if you read from C (because A and B are not the same) then streaming file C and updating A+B cannot be done straightforwardly. Let’s say that you do something like:

What if something happens after writing to B but before whole C file is processed? You’ll end up with A and B being the same but with wrong content. This can be somehow detected by checking file sizes or you can do:

This may result in some really hard to track edge cases (when B happen to have matching prefix and you start overwriting A and then A and B are the same but they are just prefix of C). In practice, you may take even more crazy approach:

Reading:

  • Read A, read B, compare
  • If they are the same — write A to C. Return A.
  • Otherwise take first character of C and write something different to B. Write C to A. Write C to B. Return C.

Writing:

  • Take first character of new A and old A, write something different to B
  • Write A.
  • Write B.
  • Write C.

Why do we need this? Let’s say we don’t and your files are A = 123, B = 123, C = 123. You now want to write 14 as new data. You come and write A = 14 and then fail when writing B after flushing first character.
So when reading you realize A = 14 is not equal to B = 1. So you go and start restoring C = 123 to A and B. You write first character to A and then fail.
Next reader comes and reads A = 1 and B = 1 so it looks okay. But C = 123!
You also cannot use file sizes to figure out if full write was successful (because you don’t know if C is still a safe backup or broken new version).

Extended version will fail if you accidentally truncate files (which is a typical behavior). Let’s say writing to A failed early and left A empty. You then go, read first character of C and want to write something else to B. You fail but leave B being empty, so now A and B are equal.

You may think that you can compare file timestamps and you don’t need three files, only two. Just compare their content, if they differ then check if A was written after B — if it was then B is the correct content (backup), otherwise A is the correct content (as you failed when writing B). Keep in mind not all file systems provide sufficient precision, also you generally cannot trust timestamps as time may go backwards etc.

Also, you cannot just read A, B, and C, and compare all of them because you don’t know if C was written correctly (you cannot determine which file is the safe backup).

So it may be that your full protocol needs to be:

Reading:

  • Read A, read B, compare
  • If they are the same — write A to C. Return A.
  • Otherwise you need to write some garbage to A and B, but if any of these is empty then you need to start with it first. Write C to A. Write C to B. Return C.

Writing:

  • Take first character of new A and old A, write something different to B
  • Write A.
  • Write B.
  • Write C.