I was working with seemingly simple load job in Redshift. I was generating some data and wanted to store the output in separate table, just to persist it for debugging purposes. However, since the job could be reran if needed, I wanted to remove rows generated previously and add new ones. So I tried the following:

This is our table. Let’s assume that job can generate data for market 1 or market 2. So the job for market 1 does this:

Similarly, job for market 2 does this:

As you can see, both jobs use the same table but access different rows.

Now the thing is, if you try running both transactions in parallel, most likely you will get an error that transactions collide and one of them had to be aborted.

I didn’t understand why it works this way as per my understanding there is nothing wrong. Either it should use table lock and one transaction should wait for the other, ot it should use some range locks and figure out that there is nothing wrong.

Unfortunately, it is not that simple. Since Redshift is based on Postgresql, we can reproduce the problem over there as well. For instance, see dba.stackexchange.

Okay, we know what the problem is, how to solve it? I considered locking tables explicitly with LOCK statement but I never find it nice. I checked if it is possible to control partitions of the table but it doesn’t seem to be allowed in Redshift. I thought about doing so called “poor men’s clustering” with multiple tables joined using single view, but I didn’t want to increase the complexity.

Finally, I decided to go with simple MVCC-like approach. Add new column to the table, indicating timestamp of inserted row:

Now, when inserting data to the table, use this:

Finally, prepare one view extracting only latest data:

And now query the view instead of the original table. You can also run deletion job periodically if needed.