This is the first part of the TLA+ series. For your convenience you can find other parts using the links below:
Part 1 — TLA+ to ILP Part 1 — Reducing problem
Part 2 — TLA+ to ILP Part 2 — Performance check
Today we are going to reduce some TLA+ algorithm to ILP and see if it works. Let’s begin.
We take the Model Checking example by Tomasz Masternak from Exactly Once. It’s a short code presenting possible race conditions and issues with distributed transaction consisting of sending a message to the out queue and updating the message in a database. See the code first and understand what’s going on there as I won’t be covering it here.
Now, what we want to do is to model the same problem in ILP and see if we can get correct results. Let’s decompose the TLA+ model:
- We have an input queue with some messages
- We have an output queue with sent messages and their transaction id
- We have a database storing some transaction id
- Each message may be processed multiple times in case of failures
- Business transaction is represented as incrementing a transaction counter
- Message is marked as processed
We need to represent the model in ILP. We are going to store an array of bits holding messages. I’ll present this in the rough and dirty way without too much of an abstraction but in a production code you would probably encapsulate all the nasty stuff.
The code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 |
var integerWidth = 10; var epsilon = 0.1; var storeDebugExpressions = false; var cacheConstants = false; var storeDebugConstraints = false; var solver = new CplexMilpSolver(new CplexMilpSolverSettings { IntegerWidth = integerWidth, Epsilon = epsilon, StoreDebugExpressions = storeDebugExpressions, CacheConstants = cacheConstants, StoreDebugConstraints = storeDebugConstraints }); var messagesCount = 1; var retriesCount = 2; var totalSteps = messagesCount * retriesCount; Func<int, int> getBitsCount = maximumValue => 1 + (int)(Math.Log(maximumValue) / Math.Log(2)); var retriesBitsLength = getBitsCount(retriesCount); var transactionBitsLength = getBitsCount(totalSteps); var recordSize = // is in queue in 1 + // is in processed 1 + // is in queue out 1 + // is in db 1 + // failures count retriesBitsLength + // transaction id in queue transactionBitsLength + // transaction id in db transactionBitsLength; Func<IVariable[]> createRecord = () => Enumerable.Range(0, recordSize) .Select(v => solver.FromConstant(0)) .ToArray(); var bitFlags = Enumerable.Range(0, messagesCount) .SelectMany(step => { var record = createRecord(); record[0] = solver.FromConstant(1); return record; }) .ToArray(); Func<IVariable[], IVariable> recordToIVariable = bits => { int power = 1; IVariable result = solver.FromConstant(0); for(int i=bits.Length - 1;i>=0;--i){ result = result.Operation<Addition>(bits[i].Operation<Multiplication>(solver.FromConstant(power))); power *= 2; } return result; }; Func<IVariable[], int, int, IVariable> bitsToIVariable = (record, offset, length) => recordToIVariable(Enumerable.Range(offset, length) .Select(i => record[i]).ToArray()); Func<int, int, IVariable[]> integerToBits = (value, bitsCount) => { List<IVariable> result = new List<IVariable>(); for(int i=0;i<bitsCount;++i){ result.Add(solver.FromConstant(value % 2 == 1 ? 1 : 0)); value /= 2; } result.Reverse(); return result.ToArray(); }; Func<IVariable, int, IVariable[]> ivariableToBits = (value, bitsCount) => { List<IVariable> result = new List<IVariable>(); for(int i=0;i<bitsCount;++i){ result.Add(value.Operation<IsGreaterOrEqual>(solver.FromConstant(1))); value = value.Operation<Division>(solver.FromConstant(2)); } result.Reverse(); return result.ToArray(); }; // Just for easier debugging for development List<IVariable> debug = new List<IVariable>(); Action<IVariable[], IVariable> failMessage = (record, hasFailed) => { var failuresCount = bitsToIVariable(record, 4, retriesBitsLength); var tooManyFailures = failuresCount .Operation<IsGreaterOrEqual>(solver.FromConstant(retriesCount - 1)) .Operation<Conjunction>(hasFailed); // Remove from queue in record[0] = solver.Operation<Condition>(tooManyFailures, solver.FromConstant(0), record[0]); // Mark as processed record[1] = solver.Operation<Condition>(tooManyFailures, solver.FromConstant(1), record[1]); var newFailuresCount = solver.Operation<Condition>(tooManyFailures, failuresCount, failuresCount.Operation<Addition>(solver.FromConstant(1))); var newFailuresCountBits = ivariableToBits(newFailuresCount, retriesBitsLength); for(int i=0;i<retriesBitsLength;++i){ record[4 + i] = solver.Operation<Condition>(hasFailed, newFailuresCountBits[i], record[4 + i]); } }; List<IVariable> history = new List<IVariable>(); for(int step=0;step < totalSteps;++step){ // Fail macro can go to the next iteration so we need to have a variable representing if we carry on with the loop var isStillProcessing = solver.FromConstant(1); // Receive // First, select some message from queue in if possible Func<int, IVariable> isInQueueIn = id => bitFlags[id * recordSize]; var howManyInQueueIn = solver.Operation<Addition>( Enumerable.Range(0, messagesCount).Select(id => isInQueueIn(id)).ToArray() ); // We know if there was any message selected thanks to "hadAnyMessageInQueueIn var hadAnyMessageInQueueIn = howManyInQueueIn.Operation<IsGreaterThan>(solver.FromConstant(0)); isStillProcessing = hadAnyMessageInQueueIn; history.Add(isStillProcessing); var selectedMessage = solver .CreateAnonymous(Domain.PositiveOrZeroInteger) .Set<LessOrEqual>(solver.FromConstant(messagesCount - 1)); history.Add(selectedMessage); // Now, extract bits of the message var selectionStartIndex = selectedMessage.Operation<Multiplication>(solver.FromConstant(recordSize)); var currentRecord = Enumerable.Range(0, recordSize) .Select(i => selectionStartIndex.Operation<Addition>(solver.FromConstant(i))) .Select(i => solver.CompositeOperation<ArrayGet>(new ArrayGetParameters { Index = i }, bitFlags).ToArray().First()) .ToArray(); // Make sure that selected message is in queue (so if we're still processing then in queue must be true) solver.Set<Equal>( solver.FromConstant(1), solver.Operation<MaterialImplication>(isStillProcessing, currentRecord[0]) ); // Process // This isn't exactly equivalent to the code in TLA+ as we don't keep this variable in the model // We'll handle this later on var transactionId = step + 1; // SendOutgoingMsg // Either fail message var failedInSendOutgoingMsg = solver.CreateAnonymous(Domain.BinaryInteger).Operation<Conjunction>(isStillProcessing); history.Add(failedInSendOutgoingMsg); failMessage(currentRecord, failedInSendOutgoingMsg); isStillProcessing = solver.Operation<Conjunction>(isStillProcessing, failedInSendOutgoingMsg.Operation<BinaryNegation>()); // Or put it in queue out and set the transaction counter currentRecord[2] = solver.Operation<Condition>(isStillProcessing, solver.FromConstant(1), currentRecord[2]); var queueTransactionCounter = integerToBits(transactionId, transactionBitsLength); for(int i=0;i<transactionBitsLength;++i){ currentRecord[4 + retriesBitsLength + i] = solver.Operation<Condition>(isStillProcessing, queueTransactionCounter[i], currentRecord[4 + retriesBitsLength + i]); } // UpdateDb // Either fail message var failInUpdateDb = solver.CreateAnonymous(Domain.BinaryInteger).Operation<Conjunction>(isStillProcessing); history.Add(failInUpdateDb); failMessage(currentRecord, failInUpdateDb); isStillProcessing = solver.Operation<Conjunction>(isStillProcessing, failInUpdateDb.Operation<BinaryNegation>()); // Update database and set the transaction counter currentRecord[3] = solver.Operation<Condition>(isStillProcessing, solver.FromConstant(1), currentRecord[3]); var dbTransactionCounter = integerToBits(transactionId, transactionBitsLength); for(int i=0;i<transactionBitsLength;++i){ currentRecord[4 + retriesBitsLength + transactionBitsLength + i] = solver.Operation<Condition>(isStillProcessing, dbTransactionCounter[i], currentRecord[4 + retriesBitsLength + transactionBitsLength + i]); } // AckInMsg // Either fail message var failInAck = solver.CreateAnonymous(Domain.BinaryInteger).Operation<Conjunction>(isStillProcessing); history.Add(failInAck); failMessage(currentRecord, failInAck); isStillProcessing = solver.Operation<Conjunction>(isStillProcessing, failInAck.Operation<BinaryNegation>()); // Remove message from the in queue and mark as processed currentRecord[0] = solver.Operation<Condition>(isStillProcessing, solver.FromConstant(0), currentRecord[0]); currentRecord[1] = solver.Operation<Condition>(isStillProcessing, solver.FromConstant(1), currentRecord[1]); // We modified record in place and created new array elements, let's rewrite them back for(int i=0;i<recordSize;++i){ solver.CompositeOperation<ArraySet>(new ArraySetParameters { Index = selectionStartIndex.Operation<Addition>(solver.FromConstant(i)), Value = currentRecord[i] }, bitFlags); } } // Now, let's now try generating ghost message // We iterate through messages, check if any of them is ghost and make sure it's true var isGhost = solver.FromConstant(0); for(int i = 0; i < messagesCount; ++i){ isGhost = isGhost.Operation<Disjunction>(solver.Operation<Conjunction>( // In out queue bitFlags[i * recordSize + 2], // Not in db bitFlags[i * recordSize + 3].Operation<BinaryNegation>() )); } isGhost.Set<Equal>(solver.FromConstant(1)); solver.AddGoal("g", solver.FromConstant(0)); solver.Solve(); Func<int, int, int, double> getValue = (id, offset, length) => { double result = 0; for(int i=0; i < length; ++ i){ result *= 2; var value = Math.Round(solver.GetValue(bitFlags[id * recordSize + offset + i])); result += value; } return result; }; // This is just for easier debugging during development foreach(var v in debug){ Console.WriteLine(Math.Round(solver.GetValue(v))); } for(int i=0;i<totalSteps;++i){ var processingInStep = Math.Round(solver.GetValue(history[i * 5])) > 0; Console.WriteLine("Step {0} {1} processing", i + 1, processingInStep ? "was" : "wasn't"); Console.WriteLine("\tSelected message {0}", Math.Round(solver.GetValue(history[i * 5 + 1]) + 1)); Console.WriteLine("\tFailure in outgoing queue {0}", Math.Round(solver.GetValue(history[i * 5 + 2]))); Console.WriteLine("\tFailure in db {0}", Math.Round(solver.GetValue(history[i * 5 + 3]))); Console.WriteLine("\tFailure in ack {0}", Math.Round(solver.GetValue(history[i * 5 + 4]))); } for(int i=0;i<messagesCount;++i){ Console.WriteLine("Message {0}", i + 1); Console.WriteLine("\tIn queue in: {0}", getValue(i, 0, 1)); Console.WriteLine("\tIs processed: {0}", getValue(i, 1, 1)); Console.WriteLine("\tIn queue out: {0}", getValue(i, 2, 1)); Console.WriteLine("\tIn db: {0}", getValue(i, 3, 1)); Console.WriteLine("\tFailures count: {0}", getValue(i, 4, retriesBitsLength)); Console.WriteLine("\tTransaction id in queue: {0}", getValue(i, 4 + retriesBitsLength, transactionBitsLength)); Console.WriteLine("\tTransaction id in database: {0}", getValue(i, 4 + retriesBitsLength + transactionBitsLength, transactionBitsLength)); } |
A lot in here so let’s go line by line.
First, we initialize the solver in 1-13. Just a regular CPLEX.
Next, we store some hyperparameters: how many messages (line 15) and how many times each message can be processed (two times, line 16). We then calculate the number of all steps which solver should emulate and this is exactly number of messages times how many times we try processing a single message.
Next, we’re going to store things as bit flags so let’s define some helper function which calculates how many bits we need to store some bounded integer (line 19). It’s just a logarithm, nothing fancy (I’m running this in old .NET version so I don’t have Math.Log2
method yet…). We then calculate how many bits we need to store retries counter and transaction counter (lines 21-22).
Now, each record is going to hold the following:
- Whether it is currently in the input queue
- Wheter it was processed
- Whether it’s stored in the output queue
- Whether it’s in database
- Retries counter
- Transaction counter stored in the output queue
- Transaction counter stored in the database
This is in lines 21-37.
Lovely, now an actual bit mask in lines 40-50. We initialize flags so that initially all messages are in the input queue, everything else is zeroed-out.
Next, we define some helper methods to convert bits to integer, integer to bits etc. This is in lines 52-85. We also define some collection for debug variables to be able to fix issues easier (line 91) – this doesn’t affect the model checking at all.
Okay, let’s now begin with an actual calculation. Let’s ignore failmessage
for now and move on to line 117. We iterate through steps and start by defining a variable indicating whether we did a break in the loop – it’s this line in TILA+. The variable will be used in all operations. In ILP we can’t just break the iteration – we need to update all variables because we don’t know the values yet, so we use this flag to effectively do nothing.
First, we implement the Receive
part. We first calculate how many messages are there in the queue (lines 123-126) and if we have a message to process (lines 128-130). Next, we define a free variable indicating which message from the queue we start processing (lines 132-135). This is an important part as we don’t process messages in any order, we are free to model race conditions, caches, network delays, or whatever else. We just get a message from the queue.
Now, we get the bits of the message to a helper array to keep things easier and faster. We use Array.Get
operation which is slow (squared time complexity). We do it just once in lines 138-145.
However, since we don’t actually know how many messages there are, it may be that we select a message which was already processed. We make sure that if we selected the message then that message must have been in the queue (lines 148-151). However, if there are no more messages to process then we’ll need to emulate a no-op instructions later on.
Okay, now we move on to the business processing part. It’s trivial as it’s just a counter calculation at this point (line 157). However, later on we’ll need to emulate an assignment which in TLA+ copies values so we’ll need to copy them as well.
It’s a high time for some actual processing. We start with sending the message to the output queue in line 161. We first start by optionally failing the process. We create a free variable in line 161 which can be set only if we’re still processing (which is true at this point as nothing have failed yet). Then, we move on to the failing procedure (we call it in line 163).
The failing procedure starts by extracting the failure counter (line 97) and then checks if this message failed too many times already (lines 98-100). It’s true if and only if it was processed exactly once and it failed this time. Now we need to start performing conditional operations.
If the message failed too many times then we need to remove it from the input queue (line 103) but otherwise we don’t touch the bit at all (we rewrite its value). We do similar for processing flag – if the message failed too many times then we mark it as processed, otherwise we don’t touch the flag.
Now, we calculate the new failures count and convert it to bits (lines 108-109). We then store these bits in the record (lines 110-112).
We now carry on with processing. We update the flag indicating whether we failed and stopped the iteration (line 164). Next, we try putting the message in the output queue (line 167) and we use conditionals again – we send the message if and only if we’re still processing the loop (meaning that the message hasn’t failed too many times and hasn’t failed now). We also update the transaction counter in the same manner (lines 168-171). Notice here that we calculate the transaction counter and create new variables for it – that’s because this line effectively copies the value so we need to create a copy as well.
Next, we perform similar processing for updating the database (lines 174-185).
Finally, we acknowledge the message. We again check the failure (lines 189-192) and then update flags – remove the message from the input queue and mark it as processed.
Since we modified variables in place, we need to store them in the bitset. We do that in lines 200-206. And we’re done.
That’s it for representing the code. Now we need to model the invariant. We start with NoGhost
invariant which says that either the message is not in the output queue or it’s stored in the database. So we get each message one by one and check whether it’s in the input queue (line 217) and it’s not in the database (line 219) – in that case the message is ghost We finally add the constraint that this must be true so there must be some ghost message (line 222). If ILP solver finds an evaluation and shows a solution then it means that the invariant was broken and that the business code is wrong. If ILP fails to find a solution – it means that the invariant holds.
Now, we need some code to actually retrieve the history in a friendly manner (lines 228-262). And we’re done. Output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
Tried aggregator 5 times. MIP Presolve eliminated 7077 rows and 4313 columns. MIP Presolve modified 2809 coefficients. Aggregator did 722 substitutions. Reduced MIP has 122 rows, 70 columns, and 365 nonzeros. Reduced MIP has 57 binaries, 13 generals, 0 SOSs, and 0 indicators. Presolve time = 0.02 sec. (12.70 ticks) Found incumbent of value 0.000000 after 0.02 sec. (14.35 ticks) Root node processing (before b&c): Real time = 0.02 sec. (14.53 ticks) Parallel b&c, 8 threads: Real time = 0.00 sec. (0.00 ticks) Sync time (average) = 0.00 sec. Wait time (average) = 0.00 sec. ------------ Total (root+branch&cut) = 0.02 sec. (14.53 ticks) Step 1 was processing Selected message 1 Failure in outgoing queue 1 Failure in db 0 Failure in ack 0 Step 2 was processing Selected message 1 Failure in outgoing queue 0 Failure in db 1 Failure in ack 0 Message 1 In queue in: 0 Is processed: 1 In queue out: 1 In db: 0 Failures count: 1 Transaction id in queue: 2 Transaction id in database: 0 |
How do we read that? First, it took 0.02 second to find that the invariant is broken. We can see that in the first loop iteration was processing a message (so there was still something in the input queue), chose first message from the queue, and then failed when putting it in the output queue. Step 2 tried doing the same and succeeded, but later failed when updating the message in the database. Finally, we can see that the message is not in the input queue, was processed, is in the output queue but not in the database. We can also see the failures count set to one and that the transaction id was stored correctly in the output queue.
Next time we’re going to experiment with other invariants, and later on we’ll run some performance tests.