Types and Programming Languages Part 9 – Real life testing

This is the ninth part of the Types and Programming Languages series. For your convenience you can find other parts in the table of contents in Part 1 — Do not return in finally

Last time we talked about tests and whether we should have them. Today, a few words about the different types of tests and what to keep in mind.

Business logic tests

These tests include anything confirming that your business logic works: unit tests, integration tests, end-to-end tests, etc. It’s up to you how you maintain them, just keep in mind they need to validate the business purpose of the code, not the technical structure. You shouldn’t care about private methods or whether they are called in a particular order, and you shouldn’t verify parameters or anything else technical. Just make sure that the output is what you expect for a given input.
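As a minimal sketch of what "validate the business purpose" can look like, the tests below assert only on the result of a rule and never on how it was computed. The `order_total` function and its 10% loyalty discount are made up for illustration.

```python
from decimal import Decimal


def order_total(prices, loyalty_member):
    """Hypothetical rule: loyalty members get 10% off orders of 100 or more."""
    subtotal = sum(prices, Decimal("0"))
    if loyalty_member and subtotal >= Decimal("100"):
        return subtotal * Decimal("0.9")
    return subtotal


def test_loyalty_member_gets_discount_on_large_order():
    # Only the observable outcome is checked, not which helpers were called.
    total = order_total([Decimal("60"), Decimal("40")], loyalty_member=True)
    assert total == Decimal("90")


def test_no_discount_below_threshold():
    total = order_total([Decimal("99")], loyalty_member=True)
    assert total == Decimal("99")
```

If the implementation is later split into private helpers or rewritten completely, these tests stay valid as long as the business rule holds.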

Avoid any non-production components (mocks). Use real databases, real endpoints, etc. Mock them (with Mockito-like libraries or in-memory wrappers) only if you need to make the tests more reliable or faster. Keep in mind it doesn’t matter that the tests are green if the application doesn’t work. Make sure you validate all layers of your software, including network handshakes, impedance mismatch problems, authorization/authentication, permissions, etc.

Ideally, use infrastructure as code to create a fresh stack every single time.

Comparison and performance tests

Make sure that your application works in real-life scenarios. Grab your production logs, extract the requests, and then run them against both your new code and your old code to check whether the outputs match. This way you can verify that your application works on real traffic. There are multiple things to keep in mind here.
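The comparison itself can be sketched in a few lines. This assumes the logged requests are stored as one JSON object per line; `old_handler` and `new_handler` are stand-ins for the deployed and the candidate versions of your code, not real APIs.

```python
import json


def old_handler(request):
    # Stand-in for the currently deployed code.
    return {"total": request["price"] * request["quantity"]}


def new_handler(request):
    # Stand-in for the candidate code under test.
    return {"total": request["quantity"] * request["price"]}


def compare_on_logs(log_lines, old, new):
    """Replay each logged request against both versions and collect mismatches."""
    mismatches = []
    for line_number, line in enumerate(log_lines, start=1):
        request = json.loads(line)
        old_out, new_out = old(request), new(request)
        if old_out != new_out:
            mismatches.append((line_number, request, old_out, new_out))
    return mismatches


sample_log = ['{"price": 5, "quantity": 3}', '{"price": 2, "quantity": 10}']
```

An empty result from `compare_on_logs(sample_log, old_handler, new_handler)` means both versions agreed on every replayed request. Anything else is a list of line numbers and payloads worth investigating.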

First, stateless versus stateful applications. If your app is stateless then running it against production logs is easy, as it just takes the input and returns the output. Examples are mathematical operations, recommendation systems, etc. However, you can make any application effectively stateless if you always log the input and output of all your external calls (network requests, database calls, file reads, etc.) and replay the recorded outputs during the test. Obviously, this may not be possible in all instances, but it is worth doing wherever you can.
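One way to picture this is a pair of thin wrappers around every external dependency: one records in production, the other replays in the test. This is only a sketch under simplifying assumptions (requests are hashable, and repeated requests are answered in the recorded order). The class names and `price_with_tax` are hypothetical.

```python
class RecordingClient:
    """Wraps a real external call and logs every (request, response) pair."""

    def __init__(self, real_call):
        self._real_call = real_call
        self.log = []

    def call(self, request):
        response = self._real_call(request)
        self.log.append((request, response))
        return response


class ReplayingClient:
    """Serves responses from a recorded log instead of the real dependency."""

    def __init__(self, log):
        self._responses = {}
        for request, response in log:
            self._responses.setdefault(request, []).append(response)

    def call(self, request):
        # Responses for the same request come back in the recorded order.
        return self._responses[request].pop(0)


def price_with_tax(client, product_id):
    # Business logic that depends on an external lookup.
    net = client.call(product_id)
    return round(net * 1.2, 2)
```

With the replaying client in place, `price_with_tax` behaves like a pure function of its recorded inputs, which is exactly what a comparison test needs.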

Second, GDPR, CCPA and others. Your test environment may not be allowed to process production data, so you need to anonymize it (both in logs and in transit between applications). Also, georedundancy must be handled correctly: if your data is not allowed to leave the EU then you can’t run your tests in the USA.
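A rough sketch of scrubbing log records before they reach the test environment follows. It replaces sensitive values with a keyed hash, so identical inputs still map to identical tokens and the replayed traffic keeps its shape. The field names are assumptions. Keyed hashing is pseudonymization rather than full anonymization, so whether it satisfies your legal requirements is a separate question to settle with whoever owns compliance.

```python
import hashlib
import hmac

SENSITIVE_FIELDS = {"email", "user_id"}  # hypothetical field names


def pseudonymize(record, key):
    """Replace sensitive values with a keyed hash so equal inputs stay equal."""
    cleaned = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()[:16]
        else:
            cleaned[field] = value
    return cleaned
```

The key has to stay outside the test environment, otherwise the tokens can be recomputed from guessed inputs.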

Third, load tests. You generally want to avoid taking logs produced by load tests and running them in comparison tests, because that would make it a self-fulfilling prophecy. However, this is something you need to be able to recognize in production anyway: you need to be able to tell an actual request from a fake testing one.
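One common way to tell them apart is to have the load generator mark its requests and filter on that mark when extracting logs. The header name below is invented for the example, and the log entries are assumed to be dictionaries with a `headers` mapping.

```python
SYNTHETIC_HEADER = "X-Synthetic-Traffic"  # hypothetical header name


def is_synthetic(entry):
    """True if the logged request was marked as test traffic."""
    headers = entry.get("headers", {})
    return headers.get(SYNTHETIC_HEADER, "").lower() == "true"


def real_traffic_only(entries):
    """Keep only the requests that came from actual customers."""
    return [entry for entry in entries if not is_synthetic(entry)]
```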

Once you have run comparison tests on these logs, you get two more things: warm caches and realistic databases. After running comparison tests you should have your caches ready to serve traffic at full speed. This is crucial, as caches should be sized to maintain a stable hit ratio, which may be hard to achieve with fake requests. Also, your databases are filled with a sufficient amount of data and, more importantly, actual real-life records.

With these two things in place you can start running performance tests. You can simply continue replaying the production traffic against your test environment, but this time to measure the performance and check whether you can reach the expected number of transactions per second. You need to replicate production-like infrastructure, so if you have caches on external hosts in your prod cluster then you need to organize them in the same way in your testing environment. The same applies to databases, network calls, georedundancy, etc. Keep in mind that this may become expensive: if your production fleet has 1k machines in three zones then your testing lab needs similar characteristics, only a bit smaller. Once again, you need to balance your business requirements, and how much revenue you’re willing to lose, against the money you spend on maintaining the testing lab and running tests.
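The measurement itself can start very simply. The sketch below replays requests sequentially in one process and reports transactions per second. It is only meant to show the idea; a real performance test would drive the environment concurrently, most likely with a dedicated load-testing tool. `measure_throughput` is a made-up helper.

```python
import time


def measure_throughput(handler, requests):
    """Replay requests one by one; return (count, elapsed_seconds, per_second)."""
    start = time.perf_counter()
    for request in requests:
        handler(request)
    elapsed = time.perf_counter() - start
    per_second = len(requests) / elapsed if elapsed > 0 else float("inf")
    return len(requests), elapsed, per_second
```

Comparing `per_second` between the old and the new build on the same replayed traffic gives a first hint of a regression before you spend money on a full-size run.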

Going to production

Testing in the test lab is cool but it doesn’t end there. When deploying to production you should use some form of phased release. There are generally three approaches.

In the first approach you have another fleet of hosts of the same size. You deploy to the stand-by fleet (which may take hours) and then atomically swap the old prod hosts with the new prod hosts (via a load balancer or some similar means). This is nice because you can roll back to the previous version in the same way. However, it comes with several drawbacks: you need to pay for twice the fleet size, you may get spurious issues (a faulty host, a faulty load balancer, a faulty switch, etc.), and you may not be able to roll back easily (because the caches have changed).

The second approach is a phased rollout. You deploy to 1% of your prod hosts and let them warm up for some time. If the metrics remain unchanged then you assume it worked correctly, so you can deploy to some more hosts. This is nice because you don’t pay for a much bigger fleet, but it makes rollbacks harder (it takes time to roll things back and the process can fail) and it may corrupt production data (if a newly deployed host of a stateful application breaks something). Also, your caches may need to be doubled because half of your prod fleet uses old cache keys and the other half uses new keys.
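The control loop behind such a rollout can be sketched as follows. Everything here is a placeholder: `deploy`, `warm_up`, `error_rate` and `rollback` stand for whatever your deployment tooling and monitoring provide, and the single error-rate metric is a simplification of "metrics remain unchanged".

```python
def phased_rollout(stages, deploy, warm_up, error_rate, baseline, tolerance, rollback):
    """Deploy stage by stage; roll back and stop if the error rate degrades."""
    for percent in stages:
        deploy(percent)   # e.g. bring this percentage of hosts to the new version
        warm_up()         # give the new hosts time to take traffic
        if error_rate() > baseline + tolerance:
            rollback()
            return False
    return True
```

A call like `phased_rollout([1, 10, 50, 100], ...)` then reads the same way as the description above: a small slice first, a metric check, then more hosts.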

The third approach is to use A/B tests or toggle switches of any kind. You just implement an if condition in your code which you can control with some switch after the deployment. So you start with your fleet running just code A. Then you deploy the application containing both A and B, but it runs just A because the switch is off. After the deployment you can enable B selectively. This is nice because you can control how many customers get the new behaviour and you can roll it back easily. However, apart from the drawbacks of the previous solution, this also poses the risk of having a bug in the if condition itself. Not to mention that if you have multiple toggle switches then it may be very hard to track what’s going on.
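A minimal sketch of such a switch, assuming customers are identified by a string ID and the rollout percentage comes from some external configuration you can change at runtime. Hashing the ID keeps each customer in the same bucket between requests, so nobody flips between A and B. All names are illustrative.

```python
import hashlib


def bucket(customer_id):
    """Map a customer to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(customer_id, rollout_percent):
    """True if this customer falls inside the enabled share of traffic."""
    return bucket(customer_id) < rollout_percent


def handle(customer_id, rollout_percent):
    # The "if condition" from the text: B for the enabled share, A otherwise.
    if is_enabled(customer_id, rollout_percent):
        return "B"
    return "A"
```

Setting the percentage to 0 is the rollback, and setting it to 100 completes the release. The condition is small, but it is still code that deserves its own test.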

Repeating tests

Keep in mind that after deploying to production you should repeat the business logic tests anyway. It’s not enough to assume that if it worked correctly in pre-prod then it’s going to be fine in production. Your prod environment uses different network infrastructure, different hosts and different databases, so you need to repeat the tests there to know whether you need to roll back or not.

Penetration tests

A similar principle applies to penetration tests. You should run them against all your environments. Your production environment should be physically separated from your testing one, which means its configuration can differ, so you need to verify separately that the configuration of the prod environment is safe and secure.