Types and Programming Languages Part 8 – Testing – is it worth it?

This is the eighth part of the Types and Programming Languages series. For your convenience you can find other parts in the table of contents in Part 1 — Do not return in finally

Today we are going to talk about tests in software engineering.

Why tests?

That’s actually a very good question. Why do we even bother? Is it worth it? The answer is — it depends. We typically explain that we test our applications to make sure they work correctly. However, this is not accurate and obscures the big picture. Our ultimate goal is to answer a business need — whether we do that with software, an Excel spreadsheet, a piece of paper, or by hiring someone to do it for us is just a matter of choice and personal preference. Engineers often forget that it’s not about the software, and that’s why they overengineer things. Software must do its job, which is to bring value. There is no point in automating things if the automation takes more time than the manual work. There is no point in writing software if we can do things manually faster. Finally, there is no point in testing if we can achieve the same confidence faster or cheaper in some other way.

So why do we test? Because in many cases it’s the cheapest and simplest way of making sure the business process won’t break on Friday night. We need to remember that the goal of software engineering is not to engineer software but to drive the business with the aid of software. The worst thing that can happen is having to stop business as usual (BAU) – it doesn’t matter whether the cause is a power outage, a lack of resources, a bug in a 3rd party API, or a bug in our own application. The cause itself is secondary, because our goal is simply to keep BAU going.

Now, we need to assess what’s probable and what’s not. If we think a power outage is a real risk, we consider signing SLAs with power plants or having stand-by links. If we think an earthquake is possible, we consider introducing georedundancy. If we think bugs in the software are likely, we consider testing. However, there is no obligation to test an application before deploying it to production, and we should always balance the risk of losing revenue (and understand how much of it we could lose) against the cost of decreasing that risk.

So before moving forward we need to understand the Principle Behind Testing:

Tests decrease the likelihood of losing revenue. Think twice about how much money you can spend on testing. Don’t test just for the sake of testing.

So why do we test? Because it can be cheap and it can decrease the probability of releasing bugs to production. However, making sure that there are no bugs at all is expensive, so we always need to consider whether it’s worth the cost.

How to test?

There are multiple different ways of testing. They usually come with clever names which you may be asked about during a technical interview and which probably nobody but the recruiter remembers. However, the most important distinction is — tests can be either manual or automated.

Manual tests are good. Since it’s a human being who runs them, they are generally immune to spurious failures like a button popping up on the screen a little later than usual. They also measure how people perceive the application, so they can validate non-functional requirements. Manual tests are self-healing — if there is a bug in the test, the person running it will in most cases notice and fix it.
However, manual tests are slow and do not scale well. Most of the time they can’t be executed at any hour of the day, they handle repetitive tasks poorly, and they are hard to modify. And they are expensive.

Automated tests, on the other hand, are run by a computer, so they can be executed at any time. They can be repeated endlessly and do not get bored of seeing the same thing over and over again. However, they are much harder to prepare, as it’s easy to tell a human “click the button and see if the modal appears” but hard to tell a computer the same thing. Automated tests are also dumb — if they are bugged, they will never figure that out on their own.
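
To make the difference concrete, here is a minimal sketch of how “click the button and see if the modal appears” could look as an automated test. It assumes Selenium WebDriver; the page URL and the element ids (“open-modal”, “modal”) are made up for illustration.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;

// A sketch, not a definitive implementation: automating "click the button
// and see if the modal appears". The URL and element ids are hypothetical.
public class ModalTest {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/orders");         // hypothetical page
            driver.findElement(By.id("open-modal")).click();  // "click the button"
            // "see if the modal appears": wait up to 5 seconds for it to show up
            new WebDriverWait(driver, Duration.ofSeconds(5))
                .until(ExpectedConditions.visibilityOfElementLocated(By.id("modal")));
            System.out.println("Modal appeared, test passed");
        } finally {
            driver.quit();
        }
    }
}
```

Note how much ceremony is needed to express something a human tester does without thinking, and how fragile the element ids and the timeout are compared to a person simply looking at the screen.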

Now, there are different types of automated tests depending on what we measure and how. Typical categories include unit tests, integration tests, end-to-end tests, performance tests, comparison tests, and more. All of them should have the following characteristics:

  • They should be fast — we don’t want to waste time, and if something can be done faster then it should be. However, “fast” is a very generic term which doesn’t tell us much. For some tests “fast” means “milliseconds”, for others it means “hours”, and that’s still perfectly fine. We can’t say that “a test should finish in seconds” because it depends on the type of the test.
  • They should be reliable — we want to avoid spurious failures in test reports. And when I say “test reports” I mean it — sometimes it’s perfectly fine to rerun the test a couple of times and just check whether it passed at least once (see the sketch after this list). Again, it’s all about the cost — you can make your tests never fail spuriously, but it may be way too expensive.
  • They should be immune to changes — tests need to verify some specific characteristic of the application, not the way it’s achieved. If a test breaks just because you reordered two lines of code without changing any business outcome, then the test is wrong.
  • They should be simple to follow and reason about — even though we can test any internal part of our application by going through the very entry point, we should still be able to pinpoint the issue quickly.
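
As an example of the “reliable” point above, here is a minimal sketch of the rerun-and-accept-one-green-run idea. The helper name and the whole approach are illustrative, not part of any real test framework.

```java
// A sketch: run the test body up to `attempts` times (attempts >= 1) and
// treat it as passed if at least one run succeeds; otherwise rethrow the
// last failure. A cheap way of tolerating spurious failures.
public final class Retry {
    public static void passesAtLeastOnce(int attempts, Runnable testBody) {
        AssertionError last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                testBody.run();
                return;               // one green run is good enough here
            } catch (AssertionError failure) {
                last = failure;       // possibly spurious, try again
            }
        }
        throw last;                   // never passed, report the real failure
    }
}
```

Whether this is acceptable depends, again, on the cost: retrying hides flakiness cheaply, while hunting down the root cause of every spurious failure may be far more expensive.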

With all that in mind we can move on to mocking, which is the source of way too much confusion.

Why mocking?

There are generally two schools of thought about what should be mocked. The first is London-school TDD, which tells you to mock everything. The reasoning is that you want to isolate the thing you test, so your tests never fail because something changed in the outside world (outside as seen from the tested characteristic, which may be just some other private function in the same class). This sounds appealing — your tests will be reliable and will not fail because something else changed. However, let’s go back to the Principle Behind Testing — we’re always interested in the revenue. We don’t care that the test is green if the application doesn’t work.

So what should we do then? London-school TDD tells you that you need to test all interactions. So not only do you test your private internals, you also need to make sure they are called correctly. With this assumption we dare to say that if all elements are tested in isolation, then everything is going to work together. Now, I’m not going to delve into the philosophical schools of reductionism, holism, or emergentism, but practice shows that this approach doesn’t work. If you test your things in isolation, then you get issues during integration. All mock-based tests are green, but the application doesn’t work. Have you ever tried to integrate with something purely based on contracts, just to realize that things do not work and you get a Big Bang integration issue? Well, that’s common.
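
A minimal sketch of what such a London-style test could look like, assuming Mockito and JUnit 5; OrderRepository, EmailSender and OrderService are hypothetical types defined inline just to keep the example self-contained.

```java
import static org.mockito.Mockito.*;
import org.junit.jupiter.api.Test;

class OrderServiceLondonTest {

    // Hypothetical collaborators and subject under test.
    interface OrderRepository { void save(String orderId); }
    interface EmailSender { void sendConfirmation(String orderId); }

    static class OrderService {
        private final OrderRepository repository;
        private final EmailSender emails;
        OrderService(OrderRepository repository, EmailSender emails) {
            this.repository = repository;
            this.emails = emails;
        }
        void placeOrder(String orderId) {
            repository.save(orderId);
            emails.sendConfirmation(orderId);
        }
    }

    @Test
    void placingAnOrderCallsEveryCollaborator() {
        OrderRepository repository = mock(OrderRepository.class);
        EmailSender emails = mock(EmailSender.class);
        OrderService service = new OrderService(repository, emails);

        service.placeOrder("order-42");

        // Interaction-based assertions: the test pins down *how* the work is
        // done. It stays green even if the integrated system is broken, and
        // it breaks whenever the internals are refactored.
        verify(repository).save("order-42");
        verify(emails).sendConfirmation("order-42");
    }
}
```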

The other school is Detroit TDD (its followers are the so-called classicists). They tell you to avoid mocks and verify just the effects. You don’t care whether your private method was called twice, with these parameters, in this order — all you care about is that the final effect is as expected. And this is perfectly in line with our Principle Behind Testing — we care about not losing the revenue.
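
For contrast, a sketch of a similar hypothetical OrderService tested classicist-style: no interaction verification, just the final effect, with a tiny in-memory fake standing in for the repository (still a test double, but the assertion targets the outcome).

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;
import java.util.ArrayList;
import java.util.List;

class OrderServiceDetroitTest {

    interface OrderRepository { void save(String orderId); List<String> all(); }

    // Hand-written in-memory fake; no interaction tracking at all.
    static class InMemoryOrderRepository implements OrderRepository {
        private final List<String> orders = new ArrayList<>();
        public void save(String orderId) { orders.add(orderId); }
        public List<String> all() { return orders; }
    }

    static class OrderService {
        private final OrderRepository repository;
        OrderService(OrderRepository repository) { this.repository = repository; }
        void placeOrder(String orderId) { repository.save(orderId); }
    }

    @Test
    void placedOrderEndsUpStored() {
        OrderRepository repository = new InMemoryOrderRepository();
        new OrderService(repository).placeOrder("order-42");

        // Effect-based assertion: we don't care which methods were called in
        // which order, only that the order is there at the end.
        assertEquals(List.of("order-42"), repository.all());
    }
}
```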

But we need mocks, right? We have to isolate the database, right? No, we don’t. The reason we want to isolate the database is not that it’s “the database” and “it needs to be mocked out”. Imagine that you could create all your databases from scratch, with correct schemas, with full-blown engines, in complete isolation from other tests, and in just a millisecond. You could then run your code against the real database (and test all ORM mapping features, all index performance issues, all syntax differences between databases for your dynamically generated SQL) and do it fast. Would you ever bother with mocking the database then? I hope you get the point.

We mock not because there is some law or principle telling you that you need to mock X. We mock because without mocks our tests would slow down significantly. Instead of taking milliseconds with an in-memory database, they would run for minutes or hours. That’s why we mock. However, our goal is to not mock at all, and if one day we get to the point where we can create a full-blown database in a split second, then we should abandon mocks once and for all.
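
A minimal sketch of that “fast because in-memory” trade-off, assuming the H2 database driver is on the classpath: real SQL and a real schema, created from scratch in milliseconds, yet still a stand-in for the production engine.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class InMemoryDatabaseTest {
    public static void main(String[] args) throws SQLException {
        // DB_CLOSE_DELAY=-1 keeps the in-memory database alive for the whole run.
        try (Connection connection =
                 DriverManager.getConnection("jdbc:h2:mem:orders;DB_CLOSE_DELAY=-1");
             Statement statement = connection.createStatement()) {

            statement.execute("CREATE TABLE orders (id VARCHAR(36) PRIMARY KEY)");
            statement.execute("INSERT INTO orders VALUES ('order-42')");

            try (ResultSet rows = statement.executeQuery("SELECT COUNT(*) FROM orders")) {
                rows.next();
                System.out.println("orders stored: " + rows.getInt(1)); // expected: 1
            }
        }
    }
}
```

This exercises the schema and the SQL for real, which a Mockito-style mock never could, but as the next section argues, it is still a mock.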

What is a mock?

Now, a very tricky question. Many people will tell you that they don’t use mocks, that they run their tests against a real database, and yet their tests finish in milliseconds. And then you’re like “waaat? how?”. And the answer is — they are just misleading you.

A mock is a non-production stand-in put in place of a real production component. When people talk about mocks they typically mean mocks like those from the Mockito library, something you create with mockLibrary.mockForClass<YadaYada>(). That’s one type of a mock, but not the only one. Any in-memory database, any test container, any library pretending to be the S3 API, anything non-production is a mock.

Now you may argue: “okay, but my in-memory database allows me to test ORM transformations, which I can’t do with (Mockito-like) mocks”. And that’s true, but it’s still a mock, only a little more clever one. As long as you understand that it’s a mock, it’s perfectly fine; but once you start pretending that it’s the real component, you start misleading others. It’s perfectly fine to use it, just don’t boast that you don’t use mocks, because that’s not true. You do.

Ideally, we want to avoid mocks. We want a whole new infrastructure stack created in a split second so that we can test all the layers of our application. Since most of the time that’s impossible, we need to use mocks. And following the reasoning from the previous paragraph — it’s better to use an in-memory database (because we want to avoid isolating parts as much as possible), but it’s still mocking, which we will hopefully drop one day.

What’s a unit?

A London-school TDD evangelist would tell you that a unit is the smallest part you can test in isolation from the others. In other words, it’s most likely a private method in your class. While London-school TDD followers tend to make units as small as possible, we should go the opposite way and make them as big as we can.

A good rule of thumb is — a unit is a piece of functionality as dictated by the GRASP principles. Most likely it’s your public entry point to some module.
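
A minimal sketch of such a “big” unit, with a hypothetical Checkout module: the test goes through the public entry point only, so the private helpers are exercised indirectly and can be refactored freely without touching the test.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class CheckoutModuleTest {

    // Hypothetical module; totalCents(...) is its public entry point.
    static class Checkout {
        long totalCents(long netCents, int quantity) {
            return addTax(netCents * quantity);
        }
        private long addTax(long cents) {   // internal detail, never tested directly
            return cents * 123 / 100;       // assumed 23% tax, for illustration
        }
    }

    @Test
    void totalIncludesTax() {
        // 2 items * 100.00 (10000 cents) + 23% tax = 246.00 (24600 cents)
        assertEquals(24_600, new Checkout().totalCents(10_000, 2));
    }
}
```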

What about test pyramid?

The answer is — you shouldn’t care. Just write tests where you need them and don’t follow “obvious rules”.