Types and Programming Languages Part 10 – Dogmatic TDD

This is the tenth part of the Types and Programming Languages series. For your convenience, you can find other parts in the table of contents in Part 1 — Do not return in finally.

Today, a short post about Test-Driven Development (TDD).

How to apply TDD

A typical TDD purist would say that we need to write “one line of test, followed by one line of production code” and that we cannot add any new functionality in the Refactor phase of RGR (Red-Green-Refactor), where “new functionality” means any change to the logic. These two statements sound reasonable, but they are not correct.

First, one line of test and one line of production code. TDD indeed encourages you to write the minimal amount of code needed to make the test fail or pass. However, this is about a “conceptual amount” and shouldn’t be measured in lines of code. Sometimes one line of code is okay, sometimes ten lines are fine as well. What’s more, you’re not required to run your tests after each and every line you write. You can write “a bit of code” and only then run your tests.

There is merit in writing more code before running your tests. Let’s run a hypothetical experiment just to show the reasoning. Imagine you have two perfect developers who never make any coding mistakes. The former runs tests after each and every line; the latter runs them only after he writes a bit of non-trivial code (either some difficult code or a significant amount of trivial changes). Since the latter is perfect, he runs tests only at the very end, because all the code he writes is trivial to him. Which of these two developers is better?

The answer is: the latter, because he finishes his work faster. The former spends extra time running tests but ends up with exactly the same code as the latter (as they are both perfect). So we conclude the latter is better.

Now, let’s make this example a little more realistic. It’s not about being a “perfect developer” in absolutely every case. It’s about splitting your work into pieces that are trivial enough for you that you can write them in one go with no mistakes. This depends on your knowledge, experience, skills, and the project. In some projects this will literally be one line of code, because everything is so coupled that you need to be very careful. In other projects it can be even a hundred lines of code. For some of us this will be “hello world” code only, whereas for others it’ll be fizz-buzz or something more complex. Some people consider a simple for loop hard; others recognize patterns much faster and a simple for loop is, well, simple.

And this is not about comparing developers or pointing fingers. It’s about realizing that if you can’t write (for example) fizz-buzz in one go with no mistakes, then you can take one of two actions: you either introduce a process that captures errors as early as possible, or you develop your skills so that you can do fizz-buzz with no problems. So don’t think that you’re unprofessional if you don’t run your tests after each line of code — it’s exactly the opposite. If you can write a bigger chunk of code and make no mistakes, then you’re simply a more proficient programmer. The trick is in finding the balance, because you’ll always make mistakes, no matter what, even when you’re sure the code can’t be wrong. As you get more experienced, more things become trivial and you need to run your tests less often.
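To make this concrete, here is a minimal sketch (the class and test names are mine, purely illustrative): a fizz-buzz implementation plus a handful of JUnit 5 tests that a developer who finds the problem trivial could write in one batch and verify with a single test run at the end.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// A problem that is trivial for many developers: safe to write in one batch
// and verify with a single test run at the end.
class FizzBuzz {
    static String of(int n) {
        if (n % 15 == 0) return "FizzBuzz";
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return Integer.toString(n);
    }
}

class FizzBuzzTest {
    @Test void plainNumberIsReturnedAsText() { assertEquals("1", FizzBuzz.of(1)); }
    @Test void multipleOfThreeIsFizz()       { assertEquals("Fizz", FizzBuzz.of(3)); }
    @Test void multipleOfFiveIsBuzz()        { assertEquals("Buzz", FizzBuzz.of(5)); }
    @Test void multipleOfFifteenIsFizzBuzz() { assertEquals("FizzBuzz", FizzBuzz.of(15)); }
}
```

A developer who finds this code non-trivial would instead stop and run the suite after each test-and-branch pair. Both are legitimate TDD.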

You can see a similar thing on Martin’s blog. He says “One line of test, followed by one line of production code, around, and around and around”, but then in his video you can see this:

  • Around 31:30 he extracts a variable and doesn’t run the tests — even though he should, right? He runs them only after adding two lines and modifying a third
  • Around 33:50 he adds four lines in total and modifies one
  • Around 39:20 he even copies the loop and fixes it mechanically!

Is that “one line of test, followed by one line of production code”? No, it’s not. Especially in the third example, you could argue that the minimal amount of code he could write was an if statement handling factorsOf(9) explicitly, with the loop added only in the Refactor step (although that would change the business code a bit rather than being a pure refactoring, so you should probably add more tests earlier).
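Here is my sketch of those two options for the prime factors kata (the exact code in the video differs, and factorsOfMinimal is my name): the strictly minimal special case that handles factorsOf(9), and the generalized loop.

```java
import java.util.ArrayList;
import java.util.List;

// My sketch of the two options discussed above; not the exact code from the video.
class PrimeFactors {

    // Option 1: the strictly "minimal" green step — a hard-coded case for 9,
    // just enough to keep the earlier tests (1 through 8) and the new test green.
    static List<Integer> factorsOfMinimal(int n) {
        List<Integer> factors = new ArrayList<>();
        if (n == 9) {
            factors.add(3);
            factors.add(3);
            return factors;
        }
        for (; n % 2 == 0; n /= 2) factors.add(2);
        if (n > 1) factors.add(n);
        return factors;
    }

    // Option 2: the generalized nested loop, which handles 9 (and every other
    // input) without the special case.
    static List<Integer> factorsOf(int n) {
        List<Integer> factors = new ArrayList<>();
        for (int divisor = 2; n > 1; divisor++) {
            for (; n % divisor == 0; n /= divisor) {
                factors.add(divisor);
            }
        }
        return factors;
    }
}
```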

Regarding refactoring: he says, “So the programmers extract a few methods, rename a few others, and generally clean things up. This activity will have little or no effect on the tests”. Did you spot it? “Little or no effect” suggests there may be an effect on the tests when you refactor the code. This means that when you are in the Refactor phase, you are allowed to tweak your tests a bit. If you follow the “one line + one line” approach dogmatically, then (irony mode on) shouldn’t you have tests for your tests before modifying them?

Just don’t be dogmatic.

Detroit or London?

Detroit. I already covered that in previous parts.

However, multiple TDD supporters promoted the London approach. Interestingly enough, even Robert C. Martin says: “If the structure of the tests follows the structure of the production code, then the tests are inextricably coupled to the production code – and they follow the sinister red picture on the left! It, frankly, took me many years to realize this. If you look at the structure of FitNesse, which we began writing in 2001, you will see a strong one-to-one correspondence between the test classes and the production code classes.”
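For readers who haven’t seen the earlier parts, here is a minimal sketch of the difference (the Cart and OrderRepository names are mine, purely illustrative): a Detroit-style (classicist) test drives the object through its public API and asserts on observable state, while a London-style (mockist) test mocks the collaborators and verifies the interactions.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

// Illustrative domain, not from the post: a cart that keeps a count
// and reports every added item to a collaborator.
interface OrderRepository { void save(String item); }

class Cart {
    private final OrderRepository repository;
    private int count;
    Cart(OrderRepository repository) { this.repository = repository; }
    void add(String item) { count++; repository.save(item); }
    int itemCount() { return count; }
}

class CartTest {
    // Detroit (classicist): exercise the object, assert on observable state.
    @Test void detroitStyleChecksState() {
        Cart cart = new Cart(item -> { /* in-memory stub, no framework needed */ });
        cart.add("book");
        assertEquals(1, cart.itemCount());
    }

    // London (mockist): mock the collaborator, verify the interaction.
    @Test void londonStyleVerifiesInteraction() {
        OrderRepository repository = mock(OrderRepository.class);
        new Cart(repository).add("book");
        verify(repository).save("book");
    }
}
```

The Detroit test survives a refactoring that changes how the cart talks to its repository; the London test is tied to that interaction, which is the kind of coupling to production structure Martin describes above.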

So don’t assume there is just “one TDD”, as people have changed their approaches over the years.

Does TDD harm architecture?

It depends on how you apply it. It does not if you mix the top-down and bottom-up approaches. It does if you go bottom-up only.

Software engineering cannot rely on reductionism. There are multiple empirical examples showing that when we integrate multiple parts, new complexity appears that wasn’t present in any of the parts. That’s accidental complexity, and the broader observation that a whole has properties its parts don’t is called emergentism.

If you build your application from the bottom only and don’t think about the big picture early enough, then you won’t be able to refactor your way from one design to another (assuming you’re following RGR). Read this conversation:

“I remember when I was talking with Kent once, about in the early days when he was proposing TDD, and this was in the sense of YAGNI and doing the simplest thing that could possibly work, and he says: “Ok. Let’s make a bank account, a savings account.” What’s a savings account? It’s a number and you can add to the number and you can subtract from the number. So what a saving account is, is a calculator. Let’s make a calculator, and we can show that you can add to the balance and subtract from the balance. That’s the simplest thing that could possibly work, everything else is an evolution of that.

If you do a real banking system, a savings account is not even an object and you are not going to refactor your way to the right architecture from that one. What a savings account is, is a process that does an iteration over an audit trail of database transactions, of deposits and interest gatherings and other shifts of the money. It’s not like the savings account is some money sitting on the shelf on a bank somewhere, even though that is the user perspective, and you’ve just got to know that there are these relatively intricate structures in the foundations of a banking system to support the tax people and the actuaries and all these other folks, that you can’t get to in an incremental way. Well, you can, because of course the banking industry has come to this after 40 years. You want to give yourself 40 years? It’s not agile.”

Robert C. Martin says in his article:
“The idea that the high level design and architecture of a system emerge from TDD is, frankly, absurd. Before you begin to code any software project, you need to have some architectural vision in place.”

However, if you’re a TDD purist and go bottom-up, then you can’t design high-level integration interfaces, because they’re production code. And obviously, you can’t write production code without tests, so you need to write tests first. And if you follow the “one line + one line” approach, then you end up writing quite a lot of code to design interfaces, data structures, and all the contracts without a single line of code actually doing the work (welcome to the world of mocks).
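Here is a minimal sketch of that situation (the interfaces and names are mine, purely illustrative): the “design” exists only as contracts, the test is green, and nothing real has been implemented yet.

```java
import org.junit.jupiter.api.Test;
import static org.mockito.Mockito.*;

// Illustrative contracts only: the "design" exists purely as interfaces.
interface PaymentGateway { boolean charge(String account, int amountInCents); }
interface ReceiptSender { void send(String account, int amountInCents); }

// The only "production" class so far just forwards calls between contracts.
class CheckoutService {
    private final PaymentGateway gateway;
    private final ReceiptSender sender;
    CheckoutService(PaymentGateway gateway, ReceiptSender sender) {
        this.gateway = gateway;
        this.sender = sender;
    }
    void checkout(String account, int amountInCents) {
        if (gateway.charge(account, amountInCents)) {
            sender.send(account, amountInCents);
        }
    }
}

class CheckoutServiceTest {
    // The test passes, yet nothing here actually charges anyone or sends anything.
    @Test void sendsReceiptWhenChargeSucceeds() {
        PaymentGateway gateway = mock(PaymentGateway.class);
        ReceiptSender sender = mock(ReceiptSender.class);
        when(gateway.charge("alice", 500)).thenReturn(true);

        new CheckoutService(gateway, sender).checkout("alice", 500);

        verify(sender).send("alice", 500);
    }
}
```

Whether this is useful design work or busywork depends on how soon real implementations arrive, which is exactly why doing some design upfront without tests is a reasonable alternative.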

Again, don’t be dogmatic. TDD is supposed to help you build better software, but that doesn’t mean you have to write literally one line of code at a time or that you can’t do some design upfront without tests.

Does TDD guarantee high-quality tests?

No, it doesn’t. If you want guarantees, then you need to go with formal methods like TLA+ or Coq. If that’s too much, then you can use fuzzing or mutation testing. Still, even that doesn’t guarantee you anything.
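To show what the lack of a guarantee means in practice, here is a minimal sketch (the class and tests are mine, purely illustrative): the suite stays green while missing a boundary, and a mutation testing tool such as PIT would only report the problem for the mutants it happens to generate, and only if you run it and act on the report.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

// Illustrative only: both the code and the tests could have been written
// test-first, yet the suite is weak.
class Discount {
    // Intended rule: only orders strictly above 100 qualify.
    static boolean qualifies(int orderTotal) {
        return orderTotal > 100;
    }
}

class DiscountTest {
    @Test void largeOrderQualifies()      { assertTrue(Discount.qualifies(150)); }
    @Test void smallOrderDoesNotQualify() { assertFalse(Discount.qualifies(50)); }
    // The boundary (exactly 100) is never exercised, so mutating ">" into ">="
    // (the kind of mutant a tool like PIT generates) keeps the suite green,
    // and nobody notices unless mutation testing is actually run.
}
```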

Does TDD increase the quality of tests? Or more precisely: is it better to write tests with the TDD approach rather than test-first or code-first? I don’t know. I’ve seen good-quality tests written in all of these ways, and I don’t have any convincing data points showing that TDD is better, worse, or even statistically different when it comes to the quality of tests. Typically, the arguments for TDD are based on reduced time to market in the long run or better code maintenance because refactoring happens more often. However, that doesn’t mean the tests themselves are better per se.