Types and Programming Languages Part 14 – Pure functions vs impure ones

This is the fourteenth part of the Types and Programming Languages series. For your convenience you can find other parts in the table of contents in Part 1 — Do not return in finally

Pure functions — functions returning the same output for the same input, not relying on external state. Some love them. Some argue that all functions should be pure. Some languages fight impure functions and don’t even let you write such. Is it correct to say that pure functions are better?

The whole discussion about “which function is better” comes down to the reasoning about functions. Pure functions have lower range of fire, and this allows to prove hypotheses easier.

Pure functions take input parameters $I$ and produce output $O$ . Impure functions take input of the form $(I, S)$ where $I$ is the input parameters, and $S$ represents the external state like global variables. Impure functions also produce output $O$ , however, this time $O$ may also affect the $S$ (and the $I$ as well).

How does $S$ change the situation? We can argue that $S$ represents the state of the application which is an implicit input to the function. With that in mind, both pure and impure functions return the same output $O$ for the input they get (so if the input is the same, then the output is the same). However, pure functions use $(I, \emptyset)$ as an input while impure ones use $(I, S)$ . From that point of view, both functions are actually pure.

The difference is in reasoning. We can reason about $I$ much easier than about $S$ . That’s because to prove anything about $I$ we typically need to just analyze the call site and how $I$ was set. We don’t need to look “outside” of the function, we just take the call place and can draw conclusions. We can then continue with the reasoning, to the caller of the call site, and reason about its input. We can carry on up the call stack.

However, we can’t do this type of reasoning with $S$ . To prove anything about $S$ , not only we need to show how $S$ was set, but also that it wasn’t modified by anything outside of the call stack, like some “previous” code. In other words, to prove something about $I$ , we need to just show the example proving our hypothesis. However, to prove something about $S$ we need to show that there are no “counterarguments” (code changing the state in an unexpected way) disproving our hypothesis. The latter is typically much harder as we need to analyze much more code to show that. Also, finding such a code may be non-trivial, as it may be some reflection changing the state, which our IDE won’t show for us easily. Or the code may even not be accessible, like some dynamically loaded plugin which we don’t own.

However, as long as we can keep $I$ and $S$ small then everything else doesn’t matter. It’s not the point about the function being pure or not. It’s all about being able to reason about the input $(I, S)$ . This depends on your cognitive skills, experience, and proficiency in programming in general. Some functions (and some programmers) may be terrible with three parameters and no global state, whereas some other functions (and some other programmers) may be just right with no parameters and five global variables. That all depends.

What about mutability? It doesn’t matter in most cases. Let’s take an example of double.TryParse function. It is not pure according to Wikipedia, as it 1) mutates the input parameter, and 2) uses the global state (for decimal separator). However, is this function “wrong” or “bad” in any way? No, because its range of fire is typically very limited.

Digression: actually, all functions take one more parameter $E$ . The parameter represents the environment like memory used by the operating system, type of CPU, electrical interference, or even cosmic rays. The difference between $S$ and $E$ is in what we control. While we can control $S$ as it’s the state of the application we execute, we can’t control $E$ . You may indirectly affect the memory used by the operating system, but mot likely you won’t be able to change the cosmic ray.

Digression 2: there are situations where impure functions are definitely worse or even impossible to use. For instance, constexpr in C++ or things calculated during compilation. However, that’s not the point of the discussion above.