This is the twenty-first part of the Types and Programming Languages series. For your convenience, you can find other parts in the table of contents in Part 1 – Do not return in finally
We often hear that code is written once but read many times. Due to that, we favor code “quality”, “clarity”, or “readability” in order to make our lives easier. Unfortunately, it’s not that simple, and optimizing for “code being read many times” is not what we really need in line-of-business applications (and most software) because we change the hats that we wear.
Why do we read code
It’s true that we read code many times. But why do we read code? It’s not like we do that for the sake of reading. We don’t read it from cover to cover and we don’t try to memorize it. So why do we read code?
Code is a representation of concepts, processes, and standard operating procedures. Code is not the crucial part or the main citizen, but it’s just a medium that we use to encode something else. Therefore, we read the code to understand the concept behind the code. But again, why do we do that?
We want to understand the concept because we either want to change it, or we need to troubleshoot it.
Changing the concept (= changing the code) requires us to understand many moving parts, how they interoperate, how they deal with data consistency, concurrency, side effects, and many other things. Changing the code forces us to think about the future, foresee challenges, predict future changes to the code and how it’s going to evolve. When thinking about the flow, we start “somewhere” and think where we can get to from that place. We take the starting point and think about many possible evolutions.
Troubleshooting the code is much different. When troubleshooting, we think about the past. We already know something happened. We already know our assumptions were wrong, we missed some edge case, or we know for a fact something has happened. When troubleshooting the code we much more often trace back. We take the end situation and want to trace its past to understand “how we got here”.
Practically speaking, this means that when we want to change the code, we often start from the top and go from there. We start with an API entrypoint, a facade, a main object that triggers the processing. When we troubleshoot, we often start from the very nested piece of code that caused a side effect that we observed somewhere else. The side effect that we deem invalid, and which we want to troubleshoot.
But why do we want to change the code or troubleshoot it? Now it becomes tricky, as each audience has different reasons. Depending on which hat we’re wearing at the moment, we’ll have different reasons to look into the code.
Not every line of code serves the same purpose
We need to understand that every line of code has a purpose, but not every line of code serves the same purpose. Even lines standing next to each other may have completely different purposes. Let’s look at some sample pseudocode:
```
def foo(param1: Type1, param2: Type2) {
    recordEnteringMethod(foo, param1, param2);
    log(param1, param2);
    throwIfInvalid(param1, param2);
    let result = triggerBusinessProcessing(param1, param2);
    logDebuggingMessage(result);
    modifyExternalSystems(result);
    updateUi(result);
    recordExitingMethod(foo, param1, param2, result);
}
```
This sample shows typical business code. At first, we may say that this code serves a business purpose – it calculates something and shows it to the user. It’s probably a facade hiding some complexity. However, things look different when we break it down line by line.
Metrics
First, we recordEnteringMethod. This is purely for monitoring and observability. End users don’t care about these metrics, developers rarely care about them either, but the ops team is very interested in this piece. This line of code is for “observability”. Similarly, the last line, recordExitingMethod, provides metric data points or traces.
These two lines repeat in many places, and we (developers) often consider them noise. That’s why we try to hide them with Aspect Oriented Programming, dynamic code modifications, attributes/annotations, and other trickery that simply hides these lines from the code base.
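To make the “hiding” concrete, here is a minimal sketch of the annotation-style approach using a Python decorator. The names `observed`, `record_entering_method`, `record_exiting_method`, and the in-memory `recorded_events` list are assumptions made for illustration; a real system would forward these events to its own metrics or tracing backend.

```python
import functools
from typing import Any, Callable

# Hypothetical in-memory sink; a real system would send these to a metrics backend.
recorded_events: list[tuple[str, str]] = []


def record_entering_method(name: str) -> None:
    recorded_events.append(("enter", name))


def record_exiting_method(name: str) -> None:
    recorded_events.append(("exit", name))


def observed(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so that entering and exiting are recorded around every call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record_entering_method(func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on both a normal return and an exception.
            record_exiting_method(func.__name__)

    return wrapper


@observed
def foo(param1: int, param2: int) -> int:
    # Only the business logic remains visible in the method body.
    return param1 + param2
```

Calling `foo(2, 3)` records an “enter” and an “exit” event around the call, while the body of `foo` no longer mentions observability at all.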
Logging
Next, we log. Here, we want to call log.info or log.debug, depending on our environment.
We want to put a message in our logs that will help us troubleshoot what happened. We may need these logs for two things: first, to configure observability like exception monitoring or anomaly analysis; second, to troubleshoot what happened.
The former case is very similar to recording metrics. We want to capture what happened in a structured way. We do this to support the latter, so we can travel back in time and trace what happened when we troubleshoot the code. End users don’t care about these logs. The ops team is partially interested. However, this line is super important for the developers providing support.
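As a rough illustration of capturing things “in a structured way”, the sketch below uses Python’s standard `logging` module and passes the variable data through the `extra` argument, so the message text itself stays constant. The logger name `orders`, the function `log_order_received`, and the in-memory `ListHandler` are assumptions made only for this example.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name
logger.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Collects records in memory so the example can be inspected without any I/O."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
logger.addHandler(handler)


def log_order_received(order_id: int, amount_cents: int) -> None:
    # The message text is constant; variable data travels as structured fields
    # that become attributes on the log record.
    logger.info(
        "Order received for processing",
        extra={"order_id": order_id, "amount_cents": amount_cents},
    )
```

Because the message is a fixed string, whoever sees it in the logs can search the code base for exactly that text, and the structured fields remain available for monitoring tools.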
Contracts
Moving on, we throwIfInvalid. This line serves many purposes.
First, it may be helpful for the caller to show them that they messed up. They broke the contract, didn’t adhere to the requirements, didn’t meet the preconditions, etc. This line of code supports the “unhappy path”. To put it differently, this line of code only slows us down on a “happy path”. In a perfect world, this line would never be needed as we would never end up in an invalid state (that’s an oversimplification, I know).
Second, this line protects our code from running into an even worse state. If we already know something is wrong, it’s generally good to terminate and crash. Otherwise, we risk breaking the data even more and causing irreversible damage.
End users don’t care about this line of code. They do care about having their data consistent, though. However, the data should also be protected in other places, like in the UI.
Ops teams don’t care much about this line. As long as the software “works” as in “executes all the lines successfully”, ops teams are “okay” with breaking the data consistency and other bugs. Obviously, this may later turn into metric spikes and alerts going off, so the ops teams are indirectly interested in making sure the data is not broken.
Last but not least, it’s developers who care about this line. However, it’s often not “us”. It’s “these other developers” that called our code. It’s our callers that are interested in this line. Our code could just go and try to apply the (invalid) changes the caller wants, but we don’t want that. However, the exception will most likely be visible in the caller’s space and will ping the caller’s ops team.
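A minimal sketch of such a contract check could look as follows. The domain (an order total), the exception name `ContractViolation`, and the specific preconditions are all assumptions chosen for illustration, not something taken from a real code base.

```python
class ContractViolation(ValueError):
    """Raised when a caller breaks a precondition (hypothetical name)."""


def throw_if_invalid(quantity: int, unit_price: float) -> None:
    # Fail fast: stop before any state is modified with invalid input.
    if quantity <= 0:
        raise ContractViolation(f"quantity must be positive, got {quantity}")
    if unit_price < 0:
        raise ContractViolation(f"unit_price must not be negative, got {unit_price}")


def order_total(quantity: int, unit_price: float) -> float:
    throw_if_invalid(quantity, unit_price)
    # From here on, the happy path can rely on the preconditions.
    return quantity * unit_price
```

On the happy path, `order_total(2, 1.5)` simply returns `3.0`. With `order_total(0, 1.5)`, the exception surfaces in the caller’s space, which is exactly where we want the broken contract to be noticed.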
Business code
Now, we triggerBusinessProcessing. This is the place that all parties are interested in, but in a very different way.
End users are obviously interested in getting things done. They don’t care about our code structure or how we do things internally. They are interested in the side effects, though. Therefore, this line is not interesting for them yet. They ultimately want us to modifyExternalSystems and updateUi. However, even this is misleading. End users don’t care about our databases or the state we preserve, because they don’t interact with them. They care about what they see, so from their perspective the UI is the ultimate source of truth. So the end users care the most about their UIs, not the backends.
Ops teams care slightly less about how we calculate the business result (as long as it doesn’t hammer the CPU and memory), but they are more interested in how we modifyExternalSystems. They care about performance, so they want to make sure we don’t trigger a cascade of crazy modifications that will bring the whole system down. Most importantly, ops teams often see only the very end effect of our actions. They will see that the database CPU spikes, but they don’t understand that it’s because we changed the SELECT query to extract another column, which resulted in a table scan instead of using an index.
Finally, developers care about triggerBusinessProcessing a lot. They need to make sure the data is consistent and the results match the business documentation. This time, it’s “us” who care. It’s not “these other developers calling our code”.
What we should optimize for
It’s not enough to say that we should optimize for readability because it’s not the same for everyone. Let’s see what we should optimize for then.
Searchability
To read the code, we need to find it. So first, the code must be optimized for searchability. But is it?
When trying to change the code, we need to understand many moving parts. We have enough time to read through it, probably run it locally, or even step through with debuggers. We have IDEs, AI, static analysis, debuggers, web proxies, and other tools helping us to build the comprehensive picture of everything involved. Most importantly, we can start from the top.
Things are different when we troubleshoot. When it’s 4AM on Saturday morning and we are paged by the monitoring system, we need to act fast and avoid wasting time on false positives or dead ends. We know “something happened”, we see the “side effects” like metric spikes, exceptions, weird logs. We don’t have time to build the big picture. Most importantly, we want to find the bottom, where the problem manifests itself.
How can we find this “bottom”? Here is where the code should be optimized for searchability. As we saw in the previous section, every line serves a different purpose, therefore every line must be optimized differently. Lines like triggerBusinessProcessing are probably not the ones that we’ll see in external systems. However, throwIfInvalid or updateUi will manifest themselves outside of our code and will serve as the starting point.
To make it more specific, we’ll typically start looking from the following:
- A particular message that appeared somewhere in the UI, in the logs, or in the data entity
- A metric name that we observed in monitoring systems
- A static UI element, like a label or name
- An event name that we observe in queuing systems
- An endpoint name that we see in the browser’s dev tools
There are many more things that we may start with. They all share the same characteristic, though – we know something for a fact, and now we want to find where this “fact” emerged from. Therefore, we want to optimize for static code analysis and no false positives as much as possible:
- When emitting metrics, do not concatenate their names dynamically. People will search the codebase for your “full.metric.name.emitted”, so it’s better to have the code use this string in this exact form and in one place only
- Similarly, when creating UI elements, avoid concatenation where possible
- Keep your translated elements in one place. People will not look for your variables’ names. They will look for translated messages in their local language that you can’t speak
- Do not reuse the same message in two distinct places. People may get lost as they don’t know which place they are looking for
- Have “distinctive” names as much as possible to simplify searchability. “Name” is harder to find than the “Customer Name” which is still harder than the “Customer Name in Local Branch”. The more distinctive the element is, the better
- Emit a particular metric from one place only and make it carry just one scenario. A metric like “success” is useless as it doesn’t indicate which operation it refers to. A metric like “businessProcessA.success” is better. You can also use dimensions or tags
- Merge your success and error metrics into one. You don’t want to have “businessProcessA.success” and “businessProcessA.error” because the ops teams will not want to have two charts on their dashboard. All they are interested in is “is it working”. Therefore, have only the “businessProcessA.error” metric and emit a non-zero value when things go wrong
- Emit zeros on success. Similarly, when things are correct, emit zero to the “businessProcessA.error” metric. This makes sure the data points are always there, so the ops team can validate they don’t have a typo in their dashboard configuration. What’s more, they can now configure alerts for the missing data points scenario
- Have predictable names that adhere to the convention. Do not use rare synonyms or uncommon naming schemes, even when the common practices “make no sense” or are “wrong”. The world is not perfect and sometimes we just need to follow the crowd doing inefficient things in order to help the crowd achieve what they need
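The metric-related suggestions above can be sketched roughly as follows. The `emit` function and the in-memory `emitted` list are hypothetical stand-ins for a real metrics client, and `run_business_process_a` is an invented wrapper; only the metric name comes from the text.

```python
from typing import Callable

# Hypothetical in-memory sink standing in for a real metrics client.
emitted: list[tuple[str, int]] = []


def emit(name: str, value: int) -> None:
    emitted.append((name, value))


def run_business_process_a(step: Callable[[], None]) -> None:
    failed = 0
    try:
        step()
    except Exception:
        failed = 1
        raise
    finally:
        # The full metric name is a literal used in this one place only, so an
        # exact-match search for it lands here. Zero is emitted on success, so
        # the data points are always present.
        emit("businessProcessA.error", failed)
```

A successful run emits the value 0 and a failing run emits 1, both under the same name and from the same line. Compare this with building the name as `prefix + "." + status`, which no exact-match search would ever find.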
And here is my hot take: these suggestions may result in less clean code which is okay! For instance:
- We may get code duplication because we repeat the metric prefix in many places. Yes, this is annoying when writing the code and we need to find ways to avoid that (by generating code, using some readable string interpolation, etc.), but metrics are optimized for the ops team which searches the code at 4AM on Saturday. We do have time to update the code in many places. They don’t have time to understand clever ways of removing duplication at night.
- Methods with “weird parameters”. For instance, you may have a method emitMetric(bool success, Exception? exception). You may be tempted to have two methods instead: emitSuccess() and emitException(Exception exception) but that will result in emitting the metric in two places. There are ways to deal with that, for instance you can wrap the parameters into an object that validates the scenario and prevents situations like emitMetric(true, new Exception()), or you can deal with these in other ways depending on your programming language. However, we again want to help the ops team to find the source of the metric at night.
- We may get duplication in our translation files. “Activity Report” may be used in many screens, but we should still have distinct entries in i18n files that will have the same value. Yes, it’s harder to update these translations when changing the code, but it helps people to find the place they are looking for. However, UI elements are less often searched for at night, so here we may look for a better balance between removing code duplication and helping searchability
- We may need to put foreign names in our code base. We typically write code in English, but it’s sometimes good to use foreign names if they are very visible in other systems (like in UI, metrics, or in business documentation). For instance, many years ago there was an insurance company that wanted to offer a new product. This product was then promoted by a commercial starring an actor (let’s call him Wiktor) that everyone knew from a popular movie. The business team often referred to this product as “Ubezpieczenie Wiktorowe” which means “Wiktor’s insurance”. Wiktor had no idea at all that the whole company was using his name when talking about a business product. Now, would you rather look for a method named specialStartingDiscountOfferHandler or wiktorsInsuranceOfferHandler or even ubezpieczenieWiktoroweHandler? I can tell you the last version was pretty effective
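One possible way to wrap the “weird parameters” into an object that validates the scenario is sketched below. The `Outcome` class, its validation rules, and the in-memory `emitted` list are assumptions for illustration; your language may offer better tools, such as sum types.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory sink standing in for a real metrics client.
emitted: list[tuple[str, int]] = []


@dataclass(frozen=True)
class Outcome:
    success: bool
    exception: Optional[Exception] = None

    def __post_init__(self) -> None:
        # Reject contradictory combinations, such as success with an exception.
        if self.success and self.exception is not None:
            raise ValueError("a successful outcome cannot carry an exception")
        if not self.success and self.exception is None:
            raise ValueError("a failed outcome must carry an exception")


def emit_metric(outcome: Outcome) -> None:
    # Still a single emission point for the metric, for both scenarios.
    emitted.append(("businessProcessA.error", 0 if outcome.success else 1))
```

With this shape, `Outcome(success=True, exception=RuntimeError("boom"))` fails at construction time, while the metric is still emitted from exactly one place.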
Even though the code is less “clean”, it lets us get the job done faster. It may be easy for you to find the code when you know the code base, but the ops team may struggle a lot at 4AM. It’s up to you if you want to be right or have a good night’s sleep.
Easy navigation
Next, we should optimize our code for easy navigation. Once we find the code line that we are interested in, we most likely need to follow the code either to the places it calls or to the places that call this line.
Things are easy when we are using IDEs. They are very good at deciphering object types, method overloads, polymorphism, implicit parameters, defaults, and so on. Unfortunately, things are much harder when we are not using any IDE, but a basic text editor or a web browser.
The reality is that we browse the code much more often in a web browser than in an IDE. Sure, you have your project checked out locally and can spin up your IDE in seconds. Can your ops team do the same? Can you do that for projects from your neighbour teams? Can you do that easily when navigating between many layers of code across different projects and repositories? Sorry, no way.
People often browse the code using their web browsers. They don’t download the repository locally. They use GitHub’s search engine, in-house code explorers, or even Google or AI. These solutions don’t support any “go to definition” or “find all references”. People need to use basic exact match search or some full-text search that is often flaky when applied to source code. Not to mention that we have dynamic languages that are even harder to traverse and navigate.
Therefore, we should help people who use basic tools, for instance:
- Do not overuse var and other type inference. The type should be clear when reading the code in notepad
- Be careful with implicit parameters, overloads, and fancy inheritance hierarchies
- Be mindful when using polymorphism
- Have distinctive method names so they can be easily found when looking for a full method name using basic exact term search
- Prefer named types over unnamed tuples, as the named types can be easier to follow
- Avoid deep call chains that are hard to navigate
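A few of these rules can be sketched in Python as follows. The `CustomerBalance` type and both functions are invented for illustration, and the lookup is hard-coded so that the example stays self-contained.

```python
from typing import NamedTuple


class CustomerBalance(NamedTuple):
    """A named type instead of an anonymous (str, int) tuple."""

    customer_name: str
    balance_cents: int


def load_customer_balance(customer_id: int) -> CustomerBalance:
    # Hypothetical lookup, hard-coded so the example has no external dependencies.
    return CustomerBalance(customer_name=f"customer-{customer_id}", balance_cents=1250)


def format_customer_balance(entry: CustomerBalance) -> str:
    # A distinctive method name is easy to find with a plain exact-match search.
    return f"{entry.customer_name}: {entry.balance_cents / 100:.2f}"


# The explicit annotation tells a reader without an IDE what `entry` is.
entry: CustomerBalance = load_customer_balance(7)
```

Someone reading this in a web browser can search for `CustomerBalance` or `format_customer_balance` and land in the right place, without needing “go to definition”.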
Again, these rules may result in the code being less “clean”. And again, this doesn’t apply to every single line of code. Think about what hat you are wearing and who would benefit from the particular piece of code.
Predictability
If you ever read Thinking Fast and Slow, you know that people tend to think fast to make their lives easier. We all do that. We follow stereotypes, patterns, routines, and avoid thinking as much as possible.
Your code should be optimized for that. Yes, we may complain that people are unwise or lazy, but think to yourself if you prefer to be paged at 4AM or if you prefer your ops team to deal with the issue on their own without calling you. Therefore,
- Follow common practices for naming, organizing the code structure, implementing interfaces, or creating overloads. If that results in code duplication or some “imperfect” solutions – so be it
- Use common patterns. People recognize adapters and factories much more easily than double dispatch. The latter will make your code cleaner and smarter, the former will let you have a good night’s sleep
- Make the code predictable. Have methods like EmitX where X is the metric name. This will let the search engine find it faster, and will also make readers more confident that they are reading the right part of code
- Follow the industry. Many things are “incorrect” because reasons (history, compatibility, typos, etc.). Just follow them because this is what people are used to
Again, you can often make your code much smarter at the cost of predictability. Think if it’s worth it.
Code is just a tool
Last but not least, remember that code is just a tool. Don’t think about the code, think about the purpose.
I already mentioned this, but let me reiterate. Your code serves different purposes and is read differently by various people. Fellow developers are interested in how to call your code properly, so you build facades. Support teams are interested in finding metrics or UI elements. You are interested in maintaining the business logic over many years. Your code must be readable for all these groups, so you have to optimize it for these groups separately. Or you may be the “know it all” that needs to be paged at 4AM because nobody else can figure out how to troubleshoot.
This is much bigger than just the ops team. Your code will be read by people from other companies (like when you do open source or when you involve an independent consultant). It’s in your best interest to make them understand the code faster. It’s up to you if you make the code readable for them, or if you pay more to your consultants, or if you deal with more support cases.
This also makes you a “go to person”. Others prefer to ask you for help instead of figuring out things on their own because asking you is just faster. Newcomers waste tons of time on running into dead ends because they can’t navigate the code the same way that you can. It’s up to you if you prefer to be the documentation or if you want to make people independent. The latter requires lowering the bar sometimes.
Also, not everyone is tech savvy. We need to speak with product managers, business experts, end users. It’s much easier to communicate with them when we have the same representation of concepts. Parity between code and documentation is great for reducing this cognitive barrier. If your business analysts document a process, then it’s easier to read the code if it has exactly the same structure as things in the documentation. This goes even further and relates to Conway’s law, which observes that systems tend to mirror the communication structure of the organizations that build them. We want to align our code with the communication paths and communication terms (ever heard about the ubiquitous language?) rather than the engineering practices.
The ultimate takeaway is: we don’t read the code for the sake of reading. We read the code to understand concepts. We want to understand them to be able to change the future, or to troubleshoot the past. These things are distinct and should be optimized for differently. There is no single “readable code” because readability depends on the purpose.