The Three Fundamental Principles of Software Projects
We Can’t Control Time (But We Can Control Scope)
It can be very hard to know precisely, sometimes even roughly, how long it will take to solve a particular problem using the software engineering method. At the same time, all the stakeholders, including the software engineer, want to know precisely how long it will take. To complicate things further, it can be very hard to assess progress accurately once the project is under way.
In this light, it is helpful to think about software projects in terms of two aspects. One aspect is time, and the other aspect is scope.
The aspect of time is not something we can directly control.
The aspect of scope is something we can directly control.
Because we can’t control time, but we can control scope, the first principle is that we must focus on controlling the scope.
You might argue that we can control time by adding more people to the software project. Here the simple counter-argument is Brooks’s law, which states:
“Adding manpower to a late software project makes it later.”
A slightly more elaborate version of the counter-argument is that software projects tend to have a high cost of partitioning. We can understand the concept of partitioning work through the example of a strawberry field.
Imagine a field of strawberries that is the size of a football field. Let’s say one person (picker) can pick 2,000 strawberries per day. Two pickers can pick 4,000 strawberries per day, and so on. Up to a point, every new person working in the field adds 2,000 strawberries per day. As the number of people in the field increases further, the pickers start slowing each other down. As you keep adding more pickers, the congestion on the field increases, and the strawberries per day per picker decline. If you don’t stop adding pickers, the field will become so congested that pickers can no longer pick strawberries on it. Movement becomes impossible.
Between these two states, a single picker and total collapse, lies the partitional optimum: the state in which the field holds the correct number of people. The correct number is the one that still achieves 2,000 strawberries per day per picker and empties the field of strawberries in the shortest possible calendar time.
The nuance here is that the partitional optimum cannot be improved by means of communication. In other words, no amount of communication lets the people in the field organize themselves so that more people can be added without adversely affecting the strawberries per day per person.
For every task, there is a number, the partitional optimum, which indicates how many people can work on the task without adverse effects on the throughput per person.
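The strawberry field can be sketched as a toy model. The code below is only an illustration under stated assumptions: the optimum of 50 pickers, the collapse at 80 pickers, and the linear decline between them are hypothetical numbers, not taken from the text. Only the 2,000 strawberries per picker per day comes from the example above. The collapse point is deliberately chosen so that the whole field's throughput also peaks at the optimum, which matches the definition above. A gentler decline would let total throughput keep rising for a while past that point.

```python
BASE_RATE = 2_000  # strawberries per picker per day (from the text)
OPTIMUM = 50       # hypothetical: last headcount with no congestion
COLLAPSE = 80      # hypothetical: headcount at which nobody can move


def per_picker_rate(pickers: int) -> float:
    """Strawberries per day for each picker at a given headcount."""
    if pickers <= 0 or pickers >= COLLAPSE:
        return 0.0
    if pickers <= OPTIMUM:
        return float(BASE_RATE)
    # Assumed linear decline from full rate at OPTIMUM to zero at COLLAPSE.
    return BASE_RATE * (COLLAPSE - pickers) / (COLLAPSE - OPTIMUM)


def field_rate(pickers: int) -> float:
    """Strawberries per day for the whole field."""
    return pickers * per_picker_rate(pickers)


def partitional_optimum(max_pickers: int) -> int:
    """Largest headcount at which every picker still works at full rate."""
    return max(
        n for n in range(1, max_pickers + 1)
        if per_picker_rate(n) == BASE_RATE
    )


if __name__ == "__main__":
    best = partitional_optimum(200)
    print("partitional optimum:", best)
    for n in (1, best, best + 1, COLLAPSE):
        print(n, "pickers:", per_picker_rate(n), "each,", field_rate(n), "total")
```

In this sketch, adding the 51st picker lowers both the per-picker rate and the total for the field, which is the situation Brooks’s law describes.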
Strawberry field operations are an example of a highly partitionable task. Car mechanics is an example of a poorly partitionable task. Software engineering tends to be closer to car mechanics than to strawberry picking.
Communication as Cost
At this point, it will be helpful to introduce two new concepts: intra-work communication and extra-work communication.
Intra-work communication protocols are those that are a by-product of the actual work. For example, in software engineering, we have daily code commits. Extra-work communication protocols are those that are there to compensate for partitioning. Whereas intra-work communication protocols tend to have no time cost or a very low time cost, extra-work communication protocols always have a time cost, often a high one. A particularly costly communication pattern, communication about extra-work communication, can have devastating effects on software engineering projects.
While we can’t make people pick strawberries faster by introducing communication protocols, we can make them pick strawberries a lot slower as we keep introducing more communication protocols.
We tend to think of communication as a value, but actually, it is a cost.
Let’s return to the strawberry field to explore the proposition of communication as cost.
Let’s say the field now holds a number of people who can still focus on picking strawberries without communicating about it at all. Each day they have eight hours of uninterrupted time for picking, just enough to pick 2,000 strawberries. We now add enough new people that pickers can’t avoid collisions without communication. We then introduce an intra-work communication protocol that takes 15 minutes per day from each picker. Instead of eight hours for picking, each picker now has 15 minutes less, which means about 63 fewer strawberries per picker per day. The cost of communication, in this case, is 3.125% of throughput (15 of 480 minutes).
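The arithmetic behind the 15-minute protocol can be worked through in a few lines. This is a minimal sketch that assumes throughput is strictly proportional to picking time. The function name `communication_cost` is made up for illustration.

```python
WORKDAY_MINUTES = 8 * 60  # eight hours of picking time (from the text)
BASE_RATE = 2_000         # strawberries per picker per day (from the text)


def communication_cost(minutes_per_day: float) -> tuple[float, float]:
    """Return (fraction of throughput lost, strawberries lost per picker per day).

    Assumes picking output scales linearly with the minutes left for picking.
    """
    fraction = minutes_per_day / WORKDAY_MINUTES
    return fraction, BASE_RATE * fraction


if __name__ == "__main__":
    fraction, lost = communication_cost(15)
    print(f"throughput lost: {fraction:.3%}")      # 3.125%
    print(f"strawberries lost per picker: {lost}")  # 62.5, roughly 63
```

The same function shows how quickly the first-order cost grows: an hour of daily communication costs each picker an eighth of the day’s strawberries.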
The first-order cost of communication should be understood — and measured — as a decline in throughput.
As we keep adding more people, we have to add more communication to compensate for more people in the field. We compensate by making communication more complex because we can’t just endlessly keep adding more communication. We start to introduce extra-work communication protocols. This presents a new problem: as communication increases in complexity and diverts pickers from the actual work, misunderstandings and other communication issues become a new cause of collisions. Where we first introduced communication for the sole purpose of avoiding collisions, communication is now causing them.
The second-order cost of communication is collisions caused by extra-work communication protocols. The collisions have a direct negative effect on throughput.
Because we can no longer reduce communication or make it simpler, and we can’t keep making it more complex either, we start adding layers to it. There is the planning of communication, the coordination of communication, and the actual communication. With all this communication, there is now a need to assess communication. This pattern leads to highly ritualized forms of communication, and the communication ends up far removed from the actual work.
Initially, the communication was a byproduct of picking strawberries: for example, one picker telling another their intended direction to avoid a collision. Now the communication is increasingly about communication itself: communication about the planning of communication, about the coordination of communication, and about the actual communication, followed by communication about the assessment of communication, which inevitably leads to more communication. Only a small fraction of all this communication is about the actual thing, which here is picking strawberries.
This kind of highly ritualized communication is the third-order cost of communication. It has devastating effects on the experience pickers have with their job. It’s hard to overstate its adverse effects on throughput.
The cost of communication is our next, and for now last, fundamental principle. It means that while communication can result in some value, by itself it is a cost.
The Three Fundamental Principles
- Because we can’t control time, but we can control scope, we must focus on controlling the scope
- Every project has its partitional optimum, which defines how many people can perform work without collision
- While communication can result in value, by itself it is a cost