The Essence of Testing

Originally written on 13 Nov 2020
“Code without tests is broken by design.”
Jacob Kaplan-Moss

I love that quote. It captures everything. Lacking tests is not just bad practice but, effectively, bad design. Bad design means that something is wrong with the product: a badly designed umbrella is not fit for purpose and could even cause injury (or worse, death). Developers really need to take this to heart. You need tests. Perhaps one day someone will invent a language with tests built in, i.e. no code runs without tests.

Nevertheless, if you are not persuaded by the above, I want to try a different approach. While lacking tests is improper on design principles, I also want to demonstrate that it is improper on logical principles. So let's get cracking.

Testing enforces the rigour of the scientific method on code. It is the closest we can get to proof that code works. The scientific method works on the basis of evidence, both affirmative and negative, that some claim is the case. This is in contrast to mathematical and logical proof, which are certain and fixed. In science, evidence is laid before the mind to weigh and decide which way to lean. This is why different scientists can come to different conclusions when presented with the same data, analyses and resulting evidence. In contrast, there is no such controversy among mathematicians: once something is proved, that's the end of the discussion.

Amphibian

Software behaves partly like mathematics/logic and partly like a scientific experiment. Obviously, the logical parts of software will be subject to the rigour of proofs, but the 'softer' parts, such as the UI, working with files, orchestrating processes etc., will be harder to pin down. It's these parts that require more attention.

When a scientist designs an experiment, they are creating a set of conditions under which they can make as definitive a claim to evidence as possible. This is usually not easy, because making a definitive claim requires very stringent controls, which are very hard to achieve. For example, to show that a drug works, scientists perform multiple rounds of randomised controlled experiments, known as randomised controlled trials, so as to exclude subjective bias. Even the manner in which the data is analysed is tightly controlled for the results to be accepted.

Furthermore, experimental testing is bound by the idea of falsifiability. Falsifiability means that something can only be credited as 'true' if it could, in principle, be shown to be 'false'. By 'true/false' here I don't mean logical values but rather something more like 'strong evidence in favour of' and 'weak evidence in favour of', respectively. If something cannot be proven false then it cannot be proven true. This amounts to a bias towards evidence against rather than evidence for. For example, if one makes the claim 'all swans are white', then upsetting this claim only requires a single non-white swan to be found.
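The swan example maps directly onto testing. The sketch below (a hypothetical `is_leap_year` function with a deliberate bug; the names are illustrative, not from any real library) shows how any number of confirming cases cannot establish correctness, while a single counterexample is enough to falsify the claim:

```python
def is_leap_year(year):
    """Hypothetical, naive implementation: it omits the 400-year rule."""
    return year % 4 == 0 and year % 100 != 0

# Confirming cases: evidence in favour of the claim "is_leap_year is correct".
confirming = [
    is_leap_year(2020),       # divisible by 4: a leap year
    not is_leap_year(1900),   # divisible by 100: not a leap year
    not is_leap_year(2019),   # not divisible by 4: not a leap year
]

# A single counterexample falsifies the claim: 2000 IS a leap year
# (divisible by 400), yet the implementation says it is not.
claim_falsified = is_leap_year(2000) is False
```

No quantity of green ticks in `confirming` can outweigh that one non-white swan.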

These two ideas (the idea of evidence and the idea of falsifiability) can then help us to approach software testing in a way that should give us an impetus to write tests.

Give Us Evidence

It is the burden of the software author to show that the code works to requirements. They have to provide the evidence that the code works. Therefore, the right mindset is for the author to try to break their code. They should dream up practical scenarios (not fictitious ones) which explore the various ways to break their code.
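As a minimal sketch of that mindset, consider a hypothetical `average` function (both it and the probing harness below are illustrative, not a prescribed method) attacked with practical inputs rather than happy-path ones:

```python
import math

def average(values):
    """Hypothetical function under test: the arithmetic mean."""
    return sum(values) / len(values)

def try_break():
    """Probe average() with practical edge cases and collect failures."""
    failures = []
    for case in ([], [float("nan")], [1e308, 1e308]):
        try:
            result = average(case)
            if math.isnan(result) or math.isinf(result):
                failures.append((case, result))
        except ZeroDivisionError:
            failures.append((case, "ZeroDivisionError"))
    return failures

# All three scenarios break the code: the empty list divides by zero,
# a NaN propagates silently, and two large floats overflow to infinity.
```

None of these inputs is fictitious; empty collections, NaNs from upstream calculations and large values all occur in real data.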

Even if the code passes all tests, it doesn't mean it is perfect. The idea of falsifiability implies that a single failing test is enough to take us back to the drawing board. It should now be evident what's at stake: seemingly perfect code can be discredited by a single failed test. In practice this elicits a framework in which test failures are welcome. You want tests to fail as much as possible during development. That's good, because every failed test invites inspection and questioning of whether the code can be written better to avoid the failure.

One notorious approach that's become popular is the use of vanity metrics such as code coverage, employed in thousands of projects. Once you start chasing code coverage, you are soon drawn away from the task of testing towards the visibility of testing. You strive to reach 100% coverage at all costs and forget that, even at such a coverage, the code might still teem with simple bugs.
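To see how coverage can mislead, here is a minimal sketch (the `sign` function and its test are hypothetical): a single weak test executes every line of a buggy function, yielding 100% line coverage while the bug goes unnoticed:

```python
def sign(n):
    """Intended: return -1 for negative, 0 for zero, 1 for positive.
    Buggy: it returns 1 for n == 0."""
    if n < 0:
        return -1
    return 1

def test_sign():
    # Executes both branches of sign(), so a coverage tool reports
    # 100% line coverage -- yet n == 0, where sign() is wrong, is
    # never probed.
    assert sign(-5) == -1
    assert sign(5) == 1
```

A coverage report declares this function fully tested, yet `sign(0)` returns 1 instead of 0.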

The converse is not problematic. If you focus on breaking your code and accept the burden of proof, then 100% coverage will follow automatically, many times over.