Coding as Exploration

Posted 3 years, 4 months ago | Originally written on 5 Aug 2021

Consider this article as a brain dump. One day it will be refined...

"Software is not building. It's not construction. It's design. And design is like exploring a space of possibilities and what all the possibilities are, and you don't know how long it's going to take. It's like science. You don't know how long it's going to be until you find your solution. So, software projects are very, very different to schedule, because there are so many unknowns."
Joel Spolsky

It is rare to have a lucid understanding of the domain before writing your first line of code. Achieving one up front was the objective of the waterfall method: learn everything you can about what the end-user will need before writing any code, and hope that those requirements will not change. This assumes that the end-user fully understands the domain, which is rarely true. It is all but guaranteed that upon delivery the user will discover something that can be improved, or the domain may even have changed, in which case the whole enterprise has been a waste. Ideally, we would build only what is currently required as we grow our understanding of the domain and of the users in that domain (for users located elsewhere but working in the same domain may have vastly different usage patterns).

For this reason, development must proceed cautiously, with every step giving users the chance both to be observed using the product and to give feedback. Because software development is discrete in nature (data models are discretised descriptions of entities; features are countable improvements; runtimes involve non-overlapping instances of execution, etc.), it is open to combinatorial explosion. This is what makes writing great software very hard: careful, expensive attention must be devoted to ensuring that very precise entities function together seamlessly, with as few chances for unintended consequences as possible. Every new feature increases the number of possible interactions with other components. Good design attempts to minimise the emergence of such heterogeneity. Once a good fit has been discovered (yes, these have to be discovered), incremental improvements can proceed far more rapidly, and every bug becomes an opportunity for refinement. In fact, my experience has been that whenever I have taken the time to be exacting about refining my code, fixing bugs is easy because the code is easy to understand, and the fix leads to far better code than before.
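To put rough numbers on the combinatorial explosion, here is a minimal Python sketch. It assumes the simplest possible model, in which any two features might interact and any group of two or more features might interact jointly. The feature names and both helper functions are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import comb


def pairwise_interactions(features):
    """List every unordered pair of features that could, in principle, interact."""
    return list(combinations(features, 2))


def interaction_counts(n):
    """Return (pairs, groups) for n features.

    pairs  = number of unordered pairs, n * (n - 1) / 2
    groups = number of subsets with two or more members, 2**n - n - 1
    """
    pairs = comb(n, 2)
    groups = 2 ** n - n - 1  # all subsets minus the empty set and the n singletons
    return pairs, groups


# Hypothetical feature names, for illustration only.
features = ["login", "search", "export", "sharing"]

for n in range(1, len(features) + 1):
    pairs, groups = interaction_counts(n)
    print(f"{n} features: {pairs} pairs, {groups} possible groupings")
```

Under this model, four features already admit 6 pairs and 11 groupings, and ten features admit 45 pairs and 1013 groupings. Real systems are rarely this bad, because most features do not interact at all. Keeping it that way is precisely the job of good design.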

Also consider:

  • Choice of language. Every expression in code is an opportunity for the language to demonstrate its usefulness to the task at hand. It might be that the syntax impedes the current task and that the language has to be abandoned. Alternatively, you can use an expressive language during the exploratory phase and later, once the domain is well understood, switch to a high-performance production language.
  • Domain scope. The scope of the domain that the software covers determines the completeness of the models it uses. If the scope is too narrow, many ad hoc types will litter the design and seem out of place. On the other hand, if the scope is too wide, it will be clear that a subset of types cohere well while a few others are tacked on loosely. The ideal is for the software to cover the domain exactly, so that there are no extraneous types.
  • Managing expectations. It pays to manage users' expectations. Users are spoilt by their access to highly polished software that is irrelevant to their task, and are usually ignorant of the effort required to produce it. These expectations have to be tamed through the incremental delivery process. Typically, the comparison should not be between the current release and that polished but irrelevant software, but between the current release and the manual process of accomplishing the same task. One of my good friends got into Python and now has a deep appreciation of the efficiencies afforded by automation. He recounted to me how it used to take him literally hours to perform a certain analysis. His first programs were clunky and messy, but they cut the time to a fraction of what it used to take. Gradually, he improved his program to the point that he automated the entire pipeline and packaged his scripts for easy installation. He saved hours not only of his own time but also of his lab's. He still has a lot to learn, but his eyes are now open to both truths: that automation is worth the effort, and that it takes a lot to achieve a polished application. Most people have no clue that the easy online tools we enjoy, such as Google Mail and Facebook, are the results of teams of thousands of highly trained software engineers working in some of the best working conditions possible, under the guidance of exceptionally talented architects and overseen by highly incentivised project managers. Once they understand this, they will be far more appreciative of anything that replaces the manual process for their particular tasks.

This idea of incremental delivery implies that software development must be exploratory: we are looking for the correct representation of the domain and the transactions it admits. This is the essence of the hack: it's dirty. In most cases, the hack is so ad hoc that it can only be used by its creator. Unfortunately, in a lot of cases (particularly in fields where data analysis is part and parcel of the job, such as scientific research) the code never leaves this stage even though it is presented as complete. Without proceeding to the refactor step, the work persists as juvenile babble, incoherently articulating ideas it is only half prepared to express. This is why adapting such code is usually a fool's errand, unlikely to amount to much. The refactor attempts to draw out the implicit data models and make them explicit, while highlighting the main transactions performed between those data models. It pays to have this clear in the minds of all participants.
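As a rough illustration of the step from hack to refactor, here is a small Python sketch. Everything in it is an assumption made up for the example: the measurement records, the quality-control flag, and the names Measurement and mean_of_valid. The hack keeps the data model implicit in tuple positions and a magic number; the refactor names the model and the transaction performed on it.

```python
from dataclasses import dataclass
from statistics import mean


# The hack: the data model is implicit. Only the author knows that
# position 1 is the value and position 2 is a quality-control flag.
def hack(rows):
    return sum(r[1] for r in rows if r[2] == 1) / len([r for r in rows if r[2] == 1])


# The refactor: the implicit data model is made explicit.
@dataclass(frozen=True)
class Measurement:
    sample_id: str
    value: float
    passed_qc: bool


def mean_of_valid(measurements):
    """Transaction: average the values of measurements that passed quality control."""
    valid = [m.value for m in measurements if m.passed_qc]
    if not valid:
        raise ValueError("no measurements passed quality control")
    return mean(valid)


# Hypothetical data, for illustration only.
rows = [("a", 1.0, 1), ("b", 3.0, 1), ("c", 100.0, 0)]
measurements = [Measurement(s, v, bool(q)) for s, v, q in rows]

print(hack(rows))
print(mean_of_valid(measurements))
```

Both versions compute the same number for this input. The difference is that the second can be read, reused and extended by someone other than its creator, and it fails with a clear message when no measurement passes quality control, where the hack would fail with a bare division by zero.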

Our perspective is naturally limited by our immediate terrain: the view from a valley is different to that from a summit. Even the highest summit only allows a view as far as the horizon. This is the principle at work with exploration: every iteration provides a foundation upon which to begin the next exploration.

If you are faithful to this task then you open yourself up to serendipity: unexpected pleasant surprises. The more you refactor your code towards domain faithfulness, the more your code will automatically lend itself to the domain in ways you hadn't even anticipated. I experienced this on a project I've been working on for a year: I discovered a way to use a feature that I had written without that application in mind. It was the best feeling in the world.