Designing for Failure

Originally written on 6 Jul 2023

By far the biggest factor to consider when designing for failure is the social dynamics of the team. Broken teams, where personal egos are an essential production component, are the fastest way to crash. It is painful to be in a team whose structure prevents good ideas from prevailing and allows bad ones to persist.

Yet it's not hard to fix. Every team should take to heart Patrick Lencioni's book The Five Dysfunctions of a Team. The book is a remarkable fictional tale of how a wise leader reinvigorates a management team so that it functions better, and in it Lencioni identifies the following dysfunctions:

  1. Absence of trust - doubts about the integrity of the team;
  2. Fear of conflict - preferring to let lingering doubts persist rather than engaging;
  3. Lack of commitment - not fully engaging;
  4. Avoidance of accountability - turning a blind eye to poor performance;
  5. Inattention to results - not caring about the outcomes the team should be focused on.

If I were to design a team, I would put heavy emphasis on getting 100% engagement by creating an environment in which everyone can speak up. No sacred cows, not even me. The highest priority goes to good ideas, and those tend to emerge from frontline team members.

Also, job titles and fixed job roles are a major cause of team dysfunction. Teams should be fluid. Everyone should be available, subject to their interest in learning, to take on any task. Work should be organised around goals, with at most one goal active at any given time. It should also flow through a series of linked work queues, so that each transition from one queue to the next moves a task closer to being operationalised. Let me outline this in a bit more detail.

Consider three queues: design, engineering and production.
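The linked queues can be pictured as a small data structure in which a task only ever moves forward, one stage at a time. This is a minimal sketch, assuming each task is a plain string; the `QueueBoard` class, its method names and the extra `done` stage are hypothetical illustrations rather than part of the process itself.

```python
from collections import deque

# Stage order mirrors the three queues; "done" is an assumed terminal stage
# marking a task that has been fully operationalised.
STAGES = ["design", "engineering", "production", "done"]


class QueueBoard:
    """A minimal board of linked FIFO work queues, one per stage."""

    def __init__(self) -> None:
        self.queues = {stage: deque() for stage in STAGES}

    def add(self, task: str) -> None:
        # New work always enters through the design queue.
        self.queues["design"].append(task)

    def advance(self, stage: str) -> str:
        """Move the oldest task in `stage` to the next stage and return it."""
        index = STAGES.index(stage)
        if index == len(STAGES) - 1:
            raise ValueError("tasks in 'done' cannot advance further")
        if not self.queues[stage]:
            raise LookupError(f"no tasks waiting in {stage!r}")
        task = self.queues[stage].popleft()
        self.queues[STAGES[index + 1]].append(task)
        return task


board = QueueBoard()
board.add("sub-second report generation")
for stage in ["design", "engineering", "production"]:
    board.advance(stage)
print(list(board.queues["done"]))  # ['sub-second report generation']
```

Keeping the transitions in one `advance` method means there is exactly one way for work to move between queues, which makes it easy to attach hand-off rules later.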

Tasks in the design queue are raw, unqualified tasks. Every design task results in one or more engineering tasks. Producing these requires a system design that captures the needs of the target users, as well as a project design describing how the system design will be delivered. This typically takes many hours of user studies to understand the design problem deeply.
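The one-to-many relationship between design and engineering tasks can be sketched with two small record types. This assumes, purely for illustration, that each requirement found during user studies maps to exactly one engineering task; `DesignTask`, `EngineeringTask` and `ready_for_production` are hypothetical names. The last helper encodes the hand-off rule that work only moves to production once all of its engineering input is complete.

```python
from dataclasses import dataclass, field


@dataclass
class EngineeringTask:
    """A concrete, testable task derived from one design requirement."""
    description: str
    done: bool = False


@dataclass
class DesignTask:
    """A raw design-queue task plus the requirements found during user studies."""
    title: str
    requirements: list[str] = field(default_factory=list)

    def to_engineering_tasks(self) -> list[EngineeringTask]:
        # A design task must yield at least one engineering task.
        if not self.requirements:
            raise ValueError(f"design task {self.title!r} has no requirements yet")
        # Assumed one-to-one mapping from requirement to engineering task.
        return [EngineeringTask(f"{self.title}: {req}") for req in self.requirements]


def ready_for_production(tasks: list[EngineeringTask]) -> bool:
    """True only when there is engineering work and all of it is finished."""
    return bool(tasks) and all(task.done for task in tasks)


design = DesignTask("report export", ["process input in under 1 s", "export to CSV"])
tasks = design.to_engineering_tasks()
print(len(tasks))                   # 2
print(ready_for_production(tasks))  # False
```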

The system design paints in broad strokes what should be delivered. For example, one design requirement could be that processing a certain input must take no more than one second. It is the responsibility of the engineering queue to discover how to deliver on these requirements in such a way that the solution, usually rough and ready, is also fully tested. This may involve writing benchmarking tests, switching to a higher-performance language and so on, tasks that require more than ordinary development skill. Once every system design task that requires engineering input has received it, the tasks can move on to the production queue.
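A benchmarking test for the one-second requirement might look like the sketch below. It assumes the work under test is a single Python function; `process` is a hypothetical stand-in (here it just sorts a list), and the sample input size is an arbitrary assumption rather than a real workload.

```python
import time
from typing import Callable

# The one-second ceiling from the example design requirement.
TIME_LIMIT_SECONDS = 1.0


def process(data: list[int]) -> list[int]:
    """Hypothetical stand-in for the real processing step under test."""
    return sorted(data)


def worst_case_seconds(fn: Callable[[list[int]], list[int]],
                       data: list[int], runs: int = 5) -> float:
    """Run fn several times on the same input and return the slowest wall-clock time."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(data)
        timings.append(time.perf_counter() - start)
    # The requirement is an upper limit, so judge against the worst run.
    return max(timings)


def test_processing_time_within_limit() -> None:
    sample = list(range(100_000, 0, -1))  # assumed representative input
    elapsed = worst_case_seconds(process, sample)
    assert elapsed < TIME_LIMIT_SECONDS, (
        f"took {elapsed:.3f}s, limit is {TIME_LIMIT_SECONDS}s"
    )


test_processing_time_within_limit()
```

Naming the function with a `test_` prefix lets a runner such as pytest collect it automatically, so the requirement is checked on every build rather than once.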

The production queue is like any assembly line. It involves operational tasks (well-defined, standard, routine tasks that don't require special engineering insight) as well as final testing to verify that the finished product meets its specifications. Nearly every aspect of production should rely on some form of automation to make it as streamlined as possible. Because the work is standard, this queue is open to any team member, and adding more team members can greatly increase the pace of delivery.
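Final verification is one of the easiest parts of production to automate. A minimal sketch, assuming each specification can be written as a yes/no check: `export_csv` is a hypothetical product feature and `run_acceptance_checks` is an illustrative runner, not the API of any particular tool. A CI pipeline would typically run something like this on every build and block the release if anything fails.

```python
import csv
import io
from typing import Callable


def export_csv(rows: list[dict[str, str]]) -> str:
    """Hypothetical product feature that the production checks verify."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_acceptance_checks(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every specification check and return the names of those that failed."""
    failures = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            # A crashing check counts as a failure instead of halting the run.
            passed = False
        if not passed:
            failures.append(name)
    return failures


# Each entry encodes one (assumed) specification as a yes/no check.
checks = {
    "header row present": lambda: export_csv([]).splitlines() == ["id,name"],
    "one line per record": lambda: len(
        export_csv([{"id": "1", "name": "Ada"}]).splitlines()) == 2,
}
failed = run_acceptance_checks(checks)
print("release" if not failed else f"blocked by: {failed}")  # release
```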