Decision-making frameworks
In his 2015 shareholder letter, Jeff Bezos talks about two types of decisions - those that are best represented as one-way doors, and those that are more like two-way doors.
Thinking about how reversible a decision is applies anywhere in business, and software development is no exception.
Bezos describes the difference himself quite succinctly in that shareholder letter:
Some decisions are consequential and irreversible or nearly irreversible – one-way doors – and these decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before. We can call these Type 1 decisions. But most decisions aren’t like that – they are changeable, reversible – they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through. Type 2 decisions can and should be made quickly by high judgment individuals or small groups.
Software engineering and design are all about decision-making. This includes small things, like how you name a variable or how many functions to create in order to implement some logic. It also includes big things, like which public cloud to use for hosting your software, whether to self-host, which languages and technologies to write your application code in, and which database to use.
Sometimes companies and engineering organizations mistakenly spend a lot of time on decisions where that investment in the decision-making process isn't merited. If the impact is small, or if you can easily change course on the decision later, then the person with the right authority and knowledge to make the decision should just make it and move on. Naming a variable is certainly one of these types of decisions - the impact of a poor variable name is fairly inconsequential, and easily fixed later.
Conversely, companies often don't spend enough time on decisions where the impact is large and the effort to reverse the decision is significant. For example, resume-driven development can influence teams to make decisions that are bad for the long-term health of the project and the company. If engineers constantly implement new libraries, frameworks, and technologies purely for the sake of following something new and shiny, it can lead to a very fractured approach where software becomes harder to maintain.
In the same shareholder letter from 2015, Bezos calls out the importance of treating the two types of decisions appropriately:
As organizations get larger, there seems to be a tendency to use the heavy-weight Type 1 decision-making process on most decisions, including many Type 2 decisions. The end result of this is slowness, unthoughtful risk aversion, failure to experiment sufficiently, and consequently diminished invention. We’ll have to figure out how to fight that tendency.
In other words, companies shouldn't spend a bunch of time on decisions that don't matter that much. If there isn't much impact, why worry about it? And if you can easily reverse the decision, why put a huge process around it?
At Corso, when we make decisions we take it a step further and actually optimize for creating as many two-way doors as we can. Most modern engineering best practices allow for flexibility and adaptability as software is built, and the tooling and processes that let teams move quickly yet safely are almost always the same ones that let those teams more easily reverse decisions made along the way.
Neither "reversibility" nor "impact" is a binary property, but we can summarize how much effort we should put into a decision as follows:
Any time a decision is easily reversed, let the right person make the call - and move on.
If a decision isn't easily reversed, but will have a low impact, take some time to be sure that the impact is low. Then, let the right person make the call and move on.
Only decisions that are both hard to reverse and high-impact need us to slow down and make sure we get them right. If we spend inordinate amounts of time on easily reversed decisions, it causes "slowness, unthoughtful risk aversion, failure to experiment sufficiently, and consequently diminished invention" in the words of Jeff Bezos.
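The three rules above can be sketched as a tiny lookup. This is only an illustration: the names `Process` and `decision_process` are made up for this example, and treating reversibility and impact as simple yes/no answers is a deliberate simplification of what are really sliding scales.

```python
from enum import Enum


class Process(Enum):
    """How much ceremony a decision deserves (labels are illustrative)."""
    DECIDE_AND_MOVE_ON = "let the right person make the call and move on"
    CONFIRM_LOW_IMPACT = "confirm the impact is low, then make the call and move on"
    DELIBERATE = "slow down, consult, and deliberate"


def decision_process(easily_reversed: bool, high_impact: bool) -> Process:
    """Map the two questions from the text onto a decision process."""
    if easily_reversed:
        # Two-way door: impact barely matters because we can walk back through.
        return Process.DECIDE_AND_MOVE_ON
    if not high_impact:
        # Hard to reverse, but cheap to live with once we have checked the impact.
        return Process.CONFIRM_LOW_IMPACT
    # Hard to reverse and high impact: the only case that earns a heavy process.
    return Process.DELIBERATE
```

Note that the first check short-circuits: an easily reversed decision gets the lightweight process no matter how large its impact looks.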
So, how do you know which type of decision you are facing?
In software engineering, few things are true one-way doors, since almost anything can be changed with enough work. Some decisions would take so much work to undo, though, that we want to be careful with them. These are some of the factors we consider when determining how close a decision is to a one-way door:
- How much skill overlap is there with expertise already on the team? If we need to make a change at some point, will it require hiring different people or requiring our existing team members to learn a completely foreign technology?
- How much time and effort is required to reverse a decision we make? Is it right-clicking and renaming all references in our IDE, or is it working with new vendors, learning a new stack, and re-writing a large amount of code?
- Are there any software licensing costs, or other direct expenses? Is a large financial commitment (up-front or otherwise) required?
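One way to make those three questions concrete is to jot down a rough estimate for each and look at the worst one. The sketch below is hypothetical: the `ReversalCost` name, the 0.0 to 1.0 scale, and the choice to take the maximum are all assumptions made for illustration, not a formula we actually run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReversalCost:
    """Rough 0.0-1.0 estimates for the three factors (scale is an assumption)."""
    skill_gap: float             # 0 = team already has the expertise, 1 = completely foreign technology
    rework_effort: float         # 0 = rename in the IDE, 1 = new vendors, new stack, large rewrite
    financial_commitment: float  # 0 = no licensing or direct costs, 1 = large commitment

    def one_wayness(self) -> float:
        """Return the largest factor.

        Taking the maximum (rather than an average) reflects the idea that a
        single expensive dimension is enough to make a decision hard to undo.
        """
        return max(self.skill_gap, self.rework_effort, self.financial_commitment)
```

For example, renaming a variable might be `ReversalCost(0.0, 0.1, 0.0)`, while swapping out a database might be closer to `ReversalCost(0.6, 0.9, 0.3)`. The exact numbers matter far less than the conversation they force.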
There are really no hard-and-fast rules, only guidelines. Here's how Corso has approached different types of decisions, and our framework for thinking about how "one-way" a decision might be:
- PaaS providers and services: We are careful to avoid vendor-specific offerings when we build our software. Our services are built on native Kubernetes, which is nearly identical from cloud to cloud. We leverage S3-compatible storage, which is a de facto standard available just about everywhere. We use vanilla Postgres with widely-available extensions. Because we treat each service as an independent decision, we have been able to stick with technologies that are universal enough that they are found in all major platforms. Would it be work to move to a different cloud? Sure - but it's definitely doable, and decisions made about our PaaS usage are always a two-way door.
- Code structure and architecture: How you organize your code is an important decision. Well-structured code lets you move faster and find bugs more easily. In a modern IDE, restructuring your code with the built-in refactoring capabilities is straightforward. If you realize you need to improve your code structure in any way (and who doesn't occasionally do a little bit of refactoring?), doing so is easy enough. Our entire development stack has been engineered to make sure that these types of things are very much two-way doors.
- Database technologies: One of the hardest technologies to trade out in any stack is your database. Going from one RDBMS to another is possible, but not easy. Going from one style of database (say, a document database like MongoDB) to another one like PostgreSQL or MariaDB is also possible - but even less easy. We have likely spent more time and effort on our database decisions than anywhere else, simply because the decision is so hard to back away from. Our decision to focus on Postgres is likely the closest thing we have to a one-way door in our stack.
- Database schema changes: We have worked very hard to make sure that schema changes in our database are two-way doors. We still design any changes very carefully, but we have the ability to make updates with confidence knowing that migrations due to changes can be handled with a minimal amount of work when we need to iterate on a design.
- Programming language choices: Node is arguably the most widespread technology for writing web APIs, with TypeScript in place to create a great developer experience. Whether Node/TypeScript is the "best" or not is a subject for those who care enough to argue about such things; we are pragmatic enough to recognize that the large base of developers using this particular combination makes working in its ecosystem very straightforward. We also use Python for our data-related tasks, given that it is the closest there is to a lingua franca for those sorts of things. While these aren't one-way doors, it would be unreasonable to have to rewrite this amount of code in something else - so we choose very carefully before introducing a new language to the stack.
- Software architecture: This is a catch-all for a lot of different points. Our various front ends are all built using React with a REST backend, for example. This is in part because React is a mature, well-supported system with a large ecosystem and plenty of developer expertise available. Making a change here would be quite a bit of work, so we treated this like a one-way door. On the backend, we use a fairly monolithic approach to our architecture. However, we still keep systems compartmentalized with an eye to how we expect to split our various systems in the future as we outgrow a monolith. We are unlikely to ever fully embrace a true microservices architecture, though, because that decision would be difficult to reverse.
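To illustrate what a two-way door can look like for the database schema changes mentioned above, here is a minimal sketch of a migration that ships with its own undo step. It is not our actual tooling: it uses Python's built-in `sqlite3` as a stand-in for Postgres so that it runs anywhere, and the `customer_notes` table, the `MIGRATION` dictionary, and the helper names are all invented for the example.

```python
import sqlite3

# Hypothetical migration: every change ships with the statement that undoes it.
MIGRATION = {
    "up": "CREATE TABLE customer_notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)",
    "down": "DROP TABLE customer_notes",
}


def run_migration(conn: sqlite3.Connection, direction: str) -> None:
    """Apply the migration ("up") or walk back through the door ("down")."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(MIGRATION[direction])


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check SQLite's catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_migration(conn, "up")
    print(table_exists(conn, "customer_notes"))
    run_migration(conn, "down")
    print(table_exists(conn, "customer_notes"))
```

Pairing every "up" with a tested "down" is what keeps the door open. The caveat is that a "down" step which drops a table also drops its rows, so the change is only cheaply reversible until real data depends on it.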
As you think about this topic on your team or in your company, any approaches are likely to be more like guidelines than rules - but with a moderate amount of experience it's fairly easy to understand the directionality of a decision you make.
Every company is different and is working under different circumstances, but asking "how reversible is this decision?" can be a helpful step in the decision-making process. Spending the right amount of time on decisions that are closer to a one-way door, and not belaboring decisions that are easily reversed, are both ideals worth working towards.
Above all else, hire the best people possible and let them build. The best people make the best decisions quite naturally, all on their own.