Tuesday, 10 August 2021

First Time Right by Failing Fast

A popular buzzword among leaders at the moment is "First Time Right". I hear it all the time, but I wonder whether people really understand its implications. FTR means that when something is released to its target audience, it deploys into production without errors. That sounds too good to be true. In all honesty, I have never seen a 100% success record for things going into production. So there must be more to it than just wishing that everything will go flawlessly.

There is no try

FTR means that when a thing is put into production, it won't go wrong. Deployment into the real world is full of risks, and we can only mitigate them to a certain level. No one can guarantee FTR, but we can reduce the risks by changing the way we make things. When we think about FTR, we can see that it should be more than just prayers. I can still picture a former project manager demanding FTR without putting in the resources to make it happen. The project manager's words were literally: "Just try harder to do it right next time". Which reminds me of Yoda's famous words: "Do or do not. There is no try".

We cannot try harder. We can improve our work methods to limit the chance of failure, but it is impossible to add another 100 pounds of try. So we cannot try harder, but we can try smarter. Building up to FTR is possible if we focus on the process towards production. We need to fail before we go to production. We need to fail often and as fast as possible. This reduces the risk of failure at the most critical moment. And even then, there are strategies we can use to reduce the chance of an outage when deploying.

Fail Fast

FTR is only possible if we Fail Fast. FF means that we run a lot of tests and experiments early in the process. To FF we need to focus on the process, get a grip on what we do and standardise our way of working. By standardise I mean that a team has one agreed way of working, preferably an automated process in the form of a build pipeline. To that pipeline we add the "Move it to the Left" principle. Picture a pipeline that starts on the left at development and ends on the right at deployment: we add tests as early as possible, on the left side. Tests on the left are cheap. Each one looks at a small part of the product, and together they cover as much as possible. These are technical tests or unit tests. The next step is to create integration tests, to see whether all new parts of the product work in conjunction with the old parts. These tests are fewer in number than the unit tests, but take more time to complete. Take a look at the testing pyramid to get an understanding of this testing strategy.

Failing fast means that if there is anything to fail at, we fail at it as early as possible, so that we learn fast as well and can mitigate risks before going to production.
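To make the pipeline idea a bit more concrete, here is a minimal sketch in Python. It is not a real build tool: the stage names and the check functions (`unit_tests`, `integration_tests`, `deploy`) are made up for illustration, and the integration stage is hard-coded to fail. The only point is the ordering: cheap checks sit on the left, and the pipeline stops at the first failure so that nothing to the right of it ever runs.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages from left to right and stop at the first failure.

    Returns whether everything passed and the names of the stages
    that were actually executed.
    """
    executed: List[str] = []
    for name, check in stages:
        executed.append(name)
        if not check():
            # Fail fast: later (more expensive) stages never run.
            return False, executed
    return True, executed


# Hypothetical checks, standing in for real test suites.
def unit_tests() -> bool:
    return 2 + 2 == 4  # cheap, small and quick


def integration_tests() -> bool:
    return False  # simulated failure, for the sake of the example


def deploy() -> bool:
    return True  # never reached in this example


# Left (cheap, early) to right (expensive, late).
stages: List[Stage] = [
    ("unit tests", unit_tests),
    ("integration tests", integration_tests),
    ("deploy", deploy),
]

if __name__ == "__main__":
    ok, executed = run_pipeline(stages)
    print(ok, executed)
```

Because the simulated integration failure stops the run, the deploy stage is never attempted. In a real pipeline the checks would call your test runner, but the stop-at-first-failure shape stays the same.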

Deploy Safe

Once we have mitigated most risks by failing fast, the chance remains that our deployment to production fails. Deployment is never 100% watertight. But we can use deployment strategies to reduce the chance of an outage. We can use blue-green deployments, in which a deployment is made to a shadow environment and, when all is up and running, production is switched from the old environment to the new one. We can also think of a canary release strategy. In this strategy a new deployment is made next to the old one and, instead of switching everyone over to the new environment at once, we switch small groups of users over, say 10% at a time. We can closely measure how the new system behaves under those users, and if things go south we can roll back to the old instance.
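As a rough sketch of the canary idea, assuming we have some way to ask "is the new version healthy at this share of traffic?", the rollout loop could look like the Python below. The function `canary_rollout` and its `is_healthy` callback are hypothetical names for this example. In practice the health check would look at real monitoring data, and the traffic shift would be done by a load balancer or deployment tool.

```python
from typing import Callable, List


def canary_rollout(is_healthy: Callable[[int], bool], step: int = 10) -> List[int]:
    """Shift traffic to the new version in steps of `step` percent.

    `is_healthy(percent)` is a hypothetical check that reports whether the
    new version behaves well at the given share of traffic.
    Returns the history of traffic percentages sent to the new version;
    a trailing 0 means we rolled back to the old instance.
    """
    history: List[int] = []
    percent = 0
    while percent < 100:
        percent = min(percent + step, 100)
        history.append(percent)
        if not is_healthy(percent):
            # Things went south: send all traffic back to the old version.
            history.append(0)
            return history
    return history


if __name__ == "__main__":
    # A new version that starts misbehaving once it serves 30% of the users.
    print(canary_rollout(lambda percent: percent < 30))
    # A new version that stays healthy all the way to 100%.
    print(canary_rollout(lambda percent: True))
```

In the first call the rollout stops at 30% and rolls back, so most users never saw the problem. In the second call traffic moves over in ten steps until the new version serves everyone.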

So to reach a First Time Right situation we need to focus on the process of development: Move it to the Left in order to Fail Fast. And we need to focus on the deployment by using a deployment strategy that fits our situation. If you want to achieve FTR you need to put in effort, be ready to fail a lot and be open to learning when you fail. Have fun!
