Tuesday, 27 July 2021

Team skills for cloud solutions

If you want to go to the cloud, you need to learn how to fly. To avoid building castles in the air, teams need a fair understanding of the risks of the cloud and the capabilities needed to move there successfully. Whether the cloud is internal or public, every team needs to build up knowledge to prevent screw-ups. A cloud environment is a hostile environment: without proper knowledge or experience, any team moving to the cloud is bound to fail if it doesn't educate itself.

The main risks of cloud are complexity and costs. Both can be managed if a team understands them well. Let's start with costs. In the cloud (we are talking container platforms here) billing is done by compute. Compute means usage of system resources. Containers with apps that are switched off don't cost a thing; containers with apps that are running are billed by the second. Traditionally, apps ran on (virtual) machines, which were always on. A monthly bill for the VM was about it, so there was a fair understanding upfront of how much the app would cost. With cloud this is a bit harder to predict. If your app is rarely used but always available, the bill might be high. If it is rarely used and switched off (ready to start when a request to the app is made), the bill might be much lower. And if your app is popular and scales up, thus using more compute, it will cost you a lot more than perhaps anticipated.
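To make the difference concrete, here is a rough sketch of the arithmetic behind these three situations. The price per container-second, the 30-day month and the usage numbers are made-up assumptions for illustration only; real platforms have their own pricing models.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assume a 30-day month for simplicity

# Hypothetical price: 10 millionths of a currency unit per container-second.
# Integers are used to avoid floating-point rounding in the comparison.
PRICE_MICROS_PER_SECOND = 10


def monthly_cost_micros(running_seconds: int, instances: int = 1) -> int:
    """Cost of `instances` containers that each run for `running_seconds`."""
    return PRICE_MICROS_PER_SECOND * running_seconds * instances


# Rarely used, but kept running all month.
always_on = monthly_cost_micros(SECONDS_PER_MONTH)

# Rarely used and switched off when idle: assume 2 hours of activity per day.
scale_to_zero = monthly_cost_micros(2 * 3600 * 30)

# Popular app: assume it scaled out to 5 containers for the whole month.
popular = monthly_cost_micros(SECONDS_PER_MONTH, instances=5)

for name, micros in [
    ("always on", always_on),
    ("scale to zero", scale_to_zero),
    ("popular", popular),
]:
    print(f"{name}: {micros / 1_000_000:.2f} per month")
```

The same app ends up with three very different bills depending on how it is used and how it is run, which is exactly why the costs are harder to predict than a fixed VM price.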

To take advantage of the cloud model, it is important to understand scaling up and scaling out. Breaking your application down into micro or macro services is paramount for a successful scaling model. With such a model you can take advantage of the cloud by shutting down rarely used services to limit costs, or by scaling up heavily used services to improve revenue.
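As a minimal sketch of such a scaling model, the function below decides per service how many containers should run. The function name, the capacity figure and the upper limit are hypothetical; in practice a container platform's autoscaler makes this decision based on its own configuration.

```python
import math


def desired_replicas(
    requests_per_second: float,
    capacity_per_replica: float,
    max_replicas: int,
) -> int:
    """Return how many containers a service should run for the given load.

    All parameters are illustrative assumptions, not a real platform API.
    """
    if requests_per_second <= 0:
        # Unused service: shut it down completely to limit costs.
        return 0
    needed = math.ceil(requests_per_second / capacity_per_replica)
    # Cap the scale-out so a traffic spike cannot cause an unbounded bill.
    return min(needed, max_replicas)


# One decision per service, which is why the app is split into services.
print(desired_replicas(0, capacity_per_replica=50, max_replicas=10))
print(desired_replicas(120, capacity_per_replica=50, max_replicas=10))
print(desired_replicas(10_000, capacity_per_replica=50, max_replicas=10))
```

Because each service gets its own decision, a quiet service can drop to zero while a busy one scales out, instead of the whole application scaling as one block.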

A team building and maintaining the app in the cloud needs to have some capabilities to support this cloud model. The team needs to be autonomous: it directs itself and is able to take action without approval (or with implicit approval). To facilitate this, an agile way of working is required. It doesn't have to be a particular practice, method or framework, but the agile concepts need to be understood and implemented.

What the team needs to do next is automate the development pipeline. Everything from check-in of a change to deployment should be automated. There cannot be any manual steps in this process. This requires rethinking testing, approval and deployment. Another part of automation is monitoring. The app needs to be monitored, the business process needs to be monitored and the team should monitor its own performance. All these metrics should be easily available in a transparent manner. This strengthens interaction and trust with the stakeholders.

This was all about mitigation, but what if the shit hits the fan and there is a disruption in the service provided by the app? Think of break-ins or outages. The first thing is security. This is a discipline that should be practiced by all members of the team. It's just like locking your house or car, or putting a good lock on your bike. A team should understand security risks by applying some form of threat modeling (such as STRIDE) and should practice outages on a regular basis. The latter can even be automated with tooling such as Netflix's Simian Army.

Also think about what happens when there actually is a compromise or an outage. No one other than the team is going to be called upon. So the team needs to organise itself to be available at all times.

Moving to a cloud solution might sound easy, but it's much more complicated than it seems. Teams need to learn a lot of new practices to be able to safely navigate the complex cloud world. They need to feel responsible, they need to inspect their performance constantly and they need to learn new things each day.
