Friday, 18 August 2023

The horror of the standup

I hated standups when the team I was in started with Scrum. They disrupted my work, and we just answered the famous three questions and did nothing with the answers. It was just a status meeting, a daily chore that went fairly quickly, so I thought of it more as a coffee break than as something useful.

My standard answer each day was about the same:

  • SM: "What did you do yesterday?"
  • Me: "I worked on PBI x"
  • SM: "What are you going to do today?"
  • Me: "I'll be continuing my work on PBI x"
  • SM: "Anything that impedes your work?"
  • Me: "Nope"

This useless conversation made me resent Scrum, until I actually opened the guide (when it was still more than the current 18 pages) and read about the purpose of the standup. It is not about those three questions; it is about making a plan for the day. It is about what the team is going to do in the next 24 hours to come closer to the sprint goal. It is a support meeting that promotes collaboration: who is going to do what, to what purpose, and who is helping whom to get things done. The questions were only there to support the definition of the daily goal, but they were widely misinterpreted. I guess that's why they were removed from the Scrum Guide altogether.

After this realisation I changed the way I do standups, and now they actually help me get things done. I coach many teams, and this is how I teach them to get more out of a standup.

  1. Reorganise the board: use the lanes for the PBIs. Don't order the board by team member; order it by PBI, with the most important ones at the top and the least important ones at the bottom.
  2. Make the columns as simple as possible: Todo, Doing, Done is more than enough.
  3. Start at the first item that is not done and discuss with the team what can be done today to finish that item. Focus on making pairs to make sure the item is finished in the shortest time possible. At least two or three people should be working on the same PBI.
  4. Check for anything that will impede progress on the PBI and make solving it the team's highest priority.
  5. Take the next item on the list until you run out of team members.
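
The walk through the board in steps 3 to 5 can be sketched in a few lines of code. This is only an illustration under my own assumptions: the names `Pbi` and `plan_day` are made up, the board is assumed to be ordered by importance already, and in a real standup people volunteer for an item instead of being assigned in list order.

```python
from dataclasses import dataclass


@dataclass
class Pbi:
    title: str
    status: str  # "Todo", "Doing" or "Done"


def plan_day(board, members, pair_size=2):
    """Walk the board top-down and staff each unfinished PBI until nobody is left."""
    free = list(members)
    plan = {}
    last = None  # title of the most recently staffed PBI
    for pbi in board:  # the board is assumed to be ordered by importance
        if pbi.status == "Done":
            continue
        if not free:
            break
        if len(free) < pair_size and last is not None:
            # Too few people left to form a pair: they join the previous item
            # instead of starting a new one alone.
            plan[last].extend(free)
            free = []
            break
        plan[pbi.title] = free[:pair_size]
        free = free[pair_size:]
        last = pbi.title
    return plan


board = [Pbi("A", "Done"), Pbi("B", "Doing"), Pbi("C", "Todo"), Pbi("D", "Todo")]
print(plan_day(board, ["An", "Bo", "Cy", "Di", "Ed"]))
```

With five engineers and pairs of two, the fifth engineer joins the second pair, so item D stays untouched until something above it is finished.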

After changing standups like this, I have found that people value them more. Most of the time the standups take less than the allotted 15 minutes. As a result, the teams collaborate more, share knowledge better, and get more done.

 

Tuesday, 18 July 2023

How should we use metrics in DevOps?


At DevOpsDays Amsterdam I joined an Open Space about metrics. It was very interesting to hear that some companies make the use of DORA metrics mandatory. For those unfamiliar with the DORA metrics, here they are:

  1. Mean Time To Recover (MTTR): the amount of time it takes to restore a downed service
  2. Change Failure Rate (CFR): how many changes to production cause an outage
  3. Lead Time (LT): how much time it takes to bring a change from inception to production
  4. Deployment Frequency (DF): how often you can deploy a change
  5. Availability (A): what the uptime of your main process is from a user's perspective
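
To make the first four definitions concrete, here is a minimal sketch of how they could be calculated from a team's own records. The record types `Change` and `Outage` and the function `dora_summary` are hypothetical names of mine, not part of any DORA tooling, and I express CFR as a fraction of all changes. Availability is left out because it has to be measured from the outside.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    started_at: datetime   # inception of the change
    deployed_at: datetime  # moment it reached production
    caused_outage: bool


@dataclass
class Outage:
    started_at: datetime
    restored_at: datetime


def mean_delta(deltas):
    """Average a list of timedeltas; None when there is nothing to average."""
    return sum(deltas, timedelta()) / len(deltas) if deltas else None


def dora_summary(changes, outages, period_days):
    """Summarise one measurement period of `period_days` days."""
    return {
        "mttr": mean_delta([o.restored_at - o.started_at for o in outages]),
        "change_failure_rate": (
            sum(c.caused_outage for c in changes) / len(changes) if changes else None
        ),
        "lead_time": mean_delta([c.deployed_at - c.started_at for c in changes]),
        "deploys_per_day": len(changes) / period_days,
    }
```

The numbers only mean something when the team agrees on what counts as an outage and when a change starts, which is exactly the discussion a team should have before measuring.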

The DORA metrics are derived from a survey among a great number of people in companies (about 33,000). It is important to note that these metrics are just a subset of what is possible to measure, which means that they might not be best suited to your own situation. Teams in organisations should select metrics based on the goals they want to achieve. Metrics can therefore differ from team to team. Each team is responsible for its own metrics, and it is crucial that they regularly reassess the metrics they use.

In the workshops I do with teams about metrics I explain the most important rule: metrics are owned by the team. No one outside the team should decide which metrics to use. You can see why I struggle with mandatory metrics. Metrics can help increase the performance of a team, but only if the team is in control of them and uses them as a learning mechanism.

Metrics should also not be used to compare team performance between teams, especially not from a management perspective. This will kill the dialogue between the team and the leaders that should facilitate them in growing. Discussing team metrics with a leader is good; they can understand and help a team remove impediments and grow. Comparison leads to competition, competition leads to selective development of capabilities and will most probably result in improvements on the measured items, which are not necessarily the items that the team needs to improve on.

As Peter Drucker said: "What you measure improves". In other words, if you know where you want to improve, start measuring. Just measure what you think will help you improve. There are a couple of things to bear in mind. First of all, think of the possible side effects of the metric and how it can be cheated. For instance: measure the velocity in story points to see how much work gets done. This sounds like a great metric, but it can easily be cheated. An increase in story points does not guarantee that more work is delivered. It might be the case that stories just get more points, thus cheating the metric.

If you define a metric, you should also define a goal: where you want to be with that metric after a certain amount of time. Don't go for 100%, as that will be demoralising; set achievable goals.

What kind of metrics might be useful to a team? Here are a couple of metrics you might want to use.

  1. Mean Time to Detect (MTTD): It measures the average time taken to detect issues or failures in the production environment. A lower MTTD indicates effective monitoring and alerting systems.
  2. Mean Time to Resolve (MTTR): It measures the average time taken to resolve issues or failures once they are detected. A lower MTTR indicates efficient incident response and problem-solving capabilities.
  3. Change Failure Rate: It calculates the percentage of changes that result in incidents or require rollbacks. A lower change failure rate indicates a higher level of stability and quality in the software delivery process.
  4. Test Coverage: It measures the percentage of code or system coverage by automated tests. Higher test coverage indicates a reduced risk of introducing bugs and promotes code reliability.
  5. Customer Satisfaction: This metric captures customer feedback and satisfaction levels, indicating how well the delivered software meets customer expectations and needs.
  6. Infrastructure as Code (IaC) Compliance: It measures the percentage of infrastructure managed as code and tracks adherence to infrastructure automation practices.
  7. Team Morale: While not directly tied to technical metrics, monitoring team morale and job satisfaction can provide insights into the health of the DevOps culture and its impact on productivity and collaboration.

My favourite metrics are the following:

  1. Predictability: this is a percentage of estimated story points versus delivered story points. It tells you how accurate your planning is. Measuring predictability will improve refinement, planning and the discussion about too much work on the sprint log. It also reduces unplanned work. In other words, it shows how well you understand the changes to your product.
  2. Availability: how available is my product to the user of the product, this is measured from outside the organisation to simulate real users by running the main process of your product. This metric gives direct feedback on how the user is able to use your product.
  3. Technical Debt: how much time do we need to resolve all technical debt (you can use SonarQube for this). Technical debt will accumulate if you don't act upon it. It suffers from compound interest and will eventually block any new features. It is very important to get a grip on Technical debt and as a general rule you should always keep it below 40% of your total work.
  4. Age: what is the age of the PBIs / Stories / Issues on your backlog? Stuff at the end of the backlog will probably never get done, or will be totally irrelevant by the time its turn comes. Define the maximum age of items and throw away stuff that is older. The odd thing about things you throw away is that if they come back, they come back better (new insights are added, they are updated to the latest specs).

I have found these four metrics to help increase team performance more than the first four DORA metrics.
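
Two of the metrics above, predictability and age, are simple enough to calculate by hand or with a few lines of code. The sketch below is an illustration only: the function names are made up, and it assumes predictability is delivered points divided by estimated points and that each backlog item carries a creation date.

```python
from datetime import date


def predictability(estimated_points, delivered_points):
    """Delivered story points as a percentage of the estimated story points."""
    if estimated_points <= 0:
        raise ValueError("estimated_points must be positive")
    return 100.0 * delivered_points / estimated_points


def stale_items(backlog, today, max_age_days):
    """Titles of backlog items older than the maximum age the team agreed on."""
    return [
        title
        for title, created in backlog
        if (today - created).days > max_age_days
    ]


backlog = [("old idea", date(2022, 1, 10)), ("fresh bug", date(2023, 7, 1))]
print(predictability(40, 30))                          # 75.0
print(stale_items(backlog, date(2023, 7, 18), 180))    # ['old idea']
```

Everything `stale_items` returns is a candidate to throw away; the maximum age itself is something the team decides.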

The big sidenote here is that it's up to a team to use them. They can remove or add metrics to this list. When I start with metrics in my teams, my first goal is to define one to three metrics to start with. Each retrospective we inspect the metrics and adapt: we pick something to improve, which can result in modifying a metric (its description, measurement or KPI) or adding a new one.

To conclude, metrics can really help you to understand the work and your performance. Metrics should be defined by the individual teams, be visible to everyone and be modified when needed.

What kind of metrics do you like to use with your team?

Tuesday, 4 April 2023

Why don't we have enough time?

 

Time is relative, there are enough sayings about this phenomenon. Time is important to us. Saying we don't have time for something means a lot more than just time itself. What are the reasons we don't have time?

  1. No priority: the thing we need to do has no priority over other things. Or we think other stuff is more important than the thing we need to do.
  2. No benefit: the thing we need to do doesn't benefit us. It has no gain in any aspect, be it money, time or anything else.
  3. Angst: we fear the thing we need to do or the result it brings.
  4. No dedication: we are not dedicated to doing the thing we need to do. The result would be sloppy.
  5. No motivation: we are not motivated to do the thing we need to do.
  6. Too much work / overflow: it is too much work, our list of things to do is overflowing
  7. No challenge: there is no challenge in the thing we need to do, it will be boring work, or we will not learn anything new
  8. No fun: the work is not fun. If it's no fun, there is less motivation to make quality things
  9. Too complicated: it is very hard to achieve the thing we need to do, work should be made simpler.
  10. Not relevant / unnecessary: the thing is not relevant to us, we don't identify with the problem the thing should solve. It might feel unnecessary.
  11. No vision: there is no vision about the thing we need to do, no higher purpose.
  12. Fear of change and failure: we fear the change the thing we need to do brings us. We fear it because we can fail and feel punishment.
  13. Not interested: we are simply not interested in the thing
  14. Irresponsible: if we would do the thing the results would be irresponsible
Note: if you have any other reasons, feel free to add them in the comments.

So how should we change the thing we should do in order to get it done? First of all, the end result should be clear: there should be a vision for the work and it should be relevant, so that the thing becomes important and can be prioritised. The thing we need to do must be described simply; the solution might be complicated, but the description should be clear and simple. It has to have some kind of benefit, like fun, money or knowledge, and failure mustn't bring punishment.

For example, vacuuming the house is a task with a simple description, which might turn out to be very complicated when you have a couple of teenagers, some pets and things lying around the house. Vacuuming has enough benefits, like health and peace of mind when everything is tidy. It might also be fun when you use a nice vacuum cleaner.

TL;DR: whenever somebody says that they don't have time for something, there is always another reason. Find out that reason and fix it to get stuff done.

Wednesday, 1 March 2023

No time

There is never enough time. Every successful company has more work than it can handle. There is no problem in that; it is more of a luxury problem. Except when you can't handle it. A lot of work can lead to stress, and stress leads to suffering. You can work harder, make more time (time is relative) and try to go faster, but that depletes resources in no time. There are a couple of common pitfalls that lead to too much work and no time to do it. Let's have a look at those and find out how we can resolve this issue.

The loudest customer

First of all there is the situation of the loudest customer. That is the customer who screams the loudest and makes the biggest emotional impact on the introverted engineer. The engineer feels threatened and will try to make this go away by implementing the change as fast as possible. We can mitigate this by preventing direct contact between engineer and customer and placing a product owner between them. POs are more extroverted; they manage the backlog and can do politics with customers, shielding the engineer in the process.

The most persistent customer

Besides a loud customer it is also possible there is a very persistent customer, who doesn't play on emotion, but will frequently contact the engineer or PO for a specific change. Because it is impossible to do all the things at once, prioritizing is the solution. We call that prioritization the backlog. A PO is owner of the backlog and plays all the games with the customers to order the items on the backlog by importance or business value. With a persistent customer the PO can forward the customer to the backlog.

The helpful employee

A helpful employee sounds good, but too helpful isn't good at all. When employees help one customer and make them happy, they don't work on the backlog and are thus not working on the most important items. They dissatisfy other customers whose work has been planned and is now possibly delayed or not done. A team should refuse to work on anything other than what is on the backlog, forwarding the customer to the PO to negotiate a spot on the backlog.

The planning PO

On many occasions POs assign work to engineers in the planning session. By doing this they focus on efficiency instead of effectiveness. It means that everybody is busy with individual work, forgetting about sharing and building knowledge, and most importantly they forget to put the team in control. Work should be ordered by the PO in the backlog. Engineers pick the most important item during the planning of the day and form a team to build a solution for that particular change. More than one engineer is working on the same change, ensuring quality. This way of working focusses on effectiveness and getting things done.

The important manager

Oh, how often have I seen managers walk into a team room and single out one engineer to do something for a particular customer, completely skipping the product owner, any agreements and any priorities. The engineer will not say no to a manager (most of the time). This behaviour is very disruptive and undermines the position of all team members, the product owner and the scrum master. Modern leaders should facilitate, not delegate. Solve this by coaching managers into a role of servitude; great leaders serve the team.

Feature driven

Nowadays most teams are taking the step from agile to DevOps by incorporating run into their daily work. This means teams need to reserve time for maintenance, housekeeping and incidents. Too much focus on developing new features will result in an increase in technical debt. The more technical debt is acquired, the slower the development of new features will get, ultimately grinding to a halt. Reserve time for maintenance and incidents and keep technical debt to a minimum to ensure flexibility to build new features.

SPO(F/K)

A single point of knowledge is a huge problem. Engineers with a lot of knowledge in a team with less knowledge are busy doing all the work while some teammates sit idle. Forbid these engineers to do any engineering work and make them teach and coach until the rest of the team is up to par.

In summary, get POs in place to handle the backlog and set priorities, fencing off customers and management. Forbid working on anything other than the backlog. Let the team plan. Share your knowledge.


Monday, 19 September 2022

About the importance of LCM

Life Cycle Management is about keeping your assets up to date. When systems are not up to date they tend to create unplanned work in the form of incidents. Unplanned work means we cannot deliver the business value that we promised to our customer. Teams that don't have a grip on maintenance are going to be reactive and fix assets when they are broken. The cascading effect of not having systems up to date can be big. It can even result in so much technical debt that your team cannot work on improvements and is deadlocked in maintenance.

The first step in overcoming reactive maintenance is to plan it and reserve resources for it. Each new feature a team develops adds to the capacity needed for LCM. Keeping a healthy balance between maintenance and new features is important to keep the quality of work at an acceptable level. Unbalanced LCM can lead to up to 90% of time being spent on maintenance, reducing the amount of time that can be spent on delivering new business value.

To get a grip on LCM it is advisable to create a year calendar that shows when assets need to be updated. This includes hardware, software, licenses and certificates. For all those assets you need to know the version you currently run and the latest available version. The version you run doesn't need to be the latest and greatest; it needs to be a recent stable version with all security updates. Alas, the newest version can sometimes break your systems, so decide deliberately whether to run the newest (n) or the one before the newest (n-1).

When you have the basic information, start extrapolating from the version history of your assets. Nine out of ten times you can find this information on the asset provider's website or in a changelog. Plot out the upcoming versions in the same cadence.

Now we have a view on all the LCM work that needs to be done in the upcoming year. The next step is to calculate the workload on the items and get a grip on the time needed to maintain the assets. The calculation is just a sum that you and your team predict for what is needed to perform this maintenance. You end up with a table of all the assets, the predicted maintenance dates and the manpower that is needed to perform that maintenance. All maintenance can now be planned in the correct timeslot you use for planning, be it sprints or months for example.
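
The calendar and the workload sum described above can be sketched as follows. It is a simplified illustration, not a planning tool: `Asset`, `maintenance_calendar` and `hours_per_month` are names I made up, and the sketch assumes that every asset is released in a fixed cadence in days and that the team has estimated the hours per update.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str
    last_release: date   # taken from the provider's changelog
    cadence_days: int    # extrapolated from the version history
    effort_hours: float  # the team's estimate for one update


def maintenance_calendar(assets, start, end):
    """List (due date, asset name, hours) for every update expected in [start, end]."""
    rows = []
    for asset in assets:
        if asset.cadence_days <= 0:
            raise ValueError(f"{asset.name}: cadence_days must be positive")
        step = timedelta(days=asset.cadence_days)
        due = asset.last_release + step
        while due < start:  # skip releases before the planning window
            due += step
        while due <= end:
            rows.append((due, asset.name, asset.effort_hours))
            due += step
    return sorted(rows)


def hours_per_month(rows):
    """Sum the estimated hours per (year, month) so they fit a roadmap."""
    totals = defaultdict(float)
    for due, _name, hours in rows:
        totals[(due.year, due.month)] += hours
    return dict(totals)


assets = [
    Asset("database", date(2022, 12, 1), 90, 8),
    Asset("tls-cert", date(2022, 6, 15), 365, 2),
]
rows = maintenance_calendar(assets, date(2023, 1, 1), date(2023, 12, 31))
print(hours_per_month(rows))
```

Grouping by sprint instead of by month only changes the key used in `hours_per_month`.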

The collected LCM information can be fed back into any roadmap plans to make realistic plans for upcoming sprints, months or quarters. You just made LCM planned work instead of unplanned or reactive work.

Tuesday, 28 June 2022

Working together

Once there were two kids next to each other, playing with toy cars in a sandbox. Their parents were overjoyed because they saw their kids playing together. But were they? The kids played with their own toys in their own patch of sand. They didn't play together, they just played next to each other.
A couple of years later, the kids sat in the sandbox again, but this time they shared their toy cars and built a road in the sandbox. Now they didn't just sit together, they were actually playing together and had a lot of fun.

Most teams work together as the little kids in the story, they don't actually work together, they just happen to be in the same team. They pull their work individually from the worklog. They are just individuals sharing a desk space. Each morning during standup they tell each other what they did, and no one has a clue about what it exactly is they have done or they don't care. 

Some engineers put their name on a couple of items to claim them as future work. When they have one item done, they move on to the next. Others finish their work and pull stuff from the backlog into the sprint because there are no free items to work on. See where this is going? Not everything the team committed to is done, but extra stuff is delivered as well. This doesn't go down well with stakeholders.

What can we do about it? First of all no team member can have more than one story on their name. Second, in the daily scrum we talk about the work that is on the scrum board. From top right: the stuff that is almost done, to bottom left: the stuff that still needs to be done. We ask ourselves: "What can we do today to finish this story". Then we ask if anyone wants to help finish it, together. As we progress through the stories in the end we run out of engineers.

So now we have at least two engineers on one story. We need something to enable us to work together. The idea is that everybody participates in fixing the story. We use pair or mob programming to accomplish this.

With pair programming there is a single driver behind the computer and one or more navigators who sit close to the driver. The driver just types: they enter the implementation. The navigators talk about the problem and try to find a solution. Every once in a while the driver switches with a navigator.

When we work in this setting we can skip peer review: we are reviewing the whole time, so there's no need for it. Because more brains have worked out the solution, it is of better quality, which results in fewer incidents. And most importantly, we share knowledge and learn from each other.

Below are a couple of resources to have a look at, the first is Stacy who talks about pair programming, the second is an article about mob programming on medium.

Tuesday, 21 June 2022

The cascading effect of the DevOps way of working

When you are going to work in a DevOps way it all starts with removing the first silo. This means removing the wall between Dev and Ops by putting them together in a team. The consequence is that the whole team is responsible for change and run. When the whole team becomes responsible, no individual can fall back on their role; they are all committed to the cause. Dev can do Ops, test can do Dev, Ops can do test and Dev, and so on. This means more collaboration in the team and less pushing of work to another role. As a DevOps team you are all in it together.

Working together is only possible if it is done in a small team, and by a small team I mean no more than five engineers (I like to call all team members engineers and not by their role). I state that a team of five is the best and a team of three is the minimum. With five it will still be a team; with six it will become two teams of three. Six or seven is still possible, but communication and therefore collaboration will become increasingly harder.

If you have smaller teams, you also need smaller software components, and with smaller components you need more interfaces. These interfaces are twofold: one is the technical one with APIs, versioning, release cycles and so on. The other is the communication interface. How do you, as a team, communicate with other teams, and how are your dependencies organised? You need to think about hard and soft dependencies and come up with a plan. Hard dependencies are those that make your team wait for work from another team or individual. Soft dependencies are those that interface with another team but don't require any work from that team. One of the goals for a DevOps team is to lessen hard dependencies and turn them into soft ones. This can only be done if it is clear who your team's customers are, which is one thing the team should know and investigate.

Working in a small team means a lot of collaboration. You need to share information, and to improve sharing it is wise to minimise the work in progress by introducing a WIP limit. A WIP limit is a limit on the amount of work that can be in progress. If you need to learn a lot, make it a low number, so all team members need to work on the same thing together. One of the most common errors I see is that every engineer works alone on items from the backlog, not sharing knowledge and information. When two engineers work on the same item, they need to communicate and explain what they are doing, teaching the other one about their ways and maybe even learning something themselves.
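
How a WIP limit steers engineers towards the same item can be shown with a tiny model of a board. This is a toy sketch with made-up names (`Board`, `start`, `finish`), assuming the limit counts items in progress and not people.

```python
class Board:
    """A board that refuses to start new items once the WIP limit is reached."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.doing = {}  # item title -> set of engineers working on it

    def start(self, item, engineer):
        """Put an engineer on an item; return False when the limit blocks a new item."""
        if item not in self.doing and len(self.doing) >= self.wip_limit:
            return False  # the limit is reached: join an item that is already in progress
        self.doing.setdefault(item, set()).add(engineer)
        return True

    def finish(self, item):
        """Remove a finished item so a new one can be started."""
        self.doing.pop(item, None)


board = Board(wip_limit=1)
print(board.start("story A", "An"))  # True
print(board.start("story B", "Bo"))  # False
print(board.start("story A", "Bo"))  # True
```

With a limit of one, the second engineer cannot start a new story and ends up pairing on the story that is already in progress.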

It can happen that an engineer can't collaborate and that the WIP limit is reached. The purpose of DevOps is not to be as efficient as possible (max. resource utilisation) but to be as effective as possible (max. customer value). This means that when an engineer isn't working on items on the backlog, they should investigate improvements, refactor or learn.

Investigating the way of working and finding new ways to work leads to better processes and automation. When a DevOps team starts automating, many quirks in the process will be found. By solving these the quality of work and the speed of work will increase. 

So what are the steps to take?

  1. Form mixed teams with the disciplines needed to convert customer requests into a usable product or service
  2. Find out all the dependencies of the team
  3. Make communication plans with other teams, technical and social
  4. Introduce WIP limit to increase learning
  5. Automate

Let's start doing DevOps!