Continuous improvement is the practice of iteratively reviewing processes and impediments to then make incremental changes to reduce waste and increase quality.
This document provides some theory and advice on practising continuous improvement. However, you do not need a lot of processes to get going. There will almost certainly be problems that people are already aware of, which provide a great starting point for improvement work.
Set up regular retrospectives with the whole team and commit to spending time acting on the things which are uncovered. Avoid committing the team to more than can be achieved in a reasonable amount of time; pick one or two changes which you think will be achievable in the next Sprint.
Set out with the intention of having this as a permanent part of how you work, iteratively checking how things are, thinking of what to do to improve things, making a small change and repeating.
Consider an example. During a retrospective, the team identify that the product owner frequently finds problems with features once they have been implemented, causing costly rework late in the delivery cycle.
They consider various options, including adding more detailed requirements to stories during Sprint planning, introducing a just-in-time "analysis and elaboration" stage to their agile process, and showing the working software to the product owner during development for earlier feedback.
They can see potential value in all three but decide to choose one to start with, remembering that continuous improvement is an iterative process. As the product owner is often busy and is sometimes not available at short notice, they decide to try adding more detailed requirements to stories during Sprint planning.
Teams who find it easier to get input from the product owner at short notice may prefer to add the "analysis and elaboration" stage instead, to get the benefit of doing this analysis just in time. It is important to choose the right action for the specific scenario the team is facing.
The Improvement Kata is a product of the Toyota Production System and provides a useful structure for wrapping continuous improvement activities:
- Understand the direction or challenge — expressed as a measurable signal, e.g. reduce lead time.
- Grasp the current condition — e.g. mean lead time is 4 days.
- Define the target condition — e.g. mean lead time of 2 days or less.
- Move toward the target iteratively, which reveals obstacles to overcome — using Plan-Do-Check-Act improvement cycles.
Plan-Do-Check-Act improvement cycles
The iterative continuous improvement process can be described as a cycle, and the most common is called Plan-Do-Check-Act (PDCA). PDCA is a mental model rather than a prescriptive process, but it is still useful when adopting and maintaining continuous improvement. The PDCA cycle is attributed to Deming and Shewhart, and is adapted here from ASQ.
There are four stages which are performed in a continuous loop:
- Plan: Identify an opportunity and plan for change.
- Do: Implement the change on a small scale.
- Check: Use data to analyse the results of the change and determine whether it made a difference.
- Act: If the change was successful, reinforce it or implement it on a wider scale and continuously assess your results. If the change did not work, begin the cycle again — i.e. try a different approach to driving improvement in this area.
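The four stages above can be sketched as a simple record for tracking an improvement through the cycle. This is purely illustrative; the class and field names are invented, not taken from any standard tool:

```python
from dataclasses import dataclass
from enum import Enum

# A minimal sketch of tracking one improvement through the PDCA stages.
class Stage(Enum):
    PLAN = "plan"    # identify an opportunity and plan for change
    DO = "do"        # implement the change on a small scale
    CHECK = "check"  # use data to analyse the results
    ACT = "act"      # reinforce, widen, or restart with a new approach

@dataclass
class Improvement:
    opportunity: str
    stage: Stage = Stage.PLAN

    def advance(self) -> None:
        """Move to the next PDCA stage, wrapping Act back round to Plan."""
        order = list(Stage)
        self.stage = order[(order.index(self.stage) + 1) % len(order)]

imp = Improvement("Reduce rework found late by the product owner")
for _ in range(4):
    imp.advance()
# After a full cycle we are back at Plan, ready to iterate again.
```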
At any one time, you may have several improvement initiatives in progress.
Continuous improvement has significant benefits for teams.
Maintain and improve processes
- Reduces waste, leading to improved efficiency and productivity.
- Improves quality and reduces error rates.
- Leads to happier people and improved engagement, retention, and recruitment.
It takes continuous effort to maintain and evolve processes in response to challenges and changing circumstances. Productivity and quality can decline over time without this sustained effort.
Control technical debt
Technical debt refers to aspects of how a system is built which are not apparent to users of the system but affect the team's ability to make changes to it quickly and safely. It arises from past processes or practices but has an ongoing impact on the present. Technical debt:
- Leads to bugs and loss of reliability.
- Means changes take longer.
- Makes it harder to predict how long any given change will take.
- Causes dissatisfaction and disengagement in the team.
Without sustained improvement effort these all get worse over time, reducing capacity to deliver features. If little or no tech debt improvement work is done, delivery may initially be faster but over time it becomes progressively slower and less predictable.
Improve reliability and operability
Some important improvement work consists of implementing or improving reliability or operability features, such as improving resilience or optimising performance, or adding monitoring dashboards, application logs or automated security testing.
Where things like this are missing or need improvement, it is not tech debt — improving them brings a tangible benefit to service reliability, which users do notice and care about.
The benefits of improving these areas are:
- Service has fewer bugs.
- Performance is more reliable.
- Services are more available.
- Services are more secure.
- Incidents are less severe.
- Incidents are detected earlier.
- Incidents are fixed more quickly.
Identifying improvement opportunities
Regular team retrospectives are an effective way to identify improvement opportunities and actions. Another potential source is periodic reviews using tools such as the AWS or Azure Well-Architected Frameworks and the Infinity Works Delivery Quality review. And of course, tech debt is often uncovered or deliberately introduced as a short-term tactic in the course of making changes to a system (see Prioritising tech debt).
As outlined in Benefits, in high-level terms the opportunities for reducing waste or improving quality tend to be in these areas:
1. Process or practice
The Lean principles give some useful areas to consider.
- The way stories are analysed or elaborated.
- The way code is written or reviewed.
- The tools and techniques for testing.
- Communication and collaboration mechanisms within and between teams.
- Team structures.
Further reading: Lean Software Development: An Agile Toolkit by Mary Poppendieck and Tom Poppendieck.
2. Technical debt
- Code which needs to be refactored.
- Technologies which should be replaced.
- Areas with insufficient, inefficient or ineffective testing.
3. Reliability and operability
- Automated build and deployment pipelines.
- Monitoring dashboards.
- Automated alerts.
- Application logs.
- Automated security testing.
Select items which will have the most impact for the effort required. If you have many potential options, you will want to prioritise them. One option is to score how much each will move the metrics that matter to you — its value. You can then use Weighted Shortest Job First prioritisation, selecting items with the highest ratio of
(value + urgency) / effort
Value, urgency and effort are judgements and estimates in arbitrary relative units, and it is common to use a modified Fibonacci scale (1, 2, 3, 5, 8, ...) for each.
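Under those conventions, WSJF scoring amounts to a ratio and a sort. A minimal sketch, where the items and their scores are invented purely for illustration:

```python
# Weighted Shortest Job First: rank items by (value + urgency) / effort.
# Scores use a modified Fibonacci scale; all items below are made-up examples.
items = [
    {"name": "Automate regression tests", "value": 8, "urgency": 3, "effort": 5},
    {"name": "Add monitoring dashboard",  "value": 5, "urgency": 5, "effort": 3},
    {"name": "Refactor billing module",   "value": 8, "urgency": 2, "effort": 13},
]

def wsjf(item):
    return (item["value"] + item["urgency"]) / item["effort"]

# Highest ratio first: do the cheap, valuable, urgent work soonest.
for item in sorted(items, key=wsjf, reverse=True):
    print(f"{wsjf(item):.2f}  {item['name']}")
```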
Prioritising tech debt
There is a lot of mileage in following the Boy Scout rule as applied to coding by Robert C. Martin (Uncle Bob) —
always leave the code behind in a better state than you found it
— a practice which Martin Fowler calls opportunistic refactoring. Making time for this as part of feature work is a very effective way to make incremental improvements to a codebase over time. Factor in any time needed for refactoring or other tech debt improvement when estimating features.
Sometimes you may uncover an issue which is too big to tackle as part of an individual feature. When this happens, record the required improvement as an item in your backlog.

For these more substantial issues, it can be useful to apply a little more structure to help with the difficult job of judging the value of fixing any given bit of technical debt. It is helpful to focus on the impact each issue is having by considering aspects such as those listed in Benefits — Control technical debt, shortened here as bugs, delays, uncertainty and unhappiness. The score for each aspect will depend on how heavily that part of the system is worked on. Another important consideration is the business criticality of the affected part of the system.

Combining the scores for each aspect yields a measure that describes the total impact of the issue and the value of fixing it, which can be fed into Weighted Shortest Job First prioritisation as above.
value = criticality x (bugs + delays + uncertainty + unhappiness)
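Worked through with numbers, the formula above feeds directly into the WSJF ratio. The aspect scores, criticality and effort below are invented for illustration:

```python
# Scoring one tech debt item using value = criticality x (bugs + delays +
# uncertainty + unhappiness); all the scores here are illustrative.
def debt_value(criticality, bugs, delays, uncertainty, unhappiness):
    return criticality * (bugs + delays + uncertainty + unhappiness)

def wsjf(value, urgency, effort):
    return (value + urgency) / effort

value = debt_value(criticality=3, bugs=2, delays=3, uncertainty=2, unhappiness=1)
print(value)                                       # 3 * (2 + 3 + 2 + 1) = 24
print(round(wsjf(value, urgency=2, effort=8), 2))  # (24 + 2) / 8 = 3.25
```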
Visualising technical debt using an approach like Colin Breck's Quality Views can help facilitate conversations about how much improvement effort is required and where it should be focussed.
Treat changes as experiments and consider ways to explore them safely. For example, only apply the change to some of the work or be explicit that it is a trial to be re-evaluated at a predetermined time (usually at the next retrospective).
Be clear what benefit you hope to get from each change so that you can objectively measure whether it has been a success and either reinforce or reverse the change.
Express each experiment as a hypothesis:
To support \<Improvement Kata direction or challenge>,
we believe \<this capability>
will result in \<this outcome>.
We will have confidence to proceed when \<we see a measurable signal>.
To support reducing the lead time,
we believe automating regression tests for our website
will result in shorter test cycles.
We will have confidence to proceed when regression test cycles are shorter, and the rate of bugs being missed by the tests has not risen noticeably.
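Capturing each hypothesis as a structured record makes it easy to review experiments consistently at retrospectives. A small sketch restating the example above; the class and field names are hypothetical:

```python
from dataclasses import dataclass

# Hypothesis template as a structured record; field names are illustrative.
@dataclass
class Hypothesis:
    challenge: str   # Improvement Kata direction or challenge
    capability: str  # the change being tried
    outcome: str     # the result we expect
    signal: str      # measurable signal giving confidence to proceed

    def describe(self) -> str:
        return (f"To support {self.challenge}, we believe {self.capability} "
                f"will result in {self.outcome}. We will have confidence to "
                f"proceed when {self.signal}.")

h = Hypothesis(
    challenge="reducing the lead time",
    capability="automating regression tests for our website",
    outcome="shorter test cycles",
    signal="regression test cycles are shorter",
)
print(h.describe())
```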
Break down larger problems into smaller ones which can be tackled with smaller changes more incrementally.
The problem "We don't communicate enough with the other team working in this area." could break down into several more specific points, helping drive incremental action:
- We don't have visibility of each other's backlogs
- We don't coordinate changes and end up clashing
- We don't have the same code style
- We don't have the same test approach
Some teams have success with forming a single backlog covering feature and improvement work. This requires product and tech people to work together and build a shared understanding of the relative priority of each item so that a single priority order can be decided.
Other teams find it too difficult to determine relative priorities between features and improvement work and instead use a time budget approach. For example, they may decide that each Sprint roughly 70% of the capacity should go into feature and bug fix work and the remaining 30% into operability and improvement work. The appropriate split will depend on the specific situation the team finds themselves in, and can vary over time.
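Applying a time budget at Sprint planning is simple arithmetic. A sketch using the example 70/30 split and an assumed capacity figure:

```python
# Time-budget sketch: splitting a Sprint's capacity between feature/bug-fix
# work and operability/improvement work. The 70/30 split and the capacity
# of 30 points are illustrative assumptions.
capacity_points = 30
feature_share = 0.7

feature_points = round(capacity_points * feature_share)  # 21
improvement_points = capacity_points - feature_points    # 9
print(feature_points, improvement_points)
```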
When seeking to identify and prioritise improvements, it can be helpful to have agreed metrics as a guide. These will be specific to each team, but some good defaults to start with are:
- Deployment frequency
- Lead time for changes
- Incident rate
- Mean time to recover
- Team happiness
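Two of the metrics above can be derived from simple deployment records. A sketch, where the records and field names are invented for illustration:

```python
from datetime import datetime

# Computing mean lead time and deployment frequency from example records.
# The timestamps, field names and 7-day period are illustrative assumptions.
deployments = [
    {"committed": datetime(2024, 5, 1, 9, 0), "deployed": datetime(2024, 5, 3, 9, 0)},
    {"committed": datetime(2024, 5, 2, 9, 0), "deployed": datetime(2024, 5, 6, 9, 0)},
]

# Lead time for changes: commit-to-deploy duration, in days.
lead_times = [(d["deployed"] - d["committed"]).total_seconds() / 86400
              for d in deployments]
mean_lead_time_days = sum(lead_times) / len(lead_times)

# Deployment frequency: deployments per day over the period measured.
days_in_period = 7
deployment_frequency = len(deployments) / days_in_period

print(f"Mean lead time: {mean_lead_time_days:.1f} days")
print(f"Deployment frequency: {deployment_frequency:.2f}/day")
```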
It is also useful to track the proportion of time being spent on various activities so that the balance can be corrected if required:
- Bug fixing
- Tech debt
- Other improvement work
What does this mean for me as a less-technical person?
If you are a Product Owner, Delivery Lead or Business Analyst, continuous improvement will help you:
- Reduce waste and deliver more user value each Sprint.
- Improve user experience by improving service reliability (see benefits).
- Maintain a rapid, reliable pace of delivery long term, instead of delivery becoming slower and less predictable over time.
- Focus more team time on delivering features rather than menial or repetitive work.
- This can be achieved over the mid/long term by investing in the short term in better automation of tests, deployments and so on.
You have an important role to play!
- Remind the team that users expect systems to be reliable, even if they don't think to mention it. Make sure time is built in for engineering work to deliver that reliability each Sprint.
- Explicitly consider the needs of support staff as users of the system and write stories to express those needs, e.g. "As a support engineer, in order to respond quickly when our service is inaccessible, I need to receive an automated alert when the service is unresponsive."
- Ensure your delivery roadmap includes a healthy balance of operability/reliability features along with the functional features.
- Actively monitor the amount of improvement work being done each Sprint and the outstanding improvement work to do, and adjust the balance when required.
What does this mean for me as a technical person?
If you are an Engineer or Tester, continuous improvement will help you:
- Reduce waste and spend more time delivering value for users.
- Implement features and fix bugs more quickly and safely.
- Spend less time on menial or repetitive work.
You have an important role to play!
- Make sure you understand the user needs well enough to have an informed conversation about the relative priority of the functional work items.
- Express technical work in terms of the benefits it will deliver so that as a team you can have a meaningful conversation about relative priorities.
- Play an active role in backlog refinement and planning, ensuring that operability and reliability work is adequately represented.
- Be bold and make sure technical quality is maintained.
- But also, be pragmatic and accept that all systems have imperfections and some degree of tech debt.
As we have seen, the recipe to start or give a boost to continuous improvement is essentially very simple:
- Make a start, keeping changes small and iterating.
- Bake improvement work into the way you work with regular retrospectives which feed a trickle of improvement work into your activity within each Sprint.
- Track metrics over time so you can measure the effect of improvement work.