WIP limits: choose your own adventure edition

What are they?

A WIP (work in progress) limit is a cap on the amount of work that can be pulled into the team at any one time. It is most likely applied to the columns on the board that you use to manage your team’s work.

Why?

The idea is that instead of work being pushed onto the team as and when it comes up, the team pulls work into their work stream only when they have the capacity to actually work on it. This prevents people before you in the delivery pipeline from starting new work/tickets before they have handed their current ticket off.

I’ll use a very crude board below to demonstrate how this would work and the issues it tries to solve.

In the case below there are 4 devs to 2 testers, which means that the dev team can do more work than the test team can pick up.

As you can see in this example, the test column has 5 items but the testers only have capacity to work on 2 at a time, leaving 3 waiting to be worked on next. Meanwhile the dev team has already started working on 4 new items. This is a push-based system: as work is completed, it is pushed onto the next person/team/column.
The test team in this example can just pick up the next item in their list of things to do and the devs can keep busy working on the next ticket.
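To see why that queue keeps growing, here is a minimal sketch of the push system in Python. It assumes, purely for illustration, that every dev and every tester finishes exactly one ticket per day; the function name and those rates are my own and not measurements from a real team.

```python
def simulate_push(days, devs=4, testers=2):
    """Return how many tickets are left waiting for test at the end of each day.

    Assumption (hypothetical): each dev and each tester completes one ticket a day.
    """
    waiting = 0
    history = []
    for _ in range(days):
        waiting += devs                   # every dev pushes a finished ticket into Test
        waiting -= min(testers, waiting)  # each tester completes one ticket
        history.append(waiting)
    return history


print(simulate_push(5))  # [2, 4, 6, 8, 10]
```

With 4 devs pushing and only 2 testers pulling, the queue grows by 2 tickets every day and never recovers on its own.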

So what is wrong with this? Well, one thing is that part-completed work is sitting in a queue (the 3 tickets waiting in the test column). If there is a problem with one of the tickets that the testers are working on, then it would need to go back into dev, but the devs are busy working on new tickets. So what now?

Jump to Option 1 if you think they should stop working on their ticket to deal with the issue.
Jump to Option 2 if you think they should finish their current ticket and then pick up the issue.
Jump to Option 3 to see what applying a WIP limit would do.

Option 1: Dev stops working on the current ticket and picks up the issue

If the developer jumps onto the problem ticket, then their current ticket becomes part-completed work sitting in a queue. There is also a cognitive load issue: the dev now needs to context switch to understand the problem ticket again and, depending on the issue, it could take some time to fix. In the meantime, what is the tester to do? They could pick up one of the tickets in the queue or wait until the dev comes back with a fix. As we don’t know how long this is going to take, the tester decides to pick up a new ticket instead of waiting around.

Move to Everyone is busy again…

Option 2: Dev completes existing ticket first and then goes back to the new issue

If the developer opts to complete the ticket they are currently working on, then the ticket with the issue is now waiting in a queue, but which queue does it go into? Back into Ready, into Dev, or does it stay in the Test column? In some ways you could think of this ticket as rework, as it’s going backwards through the process and not forwards as you would expect. Also, what is the tester to do now that they are blocked from working on the ticket? Do they pick up a new ticket or wait?

Move to Everyone is busy again…

Everyone is busy again and all is good…

Well, not exactly. We now have part-completed work just waiting around, providing value to no one. On top of that, if a live issue or something else unplanned comes up while the dev or tester is away from work (e.g. training, illness or holiday), then you could end up with potentially 3 partly completed tickets. Who would then pick up this work? One of the other devs or testers, who may have no context and could spend even more time trying to get familiar with the work.

This is when you would likely see your lead times start to increase, due to in-progress work now waiting in queues. Want to know what lead time is and why it is important to understand? Then let me know in the comments and I’ll follow up.
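One common way to reason about this is Little’s Law, which says that average lead time equals average work in progress divided by average throughput. The sketch below applies it to the board above; the throughput of 2 tickets a day (one per tester) and the smaller comparison figure of 6 tickets are hypothetical numbers chosen for illustration only.

```python
def average_lead_time(wip, throughput_per_day):
    """Little's Law: average lead time = average WIP / average throughput."""
    return wip / throughput_per_day


# 4 tickets in Dev + 5 in Test = 9 in progress, leaving the system at 2 a day (assumed)
print(average_lead_time(9, 2))  # 4.5 days

# The same team with only 6 tickets in progress (hypothetical comparison)
print(average_lead_time(6, 2))  # 3.0 days
```

The throughput is set by the testers in both cases, so starting more work does not get anything finished sooner; it only makes each ticket take longer to reach Done.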

What if the fix requires some significant changes to the system which could impact the other work that is currently in progress? Would the other tester have to stop, as the fix could invalidate their testing? What about the other devs, who now need to pull in the changes, understand the fix and work out how it impacts their own work? This one issue (yes, on the extreme side of thinking) could affect up to 9 tickets: all 4 tickets in Dev and all 5 tickets in Test. Also, do tickets in Done mean released or ready for release? If they are only ready for release, then this fix could affect them too. This leads to a lot of value stuck in your system of delivering software, as opposed to getting that value out and into the hands of your end users.

Is there another way?

The two options above are very much a push-based system, where work is pushed onto the next stage as it is completed, whether or not there is capacity in the next stage to do it.

So how would a WIP limit actually help this situation? Are you not just artificially stopping productive people from doing what they are paid to do? In essence, yes, that is exactly what you are doing: stopping people from moving onto more work before the existing work is actually done. So how is this a good thing?

Let’s see what applying a WIP limit to the above situations would do.

Option 3: Implement a WIP limit, which encourages a pull-based system of work, not push

In the scenario above, new work can only be pulled into Dev or Test if someone is actually free to do it. Until there is free capacity downstream, a developer cannot pull in new work and must stay with their ticket until it can be handed off. Now the issues with Options 1 and 2 cannot occur, as you can’t pick up new work until there is free space.
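The pull rule can be sketched in a few lines of Python. This is only an illustration: the `Column` class, the `pull` function, the ticket names and the limits (4 for Dev and 2 for Test, matching the head count in the example) are all my own assumptions and not taken from any particular board tool.

```python
class Column:
    """A board column that holds at most wip_limit tickets (hypothetical model)."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.tickets = []

    def has_capacity(self):
        return len(self.tickets) < self.wip_limit


def pull(ticket, source, target):
    """Move a ticket downstream only if the target column has free capacity."""
    if not target.has_capacity():
        return False  # the ticket stays where it is until space frees up
    source.tickets.remove(ticket)
    target.tickets.append(ticket)
    return True


ready = Column("Ready", wip_limit=float("inf"))
dev = Column("Dev", wip_limit=4)    # assumed limit: one ticket per dev
test = Column("Test", wip_limit=2)  # assumed limit: one ticket per tester
done = Column("Done", wip_limit=float("inf"))

ready.tickets = ["R1", "R2"]
dev.tickets = ["D1", "D2", "D3", "D4"]
test.tickets = ["T1", "T2"]

blocked = pull("D1", dev, test)   # False: Test is already at its limit of 2
starved = pull("R1", ready, dev)  # False: Dev is already at its limit of 4
pull("T1", test, done)            # a tester finishes, freeing a slot in Test
pull("D1", dev, test)             # now the hand-off succeeds
pull("R1", ready, dev)            # and Dev has room to start new work

print(dev.tickets)   # ['D2', 'D3', 'D4', 'R1']
print(test.tickets)  # ['T2', 'D1']
```

Nothing new starts in Dev until a ticket has actually left Test, so no queue of part-completed work can build up between the two columns.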

In this case the bottleneck is still test, so a developer is likely to be waiting around more often than a tester. So what is the developer to do? They can simply wait for the tester to become free, or they can look at testing the ticket themselves in such a way that, when the tester does become free, the two only need to have a conversation about what testing the developer has already done to mitigate any issues. As the tester no longer needs to check the ticket for basic functionality, they can then decide whether to do more exploratory testing on it or simply move it straight to Done.

As the tester is in short supply and is one of the best feedback mechanisms in your development process (apart from real end users, of course), you are only going to be passing them the tickets that are most worthy of their time, not any old ticket that could easily have been verified by an automated test at the code or perhaps UI level. This starts to encourage your team to use the test team less as a safety net for checking developers’ work and more as a source of information on its quality. But don’t forget that quality means different things to different people.

If the tester finds an issue with a ticket and it needs to go back, then there should be capacity within development to pick it back up straight away, with no issues around dropping existing tickets. Not only that, but the tester doesn’t have a queue of tickets waiting for them, which would add pressure to work faster and possibly miss things. Instead they stay with the ticket and possibly assist the developer with understanding the issue, providing them with real-time feedback on their fix.
This could even lead to a scenario where the tester and developer are paired together from the very beginning of the ticket, possibly even removing the In Test column…

By leaving capacity within the dev team you are leaving slack in the system. This slack, or free engineer, could be used to help with technical debt or live issues, or even to work with the testers to make their lives easier and make the work they do spread even further.

This also has the added benefit of helping the team to “Focus on Finishing”. If you start something, then you see it through to the end without it waiting around in queues for people to become free to work on it. It also starts to build a whole-team mentality towards getting things done and improving the system as a whole, instead of individuals working on their own patch of the system.

WIP limits get a lot of criticism from some people, who argue that they simply slow things down by placing limits on the productive parts of the engineering team, which should be left to keep being productive. But what they do show is that there are bottlenecks in any system, and by keeping some slack in the team you can start addressing them and make your development teams more productive by focusing on finishing and improving everyday work.

A word of warning when implementing WIP limits

When first using WIP limits in software teams, be warned that they can cause a lot of disruption. All of a sudden, parts of the team that were able to keep busy are now waiting for other parts of the team to be freed up. If this isn’t managed well, then you could end up with a lot of unhappy team members, especially if there weren’t any perceived problems with the previous way of working. This can be even more problematic if you are now asking people to do work that they hadn’t needed to do before, like developers getting more hands-on with testing.

Another side effect of WIP limits is that people within the team have to communicate more to find out where work is coming from next or how they can help each other. Previously they would have just picked off whatever was waiting in their next column. This can cause a lot of discomfort for some, as they don’t know what they need to do next and don’t have the perceived safety of the next/waiting-for columns.

One of the biggest causes of these types of issues that I’ve seen is individuals within a team focusing on working on tickets, as opposed to working collaboratively towards a goal.

We need to keep in mind that the tickets are just how the overall goal has been broken up into distinct, deliverable pieces of testable value. The question to ask of each one is: does the new information we have just learnt take us in the direction we want to go?

This is one place where I think testers can add some real value to teams. Testers can help the team understand not only whether these iterations are actually helping to achieve the goal, but also how the actions of the team affect the overall system within which they work.

Maybe a better way to frame the goal is as a hypothesis that the team is going to test, as a goal sounds like you already know what the outcome will be.

What do you think of WIP limits? Do they aid or hinder team productivity?

Have you transcended WIP limits altogether and found other ways to stop queues forming?

Let me know in the comments.

Update 18/6/19: Interesting post on modelling WIP limits to see how different limits affect the throughput of work: “Why limiting work-in-progress works”
