How We Built Testability with Psychological Safety [External post]

Ben Linders recently interviewed me for my talk at AgileTD on how we failed at testability. That resulted in this InfoQ post about how, to build in testability, you need developers and testers to collaborate. But to be able to do that, you need psychological safety.

Testability can enable teams to make changes to their code bases without requiring extensive regression testing. To build testability, team members must collaborate and leverage each other’s unique skills. Unfortunately, effective collaboration does not come naturally to people and therefore needs leadership to nurture people’s ability to speak up and share their knowledge.

To continue reading, head over to https://www.infoq.com/articles/testability-psychological-safety/

The courage to supercharge your testability

Testability is all about building quality-in. It’s about identifying known issues before they become a problem while coding. Pairing testers into this process can supercharge the testability feedback loop. It can allow you to pick up known and unknown issues.

But pairing devs and testers together needs courage. Courage so that both disciplines can take interpersonal risks and share hard things such as what they don’t know, don’t understand or mistakes they’ve made. This will need both groups to listen, understand and ask questions to help each other through the process. Both groups will need to show curiosity, humility and empathy for one another. You will not only feel uncomfortable during the process but it will take time too. The temptation to go back to inspecting for quality – dev and test handing work off to each other – will be hard to resist.

Pairing for testability is not just pair programming but working together to understand what the code being written should and shouldn’t do.

Devs and testers should work together to leverage the skills that each has, not get hung up about the skills they lack. If your pair is more exploratory focused, identify ways that allow you to make the best use of those skills. If they are more technically inclined then focus there.

Remember the key is to build quality-in not inspect for quality. So what can you do now that helps your team move in that direction?

Three things of 2021

Every week I spend some time reflecting on what I learned or found interesting, and this is a summary of my year. After doing this for nearly 3 years, one of the biggest ways it’s helped me is in seeing the thread through my work, which reminds me of this quote:

You can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future…
Steve Jobs’ 2005 Stanford Commencement Address

Where is that thread leading me? Towards a strategy that could help with improving team collaboration and heading towards a more generative culture.

Remote workshops

Remote workshops are constrained in ways that I hadn’t appreciated before the lockdown, such as by tools, participants’ work environments and people getting tired in ways that just don’t happen in real life. I’m going to be following up with a blog post on 13 things I’ve learned from running remote workshops, so keep an eye out if you want to know more.

Uncertainty

Your ability to identify and work through uncertainty, I believe, will be a big predictor of how successful you will be in the long run but also how satisfied you will be with life. The more I’ve learned about uncertainty and how it affects our behaviour, the more I’ve changed the way I look at uncertain situations and approach them. What I’ve found is that my attitude towards uncertainty has changed in a way that has made me much more comfortable being uncomfortable with it.

How? By identifying what about the situation makes me uncomfortable. For example, a situation has multiple directions, each one with unknown outcomes. Then looking at how it makes me feel uncomfortable. For example, a feeling in my stomach, a tremor in my hands, a tightness in my chest, a dry throat etc.

This is known as interoception, or the ability to sense your internal bodily state, and this Guardian article does a good job of explaining it. Only then do I proceed to work through the situation and decide which direction to go in. To be honest this is much easier said than done, but with practice it can become a habit and almost a default way to approach unknown situations.

My experience of this has been that by paying attention to how the situation makes me feel internally (interoception) I’m able to make much more rational decisions and feel more in control of myself even if I don’t have control of the situation.

This I believe is what helped me get over my fear of public speaking. It’s not that I got over the fear of getting up on stage but I was able to show my brain that there was nothing to fear in the first place. Over time (and this is important) my brain learns that fear isn’t the right response and tones down my body’s automatic reaction to the situation. This in turn makes me feel much more able to handle the uncertainty of it.

This I think is what can help people move out of their comfort zones and get them more comfortable with being uncomfortable.

Psychological safety

The idea of psychological safety has been on my radar for a few years. It started with reading The Phoenix Project in 2016 and The DevOps Handbook in 2017. That led me to the State of DevOps Report 2018 and to hearing about Google’s Project Aristotle the same year, which were the first places I saw psychological safety mentioned. But I didn’t look into it until 2019, when I read Amy Edmondson’s The Fearless Organization after Kim Scott’s Radical Candor: How to get what you want by saying what you mean referenced Amy’s work.

Then all through 2020 and 2021 all I could see was how so many people are holding themselves back in their teams by not saying what’s on their minds due to the uncertainty of what would happen. But I still didn’t act on psychological safety as I believed it was confirmation bias leading me to think that it was the key to getting people to speak up.

It wasn’t until late 2021, when I did an internal talk called Psychological safety: What the heck is it and why should you care?, that I began to realise that this wasn’t confirmation bias. We have a problem with people speaking up in teams, but we have never tried to tackle what’s preventing them from speaking up in the first place.

It was only after this talk that I felt much more certain of what Google had discovered back in 2012: that psychological safety is foundational to highly effective teams. Why? Because this is what enables people to speak up and share what they do and don’t know. Speaking up is key for effective inter-team collaboration, enabling teams to work through problems and head towards continuous improvement.

Teams will need this if we ever want them to be able to autonomously use the 4 key metrics to improve the throughput and stability of their products.

Connecting the dots

It is only now that I feel I can look back through all the different things I’ve done and learned over the years and see how it is all connecting together into a strategy that could be helpful in increasing psychological safety at the team level.

I’ve worked at a product level in teams to see how listening and asking questions is key for being able to work through problems. I’ve immersed myself at the process level trying to understand and apply agile and DevOps principles to improve those products. I’ve collaborated with as many different disciplines as I could to try and understand what problems they have in applying those principles to deliver those products.

But as Steve Jobs said, you can’t see how things will connect in the future. I could never have predicted how all the little things I’ve done over the years would line up.

You have to just trust that they will. This is why living with and working directly through uncertainty is going to be the biggest predictor of your success and happiness.

If you can get comfortable being uncomfortable, work through uncertainty and trust that things will work out, you might just get what you want… or at least closer to where you want to be.

Interested in seeing my other past dots? Then check out my 3 things of 2020 and 2019.

The risk with direct questions

The risk with the direct question is that the person being asked could assume intent within the question. For example, asking what risks there are in this release could be taken to mean that you think there is a risk in the release or that you don’t trust the individual’s ability.

This could lead to a breakdown in your relationship and make asking any further questions almost impossible. This is more likely if you don’t have a working relationship with the person and is another good reason why taking time to get to know each other is so important. See foundations of great teams start with relationships to learn more.
So what do you do if you need to ask questions that could be interpreted as having intent?

Use indirect questions

The indirect question comes across much more tentatively and allows the person being asked to offer more if they want to. If it is taken the wrong way, it also allows you to back out and try to get back to a productive conversation.

Now if they respond in the negative with no additional information as to why, then you can tentatively inquire as to what makes the individual so sure, e.g. That’s great, what is it about this release that makes you so certain?

Examples

Direct: What risks are there in this release?
Indirect: Do you think there could be any stakeholder impact in this release?

Direct: What could go wrong with this release?
Indirect: Are there any ways in which you think this release could behave unintentionally?

Direct: What risk mitigations have been carried out for this release?
Indirect: Are there any areas you think we could have impacted with this release?

The indirect questions ask the person for their opinion on the situation, which takes the emphasis away from their work. The direct questions don’t mention their part in the work either, but there is a risk that they interpret your body language, tone or some past interaction as the reason for asking, which could derail the conversation. Essentially they may not give you the benefit of the doubt and may jump straight to assuming malicious intent even though there is none.

Trade-offs of indirect questions

The downside of indirect questions is that they take longer to ask and more effort to construct, which slows down feedback loops and learning from each other. It also makes long-term collaboration that much harder and makes it more likely that people avoid these situations altogether.

While building effective working relationships seems like a lot of effort, I believe the long-term benefits of more effective collaboration are well worth it. Good relationships let you just talk to each other.

Scales of Collaboration

Reading time: 3 minutes

Idea in brief: The scales of collaboration can help you and your teams to work more effectively by improving your collaboration. They allow you to measure how you are currently collaborating and work out what you can do to improve its effectiveness. But what’s wrong with our current approach and how do you use the scale?

Issues with existing collaboration

Whenever I talk with people who work in teams, one of the things I hear quite often is how much they are collaborating. But when we start digging into what they are doing, I begin to notice that everyone has a different idea of what collaboration means.

This results in behaviours between team members that puzzle them: they think they’ve done everything right, but the other people don’t respond in the way they anticipated.

Examples I’ve heard of collaboration:

‘They should know where to find all the information’
‘I sent them an email with all the details, they just never did anything with it’
‘I gave them an opportunity to feedback anything they wanted, they didn’t so it must be fine’

In all three cases the people involved believed they were attempting to collaborate but in reality all they were doing was making information available. It was up to the recipient to decide what to do with the information, if anything.

Scales of collaboration

If this isn’t collaborating then what is it and, for that matter, what is collaborating? This is where the scales of collaboration could come in useful. The scale is taken from the work of Bruce B. Frey et al. 2004, Measuring Change in Collaboration Among School Safety Partners, which was itself developed from the Levels of Community Linkage Model (Hogue, 1993)*. It was developed as a questionnaire to measure how well groups of people collaborated.

*Unfortunately I’ve been unable to find the original paper, only references to it

This works on a 0 to 5 scale, with each level having a defined set of characteristics. 0 is no interaction at all and 5 is collaboration, with each level building on top of the previous one.

**Scales of collaboration** developed from Levels of Community Linkage Model (Hogue, 1993)

When applied to the collaboration examples above, you can see that example 1 is just making the information available, which would indicate level 1, Networking. Example 2, while providing the information, isn’t asking them to do anything, which is level 2, Cooperation. Example 3 would welcome feedback but isn’t explicitly asking for it or providing a mechanism to give it, so it would also be level 2, Cooperation.

Following the scale up towards level 5 begins to highlight what else each example would need to do to improve their collaboration.
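As a rough sketch, the scale and the three examples above could be written down like this. The names for levels 1, 2 and 5 come from the text above. The names used here for levels 3 and 4 (Coordination and Coalition) are an assumption based on how the Frey et al. scale is usually described, and `next_step` is a hypothetical helper, not part of the original questionnaire.

```python
from enum import IntEnum
from typing import Optional


class CollaborationLevel(IntEnum):
    """The 0 to 5 scale, where each level builds on the previous one."""
    NO_INTERACTION = 0
    NETWORKING = 1
    COOPERATION = 2
    COORDINATION = 3   # assumed name, not spelled out in this post
    COALITION = 4      # assumed name, not spelled out in this post
    COLLABORATION = 5


def next_step(current: CollaborationLevel) -> Optional[CollaborationLevel]:
    """Return the next level up the scale, or None if already at the top."""
    if current is CollaborationLevel.COLLABORATION:
        return None
    return CollaborationLevel(current + 1)


# The three examples from earlier, placed on the scale as described above.
examples = {
    "They should know where to find all the information": CollaborationLevel.NETWORKING,
    "I sent them an email with all the details": CollaborationLevel.COOPERATION,
    "I gave them an opportunity to feedback": CollaborationLevel.COOPERATION,
}

for quote, level in examples.items():
    step = next_step(level)
    print(f"{quote!r}: level {level.value} ({level.name}), next step: {step.name}")
```

Treating the levels as ordered numbers is what makes the stepping-stone idea easy to express: you only ever look one level up from where you are.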

Characteristics of collaboration

I have further augmented the scale with a few extra characteristics. These will also help you work out where you are on the scale and what you are trying to achieve. They include:

How you make information available to others
Consumer/provider interaction model of this information
Speed of decision making
Engagement levels of the people involved
Examples of what each level of collaboration could look like

I’ve also left off level 0 on this diagram as that would indicate no interactions and possibly not even awareness of one another.

How to use it?

Establish where you are on the scale
- You could do this by seeing if what you are doing fits onto the scale based on its characteristics or if it looks similar to the examples on the scale provided
- Once you’ve established where you are on the scale, move on to the next question
Where do you want to be on the scale?
- The best way to do this is to identify the aim you are trying to achieve based on:
  - The information:
    - Is it just information providing, an opportunity to get feedback or to change opinions/direction?
  - Decision Speed:
    - How quickly does a decision need to be made?
  - Engagement:
    - If something needs to change due to that information and/or decision then there will be a greater need for engagement
How will you move up (or down) the scale?
- Use the characteristics on the scale as possible things you could do to move to this level
- What do you need to do to move in the direction you want to go in?
Share the scale with the people you are trying to collaborate with
- This would create a shared understanding of what collaboration means to this group
- Which helps everyone involved understand what is going to be expected of them and what overall outcomes everyone is trying to achieve

If you have already started to work with people then I would also avoid trying to jump straight to where you want to be. The risk being that it doesn’t lead to the collaboration you anticipated. Which could make it much harder to convince those people of your collaborative efforts in the future.

My personal preference is to use each stage of the scale as a stepping stone to the next. This way you iteratively build up your skills and approaches towards getting more of what you want and less of what you don’t. This also allows more room to tweak approaches as you get feedback, so you are more likely to be successful in the long run.

What do you think?

What do you think of the scales of collaboration?
Where do your teams sit on the scale?
Would this help you and your teams to collaborate more or less?

Let me know in the comments section below.

Exploratory and Automated testing: Using the right techniques in the wrong contexts

Reading time 2 minutes

Exploratory testing is about testing in an unpredictable context and therefore detecting unpredictable failures in our software. Automated testing is about testing in a predictable context and therefore detecting predictable failures. The mistake we make with automation is we try to apply it to the wrong context. You can’t use testing methods developed for a predictable context in an unpredictable environment.

While there is nothing physically stopping you, neither practice is particularly efficient if used in the wrong context. Exploratory testing in a predictable environment would just confirm what you already knew, only more slowly and less consistently when repeating the testing. Automated testing in an unpredictable environment, meanwhile, would lead to false negatives.

It’s also not a one-size-fits-all solution, as we work in both contexts: predictable when initially developing the software and unpredictable once it is running in the live environment.

The only way you can replace exploratory testing with automation is to make the test environment predictable. But that would then mean you are trying to detect predictable issues. This then negates the outcome you were looking for which is trying to detect unpredictable or complex failures.

Testing in unpredictable contexts

The best way to detect unpredictable failures is to use methodologies that can operate in an unpredictable environment.

One of the best-known methods is exploratory testing (sometimes called manual testing) but there are other techniques too:

- Monitoring of the live environment. This is good for issues we can predict in an unpredictable environment.
- Observability, using logs, graphs and other telemetry to see how the system is behaving in the live environment. This is helpful for issues we can’t predict and need to debug in the live environment.
- Phased rollout of features using techniques such as feature toggles, blue/green deployments, canary releasing etc. This is useful for limiting the impact of unintended issues in an unpredictable environment. Basically anything that allows you to slowly enable a feature for subsets of users.

Using monitoring and observability in conjunction with phased rollouts can greatly improve your ability to understand and limit how new code behaves in unpredictable environments.
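To make the idea of slowly enabling a feature for subsets of users more concrete, here is a minimal sketch of a percentage-based feature toggle. It is one possible approach under stated assumptions rather than how any particular toggle product works: the function names, the feature name `new-checkout` and the user ids are all hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given feature."""
    # Hashing feature and user together means the same user always lands in
    # the same bucket for a feature, but in different buckets across features.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Return True if the feature is switched on for this user."""
    return rollout_bucket(user_id, feature) < rollout_percent


# Hypothetical usage: enable a feature for roughly 10% of users, then widen it.
users = [f"user-{n}" for n in range(1000)]
for percent in (0, 10, 50, 100):
    enabled = sum(is_enabled(user, "new-checkout", percent) for user in users)
    print(f"{percent}% rollout: enabled for {enabled} of {len(users)} users")
```

Because the bucket is stable, raising the percentage only ever adds users. Anyone who already has the feature keeps it, which makes what you see in monitoring easier to relate to each step of the rollout.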

Testing in predictable contexts

This is not to say automated testing isn’t valuable, as it can help detect smaller predictable issues which, if left unchecked, could develop into larger unknown failures that only occur with the right mix of other smaller issues. Some issues may be within our control (software we develop) and some outside of our control (other people’s software). For software in our control (a predictable environment) automated testing is almost a perfect match. For software outside of our control (an unpredictable environment) contract testing, exploratory testing, monitoring, observability and phased rollouts of software are preferable.

Control and isolation

Next time you’re looking at testing techniques think about how much control (and therefore isolation) you have over your test environment. The greater the level of control then the more automation you should consider, but the less control you have then the more you should consider exploratory testing coupled with monitoring, observability and phased rollouts.

Testing techniques

The following diagram will help you see how different testing techniques stack up against each other. This is by no means an exhaustive list and only compares them on the basis of speed of feedback, value of feedback and testing environment. So the next time you get into a discussion about testing you could use these characteristics as a good way to frame that discussion.

Are there testing techniques that should be plotted on the chart?

Do you agree with the axes? Are there other, more important characteristics of testing that should be captured?

How would you plot the testing techniques?

What are your default settings?

2 minutes read

I recently read Enlightenment Now by Steven Pinker, which I highly recommend. Among the many ideas within it, he talks about some of the bugs that creep into our ways of thinking and reasoning about the world. If you’ve ever read Thinking, Fast and Slow by Daniel Kahneman or watched any TED talk about reasoning and decision making, you’ll be quite familiar with some of these bugs.

This got me thinking that a lot of these bugs are like our default settings, and it takes conscious self-awareness to switch them to something else. The default approach is fast and automatic (the System 1 style of thinking from Daniel Kahneman), while conscious self-awareness is slow and deliberate (System 2 thinking).

This isn’t an exhaustive list, and not everyone is affected by the same defaults to the same extent, but I think we can all find examples within ourselves and in other situations where this default approach has influenced our thinking.

I’ve grouped these into

Thinking in generalities
Focusing on self-interests
Believing in magic

These break down further into specific behaviour and thought patterns, see image below

I have also tried to include citations and research evidence, see the grey source boxes in the above image.

I know I have fallen foul of these bugs and I bet there are examples of it throughout this blog. So what can we do?

One approach that I think could help to override these defaults is by developing our self-awareness about them.

Firstly by understanding what they are and what they mean to you. Can you think of any examples of you being affected by this way of thinking, believing and focusing?

Secondly, recognise that they affect you just as much as other people, but you probably spot them in others more than in yourself. For example, focusing on evidence that confirms our beliefs while dismissing evidence that contradicts them.

Thirdly, slow down and work backwards through your thinking. What facts, observations, correlations and feelings are you using in your analysis of the situation?

This is by no means foolproof and whenever you end up in fast modes of thinking you’re likely to fall prey to one or more of these defaults. So should you even try? I think we should, and with plenty of time and practice I believe we can start to alter these defaults.

In the meantime one of the best ways to check yourself is to work in groups. Especially within groups that you believe you can take interpersonal risks* with. This will help with getting feedback in an open and honest way so you can start to make better decisions and more reasoned analysis of situations. This will also help with starting to understand what is influencing your thinking and if it is one of these defaults at play.

I strongly believe in incremental improvement and finding good sources of information about yourself is a great place to start that personal journey of self-improvement.

* You can learn more about interpersonal risks from my why do we need psychological safety in software teams post.

Tips

A lot of these defaults can affect us in such a way that they are interconnected and can be quite difficult to pick apart
I’ve found being able to recall the defaults from memory really helpful. This helps when you’re being mindful and looking at your thinking to see if one of these defaults is at play. If you need to keep looking them up it not only slows you down but makes it less likely to happen. The easier something is the more likely you are to do it

What do you do to stop your default thinking taking over?

Have you come across any other defaults that you or others use?

Three things of 2020

3 minute read

Below are three things that stand out to me when I reflect back on 2020. I’ve purposely not mentioned COVID because I think it would be on everyone’s list, and I didn’t think there was anything I could say about it that others aren’t already thinking. I’ve also included my three things from 2019 at the end, which I still think are important.

🌳 You can’t stick your apples on other people’s trees

Something that Sarah has been trying to tell me for some time but it never really clicked until this year 💡
I’ve learned a lot this year about how we learn and what we can do to enable more of it
- My apples…
But there is one thing that keeps coming back
It doesn’t matter how many different ways you find to engage people with the content
- Unless they really care about it they may never have the insights that you think they should have
- They may never see the benefits you do or incorporate that information into their ways of working
- This is all about trying to stick your apples on other people’s trees…
Usually they are just too busy to even be able to give it the time
The best approach is for them to find ways to incorporate it into their own learning
- To encourage them to grow their own apples…
This takes a lot more time than simply forwarding a link to read or even sending them on workshops/training courses…

Speaking of apples…

🍎 Informal Relationships

Informal relationships between team members are the key foundation for high performing engineering teams

Most team members can work with each other quite efficiently but the level at which we do makes the difference between low and high performing teams
We can cooperate and coordinate quite well as can be seen by how well teams can slice up work into tickets and hand them off to the next stage (cooperating). Some teams take this further and begin coordinating their actions using information from their step in the process to inform the next stages (coordinating)
But coordinating with team members isn’t enough. We need to be able to collaborate, because the level of complexity we work in means no one person can ever know it all. This essential and often forgotten detail makes team members interdependent
It’s the level of how well we can collaborate and work through problems that gets teams towards higher levels of performance. This performance can be measured by how sustainably the team can deliver end user value (throughput)
Psychological safety plays a big part in this interdependence and collaboration and can be characterised by how well people in the team can “just talk to each other”
- Psychological safety being the belief of individual team members that it is safe within their work environment to take interpersonal risk

🎓 Learn more: Foundations of great teams? Start with relationships

🍏 Manager or Leader?

Understanding the difference between the two can be really helpful

One of the things that has really stood out for me this year has been the difference between management and leadership
- Very simply!
  - Management is about planning, organising and solving problems
  - Leadership is about setting the direction, alignment and motivation
  - 🎓 See more The difference between leaders and managers
I always conflated the two and never really appreciated the difference
Since then I’ve not looked at software engineering teams the same again
Do others make the same mistake?
- Leading to confusion on when we should be leading our teams and when we should shift to a more management style
A Hybrid model may also be workable
- especially the closer you get to where the work is happening
- With a heavy slant towards leadership rather than management
- But the further you get from it the more a leadership style works best
A simple heuristic:
- the less experienced a member of staff then a more hybrid approach
- but the more experienced they are then a more leadership style is appropriate

Three things 2019

Teams

Working as a team will accomplish more than just working alone
I’ve tried and accomplished some things with some good results
But nothing compared to what I’ve contributed to as a team
But it starts with trust…

Trust

that people really do know what the best course of action is
They just sometimes need help thinking things through
Which needs people to listen…

Listen

And I mean really listening
This has by far been the most important thing I’ve done this year
Just asking very open questions and listening to what people say
I’ve learned more about people and what is happening in our teams from this than any other way
The interesting thing is the people I’ve listened to seem to get so much more out of it
- I think this is because not many of our team members get a chance to be listened to…

What are your three things for 2020?

Let me know in the comments below

How to learn from failure

Reading time 13 minutes

Below are my personal notes from Amy Edmondson’s excellent article Strategies for learning from failure. It’s a long read but I highly recommend it over my notes as it goes into a lot more detail than I have covered.

Summary

Not all failures are the same and categorisation of failures can make a big difference in enabling learning from them.

Why should testers care?

Considering we deal with software failure all the time, we have a tendency to forget the human cost of failures, especially in terms of how that failure occurred (the team), how it affects the users and the outcome for the business. This article is a great introduction to how we can learn from failure first and then how we could enable our teams and business to learn from it by reframing errors as different types of failure.

[Organisations] that catch, correct, and learn from failure before others do will succeed
Amy Edmondson

Amy classifies failure into three categories

Preventable
Complex
Intelligent

But we have a tendency to view all failures as one type. In software testing we group them into different levels of risk, but generally all failures are errors, which means something isn’t right and should be avoided. We’ve started to try and learn from them, but the need for interdisciplinary teams to do so is a cost that is often too high to pay, so it doesn’t happen very often. I think if we focused our efforts on investigating complex failures we could use the learnings to start minimising preventable issues and stop some of them happening altogether.

How should we respond to failure?

Some people believe that responding constructively to failures could give rise to an anything-goes attitude. They think that if people aren’t blamed for failures, then what will make them try as hard as possible to do their best work? But blame has a tendency to make people avoid failure and in some cases cover it up.

What we actually need is a culture that makes it safe to admit and report on failure (so we can learn from it) which coexists with high standards for performance (to make use of that learning to get better).

The blame game

If people see failure as something to be avoided you end up in the blame game, which has a spectrum of reasons for failure from blameworthy to praiseworthy:

🤔Notice how things that are blameworthy are about individuals but praiseworthy are all about the things.

I wonder how many times people don’t blame others but themselves for the failure, and hence keep quiet or downplay issues when they occur?

To embrace failure we need to classify it better than the catch-all term that failure encourages. Amy Edmondson suggests these three categories: preventable, complex and intelligent failures.

Preventable

These are usually found in routine tasks that are well defined and the outcomes are well understood
Preventable failures tend to occur when we deviate from this routine
In software engineering certain routine tasks can and should be automated, such as build processes and specific types of checks
If they do need to be performed manually then task lists and checklists are well suited to these types of tasks
- Note: exploratory testing falls under intelligent failures
Failures which result from these types of tasks can usually be mitigated through a better understanding of the work we do, how we do it and, most importantly, why
When we spot these types of failures (deviation from the routine) we should immediately address them
This is in part about stopping errors from being passed down the process and building quality in

Complex failures

Many systems we work in are complex and too big for any one person and in most cases even groups of people to fully understand
This means complex systems can be unpredictable and ambiguous and fail in ways we could not have anticipated
The way in which complex failures occur can in some cases be traced to things all happening in just the right way
But assuming failures will never occur can be counterproductive, and we should build into the process ways to handle what happens when things go wrong
When complex failures do occur we should recognise them as such and investigate them in a praiseworthy way, to understand all the components that led to the failure and identify whether any of the smaller issues that contributed to it can be made preventable
- For example
- Most accidents in hospitals result from a series of small failures that went unnoticed and unfortunately lined up in just the wrong way.

Intelligent failures

Named “intelligent failures” by the Duke University professor of management Sim Sitkin
These are the failures that occur during experimentation
They help you understand what works and what doesn’t
- And importantly quickly
These are situations where the answers are not knowable in advance
The only way you can find out is to actually do it
Exploratory testing is all about raising awareness of intelligent failures
As Amy Edmondson calls them, they are failures at the frontier
- Situations that haven’t happened before
- Or maybe won’t happen again
For software engineering this is a lot of the work that we are doing
- Hence agile software development so we can adapt to the changing environment
- To do things in a way that helps you learn from your work
- We should be producing lots of intelligent failures that help us learn about the system we’re building, the people that use it and the domain in which it is used
- Exploratory testing is all about exploring a system and seeing in what ways it can fail to better understand how it works

Small experiments over Big Bang experiments

At the frontier, the right kind of experimentation produces good failures quickly. Managers who practice it can avoid the unintelligent failure of conducting experiments at a larger scale than necessary.

Trial and failure?

“Trial and error” is a common term for the kind of experimentation needed in these settings, but it is a misnomer, because “error” implies that there was a “right” outcome in the first place.

Tolerance of failure

We need to be able to accept complex and intelligent failures and understand that doing so does not mean mediocrity. Tolerance is actually something that we need in order to be able to learn from these types of failures. The problem with failure is that there is almost always an emotional element to it and so needs leadership to enable the learning that needs to happen.

How do you learn from failure?

Leaders should insist that their organizations develop a clear understanding of what happened—not of “who did it”—when things go wrong.

This requires consistently:

reporting failures, small and large;
systematically analysing them; and
proactively searching for opportunities to experiment.

Anyone doing experimental work needs to clearly understand that the faster we fail, the faster we will succeed, but most people don’t grasp this subtle but important concept.

The quicker things fail the quicker you can pivot or try another idea that can succeed
But the longer that failure takes the longer you are executing on an idea that will not help your objective
What is the opportunity cost of working on one thing and not the other?

Some people may approach experimental work as if it’s well defined and understood, such as a production-line style of work where you need to produce the same thing over and over.

For example, statistical process control, which uses data analysis to assess unwarranted variances, is not good for catching and correcting random invisible glitches such as software bugs.

In a typical software team this would be predefined test cases or automated checks

There are three main ways to learn from failure: detection, analysis, and experimentation.

Detection

We need to detect and make issues visible earlier on in our processes before they become bigger issues later on

Don’t shoot the messenger

Unfortunately a lot of people are reluctant to raise issues early on in the process for all manner of reasons, the biggest culprit being an unwillingness to take the interpersonal risk of raising them.

One of the best ways to combat this is for management to lead by example: not only encourage the raising of issues earlier on in the process, no matter how small, but also applaud the people that do and have a system in place to make something happen about it.

Another issue is the human tendency to not admit failure due to the stigma attached to it: “it failed therefore I’ve failed”. So people keep going, hoping that things will get better, when they should have admitted failure. Or, worse, they haven’t realised they’ve failed due to inadequate measures or goals when starting out.

One way to improve the situation is to change the stigma around failure, for example with failure parties that encourage the reporting of failures and help people look at the situation in another way.

Example of how other organisations detect errors

Through speaking up, supported by management. From Amy Edmondson:

In researching errors and other failures in hospitals, I discovered substantial differences across patient-care units in nurses’ willingness to speak up about them. It turned out that the behavior of midlevel managers—how they responded to failures and whether they encouraged open discussion of them, welcomed questions, and displayed humility and curiosity—was the cause. I have seen the same pattern in a wide range of organizations.

Building quality in

The idea of the andon cord from the Toyota production system is doing just this; noticing small deviations in process and correcting them there and then to constantly improve the system.

For software engineering this is all about building quality into the process instead of inspecting it at the end. Inspecting at the end is almost too late to make a difference due to the increased cost in time and cognitive load of making the change. This usually ends in discussions such as “users are never going to notice X”, “no one is ever going to do Y” or “let’s see if it’s going to become a problem first”.

Analysis

Once failures have been detected it is important to not just look at the symptoms of the problem and move on but to dig into the root cause of the issues.

Unfortunately we tend to not want to do this, as it can be painful to admit that something went wrong, especially if we are the cause of it, and it can negatively affect our self-esteem and confidence. There is also an element of interpersonal risk associated with admitting failure that can add to people not wanting to look at issues too deeply. “What if people think I’m incompetent?”

Culture is another aspect that needs to be in place for inquiry into failure to occur. Digging into failures needs:

inquiry and openness, patience, and a tolerance for causal ambiguity

But a lot of organisational cultures are geared towards actions and results, not the reflection needed for learning from failure.

We are also highly susceptible to the fundamental attribution error. This is where we downplay our own responsibility and blame external factors when we fail, and do the opposite when others do.

Amy’s research back in 2010 showed that failure analysis is often limited and ineffective. Sadly I think this is still the case for a lot of organisations.

Analysing complex failures is difficult as they tend to occur across teams and departments, and due to the reasons listed above most people only focus on the symptoms rather than getting at the underlying causes of the failures. Therefore it’s best to use multidisciplinary teams to carry out the investigation, with the support of management making it clear that you are looking at what happened, not what someone did or didn’t do.

From the NASA Columbia disaster

A team of leading physicists, engineers, aviation experts, naval leaders, and even astronauts devoted months to an analysis of the Columbia disaster.
They conclusively established not only the first-order cause (symptom):
- a piece of foam had hit the shuttle’s leading edge during launch
but also second-order causes (underlying reason):
- A rigid hierarchy and schedule-obsessed culture at NASA made it especially difficult for engineers to speak up about anything but the most rock-solid concerns.

Experimentation

A critical activity for effective learning is strategically producing failures—in the right places, at the right times—through systematic experimentation.

For scientists
* 70% of experiments will fail
* They recognise that failure is not optional but a part of the process
* And that failure holds valuable information that they need to extract and learn from before the competition 🤔

In contrast, when product companies design new products they plan for success. So they set up the pilot for optimal conditions that work instead of representative ones that they can actually learn from. Therefore the pilot only produces information about what does work, not what doesn’t.

From Amy Edmondson:

A small and extremely successful suburban pilot had lulled Telco executives into a misguided confidence.
The problem was that the pilot did not resemble real service conditions: It was staffed with unusually personable, expert service reps and took place in a community of educated, tech-savvy customers.
But DSL was a brand-new technology and, unlike traditional telephony, had to interface with customers’ highly variable home computers and technical skills.
This added complexity and unpredictability to the service-delivery challenge in ways that Telco had not fully appreciated before the launch.
A more useful pilot at Telco would have tested the technology with limited support, unsophisticated customers, and old computers.
It would have been designed to discover everything that could go wrong—instead of proving that under the best of conditions everything would go right.
Of course, the managers in charge would have to have understood that they were going to be rewarded not for success but, rather, for producing intelligent failures as quickly as possible.
What incentives are you setting up for your employees? The things you reward are the things you will get.

What makes exceptional organisations?

exceptional organisations are those that go beyond detecting and analysing failures and try to generate intelligent ones for the express purpose of learning and innovating.

Can you think of any organisation that purposely injects failures into its system to see how it behaves? Hint: they named the tool after monkeys 🐒 and in the process created a whole new discipline, chaos engineering. These experiments don’t have to be that big either:

[you] don’t have to do dramatic experiments with large budgets. Often a small pilot, a dry run of a new technique, or a simulation will suffice.

recognise the inevitability of failure in today’s complex work organizations. Those that catch, correct, and learn from failure before others do will succeed
Amy Edmondson

How to move away from the In Test column

We as testers are always looking at ways to improve our testing processes and one way to do that is to move away from the In Test column. You can read more about the why in my post “In Test” column, but I left the how pretty vague. I’d like to outline one way in which you could do it, but just like removing the In Test column it may seem counterintuitive: by adding even more columns.

How does adding more columns help? Well, let me walk you through the process and all will become clear…

Breaking down the In Test column

“What did we actually test?”

One of the first things you’re going to need to do is break down what you do in the “In Test” column. Now this might not be as easy as it sounds, but one of the best ways to do this is to try to answer the question “What did we test the last time a ticket went through the In Test column?” You want to think about all the different activities you carried out for the last few tickets. One way to do this is to get some sticky notes and on each one write down the specific type of testing you did, e.g. retested defect No. 5487 or pair tested new feature X. Then group them up under headings that make sense, such as: regression testing, accessibility testing, performance testing, smoke testing, feature testing, automated UI testing, etc. This is best done as a group with people who are familiar with the testing process, as they will keep you honest about what you do and don’t do during testing.

These sub-headings are going to become your new columns that come under the banner of Testing.

Now for each sub-heading you want to create entry and exit criteria for when a ticket can be placed into that column. So, for instance, what entry criteria does a ticket need to meet to begin Accessibility testing? E.g. there needs to be a UI element to the ticket. What criteria would that ticket need to satisfy so it can leave (exit) the column? E.g. feature run against accessibility guidelines, any issues not fixed have been communicated to the accessibility team/product owner, etc.
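As a rough illustration, the entry and exit criteria for a column could be written down as simple checks against a ticket. This is only a sketch: the `BoardColumn` and `Criterion` classes and the ticket fields (`has_ui`, `guidelines_checked`, `open_issues`, `issues_communicated`) are hypothetical names made up for this example, not part of any real board tool.

```python
from dataclasses import dataclass, field
from typing import Callable

Ticket = dict  # hypothetical ticket shape: a plain dict of fields


@dataclass
class Criterion:
    description: str
    check: Callable[[Ticket], bool]


@dataclass
class BoardColumn:
    name: str
    entry_criteria: list[Criterion] = field(default_factory=list)
    exit_criteria: list[Criterion] = field(default_factory=list)

    def unmet_entry(self, ticket: Ticket) -> list[str]:
        """Descriptions of entry criteria the ticket does not yet satisfy."""
        return [c.description for c in self.entry_criteria if not c.check(ticket)]

    def unmet_exit(self, ticket: Ticket) -> list[str]:
        """Descriptions of exit criteria the ticket does not yet satisfy."""
        return [c.description for c in self.exit_criteria if not c.check(ticket)]


# The accessibility example from the text, with assumed ticket fields.
accessibility = BoardColumn(
    name="Accessibility testing",
    entry_criteria=[
        Criterion("Ticket has a UI element", lambda t: t.get("has_ui", False)),
    ],
    exit_criteria=[
        Criterion(
            "Feature run against accessibility guidelines",
            lambda t: t.get("guidelines_checked", False),
        ),
        Criterion(
            "Unfixed issues communicated to accessibility team/product owner",
            lambda t: t.get("open_issues", 0) == 0
            or t.get("issues_communicated", False),
        ),
    ],
)

ticket = {
    "has_ui": True,
    "guidelines_checked": True,
    "open_issues": 2,
    "issues_communicated": False,
}

print(accessibility.unmet_entry(ticket))  # nothing blocks entry
print(accessibility.unmet_exit(ticket))   # one exit criterion still unmet
```

Writing the criteria as named checks keeps the "why" (the description) next to the "when" (the check), which is the point of having them on the board in the first place.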

So you’ve got all these extra columns, now what?

Well, you could just leave it at that: by simply making the work more visible, the team gains a better understanding of what work we as testers actually do. Or you can start to use it as a tool to decide where to focus your test improvement process.

Test improvement process

By making all the different testing activities visible you get other benefits too:

Makes testing work explicit instead of being hidden under the banner of “Testing”
Entry/Exit criteria helps the team understand why that testing needs to happen
- And when it’s complete
Length of time a ticket spends in the columns starts to make bottlenecks (constraints) visible

Bottlenecks

By recording how long a ticket spends in each of the columns you can start to see which of the activities is taking the longest.
This will identify the most valuable candidate to start your improvement process. Improving anything before or after the bottleneck is not going to make as great an impact because the biggest issue isn’t being addressed.
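A minimal sketch of that measurement, assuming you can export when each ticket entered and left each column. The `history` rows, ticket ids and timestamps below are invented for illustration, and using the average time per column is just one reasonable choice; a median or a total would work too.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical board history: (ticket, column, entered, left)
history = [
    ("T-1", "Feature testing", "2024-03-01 09:00", "2024-03-01 13:00"),
    ("T-1", "Regression testing", "2024-03-01 13:00", "2024-03-03 13:00"),
    ("T-2", "Feature testing", "2024-03-02 09:00", "2024-03-02 11:00"),
    ("T-2", "Regression testing", "2024-03-02 11:00", "2024-03-03 11:00"),
    ("T-2", "Accessibility testing", "2024-03-03 11:00", "2024-03-03 14:00"),
]

FMT = "%Y-%m-%d %H:%M"


def hours_per_column(rows):
    """Average number of hours a ticket spends in each column."""
    durations = defaultdict(list)
    for _ticket, column, entered, left in rows:
        delta = datetime.strptime(left, FMT) - datetime.strptime(entered, FMT)
        durations[column].append(delta.total_seconds() / 3600)
    return {column: mean(hours) for column, hours in durations.items()}


def find_bottleneck(rows):
    """Column with the highest average time in column."""
    averages = hours_per_column(rows)
    return max(averages, key=averages.get)


averages = hours_per_column(history)
bottleneck = find_bottleneck(history)

for column, hours in averages.items():
    print(f"{column}: {hours:.1f}h on average")
print(f"Bottleneck: {bottleneck}")
```

With this made-up data, regression testing averages 36 hours against 3 hours for the other two columns, so it would be the first candidate for improvement.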

Once you’ve identified your bottleneck you can, as a team, look at the entry/exit criteria as a starting point for improvements. You could look to see if the risks this type of testing is mitigating can be addressed in some other way, for example earlier on in the process, or in some automated fashion rather than manually (but remember the automation fallacy).

Eventually you will find that the bottleneck “moves” to another part of the testing process, which is your next candidate for the improvement process. If you keep going you will start to see that tickets spend less and less time in the identified columns. In some cases you will see that a column isn’t even needed anymore and can be removed altogether.

This is known as the 5 focusing steps from the Theory of Constraints. You can find more on Wikipedia and elsewhere but these are:

1. Identify the system’s constraint(s)
   We do this by measuring how long each ticket spends in the columns and working out which testing task is taking the longest.
2. Decide how to exploit the system’s constraint(s)
   By looking at the entry and exit criteria and seeing what could be improved as a team.
3. Subordinate everything else to the above decision(s)
   Once your improvement has been identified, assign it as a task for someone to carry out or, if possible, work on it together as a team (swarm).
4. Alleviate the system’s constraint(s)
   Carry out the mitigation methods identified above, update the entry/exit criteria for the column and keep measuring how long tickets stay in the column going forward.
5. If in the previous steps a constraint has been broken, go back to step 1, but do not allow inertia to cause a system’s constraint
   If the mitigation method(s) alleviate the bottleneck, start the process over BUT don’t stop; keep going till you’ve addressed all the different types of testing.

Eventually you will be left with some testing that cannot be mitigated, but the entry and exit criteria will indicate exactly what testing needs to happen (column heading), why you need to do it (entry criteria) and when it’s done (exit criteria).

You can then simply make this a part of something that happens for a ticket and you should be left with just the “In Progress” column.

Remember this strategy doesn’t just work for improving testing but all activities that a team carries out. You just need a way to identify the activities and make them visible.