Test Automation: Don’t report the bugs it catches


Don’t report the bugs your test automation catches. Report the reduction in uncertainty that the system works.

When you report the bugs, you send the signal that test automation is there to catch bugs. But that’s not what it’s for. Test automation is there to tell you whether your system is still behaving as you intended it to.

What are automated tests for?

Each automated test should cover some isolated aspect of the system’s behaviour. Collectively, these tests tell you that when you make a change to the system it still behaves as you want it to. What automated tests do is reduce your uncertainty that the system still behaves as you expect.

Framing test automation as reducing uncertainty

Framing test automation as reducing uncertainty helps emphasise that there are always things we don’t know. If you frame it as increasing certainty instead, it can give the impression that we know more than we do.

Two framings of the same tests: as increasing certainty, or as reducing uncertainty.

What happens when a test passes or fails

When an automated test passes, it sends a signal that this specific behaviour still exists, reducing some of your uncertainty about whether the changes you made have affected it.

When a test fails, it signals that this expected behaviour didn’t occur, but that’s it. It doesn’t tell you whether the failure is a bug or an expected consequence of the change to the system. Someone still needs to investigate the failure to tell you that.

So what we should report is the extent to which these tests have reduced our uncertainty. But how do we do that?

How to frame test automation as reducing uncertainty

Well, a good place to start is to help people understand what behaviour is covered by the tests. For instance, you could categorise the behaviour of your system into three buckets: primary, secondary and tertiary.

Primary could be things that are core to your product’s existence. For a streaming service, this could be video playback, playback controls and sign-up. Tests in this bucket must pass before a release can be made.

Secondary could be behaviour that supports the primary behaviours: if it were missing that would be annoying at most, and the core features would still function. For example, searching for new content or advanced playback controls (think variable playback speeds). Tests in this bucket can fail, but the failures should not render the application unusable. Issues discovered here can be fixed with a patch release.

Tertiary behaviours could be experiments, new features that haven’t yet been proven out or other less frequently used features that are not considered core. Tests in this bucket can also fail and don’t have to be fixed with patch releases.

But be careful of accessibility behaviours falling into the secondary and tertiary buckets. The people who rely on them might not be your biggest group of users, but those features are critical for them to be able to use your system at all.

Defining these categories is a team exercise with all the main stakeholders, as it is key that everyone has a shared understanding of what the categories mean and which behaviours fall into them.

Then, when you report that your primary and secondary tests are passing, you signal that the core and supporting features are behaving as expected. This reduces the team’s uncertainty that the system behaves as intended, and you can then decide what you want to do next.
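To make the buckets visible in the test suite itself, one lightweight option is to tag each test with its category. Here is a minimal sketch assuming pytest; the marker names mirror the buckets above, and the playback and search helpers are hypothetical stand-ins for your real client code.

```python
import pytest


def start_playback(title):
    # Hypothetical stand-in for a call into the real system under test.
    return {"status": "playing", "title": title}


def search_catalogue(term):
    # Hypothetical stand-in for a supporting feature.
    return [{"title": term}]


@pytest.mark.primary      # core behaviour: must pass before a release
def test_playback_starts():
    assert start_playback("some-title")["status"] == "playing"


@pytest.mark.secondary    # supporting behaviour: a failure can wait for a patch release
def test_search_returns_results():
    assert search_catalogue("documentaries") != []
```

Register the markers (in pytest.ini, for example) and run pytest -m primary as the release gate, or -m "primary or secondary" for the wider report; that makes it explicit which bucket a green run is actually reducing uncertainty about.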

What is Contract Testing?

And Consumer-driven contract testing

This is a follow-on from Contract testing: Why do it.

First some quick definitions:

Consumer
Someone (a dev team, for instance) that makes use of a third-party component or a combination of components (a system). They consume the service provided by the component or system.

Producer
The people (a dev team) who build the component or system and make it available for others to use.

Test double
To keep the tests fast, you will be using a test double of the producer in the majority of your tests. More specifically, a stub: something very simple that responds however you tell it to.

Remember: don’t mock what you don’t own.

Avoid using mocks for contract tests; if you attempt to mock the behaviour of your producers, you’ll be creating another job for yourself. Always think of the producer as a black box and don’t make assumptions about how its internals work. That is not your responsibility. A stub should be simple and easy to understand, and will generally just return a canned response.
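To give a rough idea of how simple a stub can be, here is a minimal hand-rolled sketch; the get_title method and the canned response are hypothetical, and the point is only how little the stub does.

```python
class StubProducer:
    """A stand-in for the producer that responds however you tell it to -- nothing more."""

    def __init__(self, canned_response):
        self.canned_response = canned_response
        self.requests_seen = []

    def get_title(self, title_id):
        # Record the call so a test can assert on it, then return the canned data.
        # No producer behaviour is modelled here -- and that's deliberate.
        self.requests_seen.append(title_id)
        return self.canned_response
```

Because nothing about the producer’s internals is modelled, the stub stays easy for anyone to read and maintain.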

What is a Contract test?

Contract tests are automated code-level tests written from the viewpoint of the consumer. They check that the producer exists, responds to a given request, and responds in the format expected by the consumer. A simple rule could be:

  • For every unique call you make to the producer, write a contract test
  • If output from the producer is going to cause a unique behaviour change in you (the consumer), write a contract test; an error condition, for example, would fall into this category

They shouldn’t go further than this and begin to check that the response contains all the correct data, or test the behaviour of the producer. That’s the job of the producer, not the consumer. The producer is a black box to the consumer: simply input and output. Whatever transformations happen to the data inside the producer are unknown.

If you do test that a response contains the correct data, I would only check very specific pieces of it: ones that, if they were not returned, would cause problems for your system, and which your system should handle gracefully.
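Put together, consumer-side contract tests following the two rules above might look something like this sketch. It assumes an HTTP producer reachable through a base URL and uses the requests library; the /titles endpoint and field names are hypothetical.

```python
import os

import requests

# Points at the local stub by default; a separate job can point it at the real producer.
PRODUCER_URL = os.environ.get("PRODUCER_URL", "http://localhost:8080")


def test_get_title_responds_in_the_expected_shape():
    # One unique call to the producer -> one contract test.
    response = requests.get(f"{PRODUCER_URL}/titles/abc123", timeout=5)
    assert response.status_code == 200
    body = response.json()
    # Check only the fields this consumer relies on, not the producer's full schema.
    assert "id" in body
    assert "stream_url" in body


def test_unknown_title_is_reported_as_not_found():
    # An error condition that causes a unique behaviour change in the consumer.
    response = requests.get(f"{PRODUCER_URL}/titles/does-not-exist", timeout=5)
    assert response.status_code == 404
```

Note what is absent: no assertions about the values inside the response and no attempt to test the producer’s behaviour.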

How will Contract testing help?

Focused
If the tests follow the guidance above, they will focus on just the boundary at which the consumer and producer interact. So if they fail you know not only where the problem is but also what the issue is, as they cover only a small area of interaction. This allows you to quickly identify whether the problem is with your integration of the producer or with some other part of your code.

Fast
These tests will be written at the code level, usually with a native unit testing framework for the language you are working in. The vast majority will also execute against a stub to keep them fast; running them against the actual producer would be slower. And because each test is so focused on the interaction boundary, each one runs in well under a second, allowing the whole suite to execute in a matter of seconds.

Reliable
Due to the simplicity of the tests, the number of false positives is very low; they will only fail if something has changed within the test, your interaction with the test double, or the test double itself. Everything is now within your circle of control, so any brittleness can be remedied quickly and easily.

Automated documentation
You now have tests that document your usage of the dependency. Because they are executable, they will stay up to date with every change you or your dependency makes.

Running the tests

These tests can now easily be kept as part of the main suite of tests within the code base and run through the development pipeline as usual. Any change to the code base results in the whole suite of contract tests running and letting the dev team know if there are any issues.

Occasionally you will also want to run the contract tests against the real dependency, separately from the main build pipeline, just to confirm that the contract hasn’t changed and that your test double is still a true stand-in for the real thing.
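One way to keep that check cheap is a scheduled test that compares the shape of the stub’s canned response with a real response. This is a sketch only: REAL_PRODUCER_URL, the /titles endpoint and the field names are hypothetical, and it again assumes the requests library.

```python
import os

import requests

# The canned response the stub returns in the day-to-day contract tests.
STUB_CANNED_RESPONSE = {"id": "abc123", "stream_url": "https://cdn.example.com/abc123"}


def test_stub_is_still_a_true_stand_in():
    real = requests.get(
        f"{os.environ['REAL_PRODUCER_URL']}/titles/abc123", timeout=5
    ).json()
    # Compare shape, not values: the stub only promises the fields the consumer uses.
    assert set(STUB_CANNED_RESPONSE) <= set(real)
```

Run from a scheduled job rather than the main pipeline, a failure here tells you the contract has drifted before an actual update catches you out.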

New version of the dependency

Now when a new version of the dependency is released you can run the contract tests against it and check for breaking changes. If no issues are detected, all that may be needed is some light exploratory testing of the changes detailed in the release notes.

If running the contract tests does detect an issue then it should be quick and easy to pinpoint where the issue is (you or them) and what the necessary mitigation steps should be (fix in your code or reject the release). All this while keeping your build pipeline running and your code base shippable.

If an issue is detected in the live environment, it’s going to be easy to know what changed and how to fix it, whether that’s fixing forward or backing out the change.

Confidence for the Consumer team

Contract tests allow the consuming team to move to a new version of a producer much more quickly and with greater confidence than before. If something in the release notes still looks risky, your test team can carry out focused exploratory regression testing and, if possible, put the update behind a feature flag for a controlled release to your end users.

What is Consumer-driven Contract Testing?

The thing with contract tests is that they are very much in the consumer’s domain. If the producer is making regular releases that cause the contract tests to fail often, then on the one hand at least you know before taking the update, but on the other you still can’t take it without workarounds or additional releases from the producer. This may lead you to start thinking about a new supplier. Why even bother with all the pain of writing contract tests when you knew this already?

What if you could help your producer see that an update is going to cause you issues before they have even made a release? What if they told you prior to making the release that they need to introduce a breaking change, or better yet that the current API will be deprecated after a certain date or version, allowing you to move to the new API in your own time? What if you could work with your producers collaboratively, so that they get what they want (easy and quick uptake of new versions) and you get what you need (new bug fixes and features, improved confidence that each update works as intended, less time testing)? This is where Consumer-driven Contract testing can help, and where the benefits of Contract testing really start to show.

Benefits of Consumer-driven Contract Testing

As mentioned earlier, the contract tests sit in your circle of control; that is, everything in this domain is under your direct control. The producer, however, is outside your control, but it can be in your circle of influence.

  • Note that the level of influence you have over your dependency will depend on your overall relationship. If they are within the same organisation, things may be easier. If they are outside your organisation but a supplier you have a financial contract with, it will probably require some contract negotiation: not impossible, but still some effort. If there is no financial contract and it’s just something you use under an open source licence, then the contract tests are likely all the relationship you will have.

One of the ways to start moving your producers into your circle of influence is to start a dialogue with them around your contract tests. These tests will show the producer exactly how you integrate their service and the types of response you expect from them. And because the tests and test doubles are so simple, they should be easy for the producer to understand without your intervention (another good reason to keep them focused and simple).

Showing them the tests is a good place to start (it’s just code, which is what we are all working with; none of that touchy-feely stuff about relationships). But a better way to progress the relationship, sorry, the chat, would be to see if they could run the contract tests as part of their development pipeline: perhaps every time they plan to make a release or, better yet, on every commit (another reason to keep the tests fast and reliable).

This way they not only see how you use them, but they also get an early warning if any changes in their code are likely to cause their consumers problems. They can then see if they really need to make that change, or how they can mitigate the impact on their consumers. If they do need to make it, they can start a dialogue with their consumers and begin migrating them onto a new API. This all helps to improve the relationship between consumers and producers, facilitated by some simple tests. Who knew testing could build stronger relationships between development teams?

Who owns the Contract tests?

Just in case it’s not clear: the responsibility for writing the contract tests in the first place always lies with the consumers. Only they know how they plan to use and integrate the producers. The producers can always offer best practice and explain how they intend consumers to use their services, but it’s up to the consumers to decide whether they will use the service the way it was intended.

Contract tests only become Consumer-driven once they are executed by the producers. Until then they are just contract tests, and even then only in name: if they test anything more than what was outlined earlier, they become something else entirely.

New problems to solve

Figuring out how to share the tests, run them, make the results visible, and let consumers and producers know about breaking changes is a whole host of other issues that need to be resolved. The web testing frameworks have already made some progress in this area, but I don’t know of any tools that facilitate this between internal teams other than having access to each other’s build infrastructure and source code repos.

Don’t use contract tests to do functional testing

Contract tests need to be quick and simple to understand, and therefore should only test at the boundary. If you go further than this, they will become more complicated and harder for other teams to understand.

It’s not the producer team’s job to understand how you use their service, but giving them some insight into how you integrate it can benefit both teams. There is nothing stopping you from writing more integrated tests, but don’t expect your producer to run them. They are your responsibility, and the feedback from them will be more beneficial to you than to the producer. Besides, you don’t want them thinking you’re trying to fob your testing off onto them.

If your tests do go further than what was described above, don’t call them contract tests, otherwise you’ll cause more confusion. Be specific and call them what they are.

The unintended consequences of automated UI tests

Whenever I see people talking about automated testing, I always wonder what type of testing they actually mean. Eventually someone will mention the framework they are using, and all too often it’s a UI-based automation tool that allows tests to be written end-to-end (A-E2E-UI).
They are usually very good at articulating what they think these tests will give them: fast automated tests that they no longer need to run manually, amongst other things.

But what they fail to look at is the types of behaviours these A-E2E-UI tests encourage and discourage within teams. 

They have a tendency to encourage  

  • Writing more integrated tests against the full stack rather than isolated tests
    • Isolated behaviour tests (e.g. unit, integration, contract tests etc) run faster and help pinpoint where issues could be
    • An A-E2E-UI test will just indicate that a specific user journey is not working. While useful from an end-user perspective, someone still needs to investigate why. This can lead to simply re-running the test to see if the error was intermittent, which is only made worse by false positives, something full-stack tests are more prone to because they have more moving parts
  • Testing becomes someone else’s responsibility
    • This is more apparent when the A-E2E-UI tests are written by somebody else in the team and not the pair developing the code
    • Notice ‘pair’: if you’re not a one-person development army, why are you working alone?
      • Pairs tend to produce code of higher quality, with instant feedback from a real person
      • It might be slower at first but it’s worth it to go faster later 
      • This is really important for established businesses with paying customers 
      • A research paper called The Costs and Benefits of Pair Programming backs this up, but it’s nearly 20 years old now, so if you know of anything more recent let me know in the comments.
  • Pushing testing towards the end of the development life cycle 
    • The only way A-E2E-UI tests work is through a fully integrated system, so testing gets pushed later into the development cycle
    • You could use Test doubles for parts but then that is not an end-to-end test.
  • Slower feedback loops for development teams 
    • Because testing is pushed to the later stages of development, developers go longer without feedback on how their work is progressing
    • This problem gets worse when the A-E2E-UI tools are not familiar to the developers, who then wait for the development pipeline to run their tests instead of running them locally
  • Duplication of testing 
    • As the A-E2E-UI test suites get bigger and bigger, it becomes harder and harder to see what is and isn’t covered by automation
    • This leads to teams starting to test things at other levels (in code, and most likely through exploratory testing), which all adds to the development time

These are just some of the behaviours I’ve observed A-E2E-UI tests encourage, but they also discourage other behaviours which may be desirable.

They can discourage development teams from

  • Building testability into the design of the systems 
    • Why would you, if you know you can “easily” test something end-to-end with an automation tool?
  • Maintaining the code base
    • By limiting the opportunities to build a more testable design, you decrease the maintainability of the code through tests
    • If you need to make a change it’s harder to see what the change in the code affects
    • By having more fine grained tests you can pinpoint where issues exist
    • A-E2E-UI tests just indicate that a journey has broken and how it could affect the end users
    • Not where the problem was actually introduced  
  • Building quality at the source 
    • You are deferring testing towards the end of the development pipeline, when everything has been integrated, instead of doing it while you are actively developing the code
    • Are you really going to go back and add the tests, especially if you know an end-to-end test is going to cover it?
  • Taking responsibility for testing their own work
    • With the “safety net” of the A-E2E-UI tests, you send the message that it’s ok if something slips through development
    • If it affects anything, the A-E2E-UI tests will catch it
    • What we should be encouraging instead is that it’s the developer’s responsibility to build AND test their work
    • They should be confident that once they have finished that piece of code it can be shipped
    • The A-E2E-UI tests should act as another layer that builds on your team’s confidence that nothing catastrophic will impact the end users. Think of them as a canary in the coal mine. If it stops chirping then something is really wrong…
  • More granular feedback loops
    • By having A-E2E-UI tests, you’re less likely to write unit and integration tests, which give you fast feedback on how that part of the code behaves
    • Remember: code-level tests should test behaviour, not implementation details (see the sketch after this list)
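To make that last point concrete, here is a minimal sketch with a hypothetical Basket class: the first test pins observable behaviour and survives refactoring, while the second couples itself to an implementation detail and breaks on a harmless change.

```python
class Basket:
    def __init__(self):
        self._items = []  # internal detail: could become a dict without anyone noticing

    def add(self, item, price):
        self._items.append((item, price))

    def total(self):
        return sum(price for _, price in self._items)


def test_total_reflects_added_items():
    # Behaviour: this still passes after any refactor that keeps the total correct.
    basket = Basket()
    basket.add("book", 10)
    basket.add("pen", 2)
    assert basket.total() == 12


def test_items_are_stored_in_a_list():
    # Implementation detail: this breaks the moment the internal storage changes.
    basket = Basket()
    basket.add("book", 10)
    assert isinstance(basket._items, list)
```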

If A-E2E-UI tests cause undesirable behaviours in teams, should we stop writing them? While they are valuable for demonstrating end-user journeys, we shouldn’t place so much of our confidence that the system works as intended in them. They should be another layer that helps build the team’s confidence that the system hangs together.

If we put the vast majority of our effort and confidence into these automated end-to-end tests, then we risk losing one of a team’s greatest abilities: building testability into the design of our systems. But just like the automated UI tests, building in testability takes conscious effort. It will take time, patience and experience for the whole team to understand and benefit from.