Exploratory and Automated testing: Using the right techniques in the wrong contexts

Exploratory testing is about testing in an unpredictable context and therefore detecting unpredictable failures in our software. Automated testing is about testing in a predictable context and therefore detecting predictable failures. The mistake we make with automation is trying to apply it to the wrong context. You can’t use testing methods developed for a predictable context in an unpredictable environment.

While there is nothing physically stopping you, neither practice is particularly efficient if used in the wrong context. Exploratory testing in a predictable environment would just confirm what you already knew, only more slowly and less consistently when the testing is repeated. Automated testing in an unpredictable environment would lead to false negatives.

Neither is a one-size-fits-all solution, as we work in both contexts: predictable when initially developing the software, and unpredictable once it is running in the live environment.

The only way you can replace exploratory testing with automation is to make the test environment predictable. But that would mean you are only trying to detect predictable issues, which negates the outcome you were looking for: detecting unpredictable or complex failures.

Testing in unpredictable contexts

The best way to detect unpredictable failures is to use methodologies that can operate in an unpredictable environment. 

One of the best-known methods is exploratory testing (sometimes called manual testing), but there are other techniques too:

  • Monitoring of the live environment, which is good for issues we can predict in an unpredictable environment.
  • Observability, using logs, graphs and other telemetry to see how the system is behaving in the live environment. This is helpful for issues we can’t predict and need to debug in the live environment.
  • Phased rollout of features, using techniques such as feature toggles, blue/green deployments and canary releasing: basically anything that allows you to slowly enable a feature for subsets of users. This is useful for limiting the impact of unintended issues in an unpredictable environment.

Using monitoring and observability in conjunction with phased rollouts can greatly improve your ability to understand and limit how new code behaves in unpredictable environments. 
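
As a minimal sketch of slowly enabling a feature for subsets of users, a percentage-based feature toggle could look something like the following. The function names (`rollout_bucket`, `is_enabled`), the feature name and the hashing scheme are all assumptions made for illustration; real toggle tooling will differ.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Return True if the feature is switched on for this user.

    rollout_percent is the share of users (0 to 100) who should see the feature.
    """
    return rollout_bucket(user_id, feature) < rollout_percent


# Hypothetical usage: start at 5% of users and widen as confidence grows.
if is_enabled("user-42", "new-checkout", 5):
    print("show the new checkout")
else:
    print("show the existing checkout")
```

Because the bucket is derived from the user and the feature name, the same user gets the same answer on every request, and raising `rollout_percent` only ever adds users to the rollout.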

Testing in predictable contexts

This is not to say automated testing isn’t valuable, as it can help detect smaller predictable issues which, if left unchecked, could develop into larger unknown failures that only occur with the right mix of other smaller issues. Some issues may be within our control (software we develop) and some outside of our control (other people’s software). For software in our control (a predictable environment) automated testing is almost a perfect match. For software outside of our control (an unpredictable environment) contract testing, exploratory testing, monitoring and observability, and phased rollouts of software are preferable.

Control and isolation

Next time you’re looking at testing techniques, think about how much control (and therefore isolation) you have over your test environment. The more control you have, the more automation you should consider; the less control you have, the more you should consider exploratory testing coupled with monitoring, observability and phased rollouts.

Testing techniques

The following diagram will help you see how different testing techniques stack up against each other. This is by no means an exhaustive list, and it only compares them on the basis of speed of feedback, value of feedback and testing environment. So the next time you get into a discussion about testing, you could use these characteristics as a good way to frame that discussion.

Testing techniques plotted on a speed, value and environment axis

Are there testing techniques that should be plotted on the chart?

Do you agree with the axes? Are there other, more important characteristics of testing that should be captured?

How would you plot the testing techniques?

The unintended consequences of automated UI tests

Whenever I see people talking about automated testing I always wonder what type of testing they actually mean. Eventually someone will mention the framework they are using, and all too often it’s a UI-based automation tool that allows tests to be written end-to-end (A-E2E-UI). They are usually very good at articulating what they think these tests will give them: fast automated tests that they no longer need to run manually, amongst other reasons.

But what they fail to look at is the types of behaviours these A-E2E-UI tests encourage and discourage within teams. 

They have a tendency to encourage  

  • Writing more integrated tests against the full stack rather than isolated tests
    • Isolated behaviour tests (e.g. unit, integration, contract tests etc.) run faster and help pinpoint where issues could be
    • An A-E2E-UI test will just indicate that a specific user journey is not working. While useful from an end user perspective, someone still needs to investigate why. This can lead to just re-running it to see if it’s an intermittent error, which is only made worse by tests giving false negatives, something full stack tests are more likely to do because they have more moving parts
  • Testing becomes someone else’s responsibility
    • This is more apparent when the A-E2E-UI tests are done by somebody else in the team and not the pair developing the code
    • Notice ‘pair’: if you’re not a one-person development army then why are you working alone?
      • Pairs tend to produce better code of higher quality with instant feedback from a real person 
      • It might be slower at first but it’s worth it to go faster later 
      • This is really important for established businesses with paying customers 
      • A research paper called The Costs and Benefits of Pair Programming backs this up but it’s nearly 20 years old now so if you know of anything more recent let me know in the comments.
  • Pushing testing towards the end of the development life cycle 
    • The only way A-E2E-UI tests work is through a fully integrated system therefore testing gets pushed later into the development cycle 
    • You could use test doubles for parts of the system, but then it is no longer an end-to-end test.
  • Slower feedback loops for development teams 
    • Due to testing being pushed to the later stages of development developers go longer without feedback into how their work is progressing 
    • This problem is increased further when the A-E2E-UI tools are not familiar to the developers who subsequently wait for the development pipeline to run their tests instead of doing it locally
  • Duplication of testing 
    • As the A-E2E-UI test suites get bigger and bigger it becomes harder and harder to see what is and isn’t covered by automation
    • This leads to teams starting to test things at other levels (code and most likely exploratory testing), which all adds to the development time
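
To illustrate the isolated, code-level tests and test doubles mentioned in the list above, here is a minimal sketch of a test that swaps a dependency for a stub. `basket_total`, `StubPriceService` and the prices are hypothetical names invented for this example, not taken from any real code base.

```python
class StubPriceService:
    """Test double standing in for a real pricing backend (hypothetical)."""

    def __init__(self, prices):
        self._prices = prices  # item id -> price in pence

    def price_for(self, item_id):
        return self._prices[item_id]


def basket_total(item_ids, price_service):
    """Sum the price of every item in the basket, in pence."""
    return sum(price_service.price_for(item_id) for item_id in item_ids)


def test_basket_total_adds_item_prices():
    # No network, database or UI involved: only the logic under test.
    stub = StubPriceService({"apple": 30, "pear": 45})
    assert basket_total(["apple", "pear", "apple"], stub) == 105
```

A test like this can be run locally with a runner such as pytest, and if it fails it points at one function rather than at a whole user journey.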

These are just some of the behaviours I’ve observed A-E2E-UI tests encourage, but they also discourage other behaviours which may be desirable.

They can discourage development teams from

  • Building testability into the design of the systems 
    • Why would you if you know you can “easily” test something end-to-end with an automation tool?
  • Maintainability of the code base
    • By limiting the opportunities to build a more testable design you decrease the maintainability of the code through tests
    • If you need to make a change it’s harder to see what the change in the code affects
    • By having more fine grained tests you can pinpoint where issues exist
    • A-E2E-UI tests just indicate that a journey has broken and how it could affect the end users
    • Not where the problem was actually introduced  
  • Building quality at the source 
    • You are deferring testing towards the end of the development pipeline, when everything has been integrated, instead of doing it while you are actively developing the code.
    • Are you really going to go back and add in the tests especially if you know an end-to-end test is going to cover it?
  • The responsibility to test your work 
    • With the “safety net” of the A-E2E-UI tests you send the message that it’s ok if something slips through development
    • If it affects anything the A-E2E-UI tests will catch it
    • What we should be encouraging is that it’s the developers’ responsibility to build AND test their work
    • They should be confident that once they have finished that piece of code it can be shipped
    • The A-E2E-UI tests should act as another layer to build on your team’s confidence that nothing catastrophic will impact the end users. Think of them as a canary in the coal mine. If it stops chirping then something is really wrong…
  • More granular feedback loops
    • By having A-E2E-UI tests you’re less likely to write unit and integration tests which give you fast feedback on how that part of the code behaves 
    • Remember code level tests should be testing behaviour not implementation details 
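
As a rough sketch of what testing behaviour rather than implementation details can look like, the tests below only call the public function and assert on its result. `apply_discount` and its loyalty rule are made up for the example.

```python
def apply_discount(total_pence: int, loyalty_years: int) -> int:
    """Give 10% off to customers with at least two years of loyalty."""
    if loyalty_years >= 2:
        return total_pence - total_pence // 10
    return total_pence


def test_loyal_customers_pay_less():
    # Asserts on the outcome a customer would see, not on how it is computed.
    assert apply_discount(1000, 2) == 900


def test_new_customers_pay_full_price():
    assert apply_discount(1000, 0) == 1000
```

Nothing in the tests depends on how the discount is calculated internally, so the body of `apply_discount` can be rewritten without touching them.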

If A-E2E-UI tests cause undesirable behaviours in teams, should we stop writing them? While they are valuable at demonstrating end users’ journeys, we shouldn’t be putting so much of our confidence that our system works as intended into them. They should be another layer which helps build the team’s confidence that the system hangs together.

If we put the vast majority of our effort and confidence into these automated end-to-end tests then we risk losing one of the team’s greatest abilities: building testability into the design of our systems. But just like the automated UI tests, building in testability takes conscious effort. This will take time, patience and experience for the whole team to understand and benefit from.

UI Automation, what is it good for? 

TL;DR: What automation at the UI level does and doesn’t give you.
UPDATE: I originally wrote this back in March 2015, lost it in my drafts and found it again recently, so I thought I’d get it out there. Don’t agree? Then let me know in the comments.

Automation fallacy

Every time I speak with different teams and organisations a theme constantly comes up: UI automation and how it’s going to solve all their problems. The thinking goes that if we can automate more of our tests (read: test scripts) then the Testers no longer have to check that item anymore. This then either frees them up to do more interesting things like exploratory testing, or means the Tester can be done away with altogether.

There is also a notion that automating all the regression checks will drop the regression test cycle from days to hours. This then supposedly allows the team to move faster and release quicker than before.

What everyone seems to miss is that automated checks are generally built to check one thing and will tell you if that thing is still there or behaving as the script has been programmed to expect. If anything else happens that wasn’t programmed into the check then it fails or stops dead, relying on someone then having to go and see what went wrong.

A Tester, on the other hand, can look for workarounds, work out what may have caused the issue or go and find other issues based on the information they’ve just learned.

So should we give up on UI automation and face that we’ve got to do rounds and rounds of regression testing and hire more Testers? Well no, what we need to do is ask ourselves

Why are we automating?

It looks like a simple question and most people (including myself in the past) would be able to give you a list of answers, but what we forget to ask is: by automating this check, what does it tell me when it passes or fails? If it passes does that now mean I no longer have to check that feature or scenario again? If it fails what does that tell me? That I have to check that scenario manually?

When a check fails what do we expect the team to do? Stop everything and investigate the issue? Carry on as normal and hope someone else will check it? Ignore the issue altogether? Who is responsible for checking the issue? Developers, Testers, dedicated automation engineers?

There are a lot of reasons that people give as to why they want to automate their testing such as

  • Reduce test/regression testing
    • The reason for regression testing is to see whether the changes you’ve made to your code base have broken anything existing.
    • Unless you have automated all your UI checks/regression suites then automation is not going to help you as much as you think it will
  • Spot issues/bugs faster
    • Automation doesn’t find new bugs; it only tells you that the check you’ve scripted has broken in some way. You need to tell the script that if action A doesn’t produce result B then fail with an error message. What normally happens is the check fails in a way you didn’t anticipate. Don’t forget, if you knew beforehand how something would break you would probably have put in a fix. That’s why they are called defects: something behaving in a way you didn’t want/anticipate
  • Free up Testers
    • Potentially but that is if they trust the automation
  • Consistently check a feature the same way
    • This is one thing an automated check is very good at
  • Something that is laborious or difficult to setup and check
    • Another good candidate for automation. We use it to do policy testing of our apps as it’s time consuming and prone to error when trying to manually test
  • We’re doing Behaviour Driven Development (BDD)
    • BDD is not about automating but more about collaborating to understand and create features. The automation is just one small part of it and even then it’s not about testing the UI but the business logic which could be tested at the Unit level
    • If you ever hear a development team saying ‘The BDD tests are failing’ then it’s a good indicator that they are probably using BDD incorrectly
  • To release faster
    • Again because you need to do less testing, see Reduce test/regression testing
  • It’s a part of continuous integration/delivery so we have to
    • No thought into what you are automating other than it’s what people say you have to do
  • Test manager or some other higher up tells you to
    • Someone thinks that just telling a development team to automate their testing will help them, see above
  • People within the development team or key stakeholders don’t trust the developers’ work
    • The test team are being used as a safety net to check the developers’ work, which tends to become a self-fulfilling prophecy as the developers start using the test team as exactly that

What does an Automated Check actually do?

Let’s start with an example from a mobile app, though it could very easily be any platform of your choice:

A simple automated scenario could be when the home page is loaded and I’ve selected an option then I expect to see items X, Y and Z.

Things this scenario will need to do are:

     Start the application
     Wait for it to load up
     Select a menu option
     Wait for the new screen to load
     Then check that the expected items are on screen.

animated gif of example automated check

So you can run this check over and over and know that as long as the sequence doesn’t change and the items you are checking for are there then the test will pass. What it isn’t going to tell you though is

  • Formatting issues with any of the screens loading up
  • It is starting to take longer for pages to load in
  • The ordering of the menu options has changed
  • There are new menu options
  • The items you are checking can be seen by the automation framework but nothing is actually visible on screen
  • There are new items on screen that the check is not looking for

All of the above could also be scripted into the check, but doing so would likely take quite a bit of effort, and because you can’t always predict how an app will behave you can’t always script for it.
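
The scenario and a couple of its blind spots can be sketched as follows. `FakeApp` is a hypothetical in-memory stand-in for a real UI automation driver, and its method names are assumptions for this example rather than the API of any particular framework.

```python
class FakeApp:
    """In-memory stand-in for a UI driver (hypothetical, for illustration)."""

    def __init__(self, screens):
        self._screens = screens  # menu option -> items shown on that screen
        self._current_items = []
        self._running = False

    def start(self):
        self._running = True

    def wait_until_loaded(self):
        # A real driver would poll the UI here; the fake is always ready.
        if not self._running:
            raise RuntimeError("app is not running")

    def select_menu_option(self, option):
        self._current_items = list(self._screens[option])

    def visible_items(self):
        return list(self._current_items)


def check_items_shown(app, option, expected_items):
    """Run the five steps and report whether every expected item is present."""
    app.start()
    app.wait_until_loaded()
    app.select_menu_option(option)
    app.wait_until_loaded()
    shown = app.visible_items()
    return all(item in shown for item in expected_items)


original = FakeApp({"Offers": ["X", "Y", "Z"]})
changed = FakeApp({"Offers": ["Z", "Unexpected banner", "Y", "X"]})

print(check_items_shown(original, "Offers", ["X", "Y", "Z"]))  # True
print(check_items_shown(changed, "Offers", ["X", "Y", "Z"]))   # True
```

The second call passes even though the ordering has changed and a new item has appeared, because the check only asks whether the items it was told about are present.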

This is where a real Tester has the advantage. You don’t need to tell a Tester to look for these things: they will do this without being prompted and, not only that, generally a lot faster than an automated check. They can also tell you if it doesn’t feel right or perform in a way that would be acceptable for end users, which can be very hard to quantify and therefore automate. They can also use the information that they’ve just learned and apply it to what else they can discover. An automated test isn’t going to be able to do this, not with the tools we are using at the moment.

Where a Tester can’t match an automated check (or will find it very hard to) is checking the same thing in the same way consistently and quickly. As long as there are no physical moving parts, an automated check can normally carry out the scenario above in seconds, only being delayed by waiting for things to install or load.

So should we stop automating our checks?

Before any team starts to think about automating their testing via the UI they should first, as a team, ask themselves:

Why are we automating?

It sounds like a simple question but as I explained earlier people tend to have differing views on what the automation is actually going to do for them. By talking about why they want to automate they are more likely to come up with solutions that will actually address the problems.

One of the main benefits that I’ve seen from automation especially at the UI is faster feedback that the app:

  • Can actually be installed on a real device/displayed in a browser
  • Can start without crashing
  • Can reach any endpoints that it relies on
  • Has a working core feature: the one thing it is designed to do actually works for your users, e.g.
    • BBC iPlayer: Can video actually be played
    • Google maps: can provide directions to a destination
    • Amazon: allows you to buy products
    • Facebook: The feed shows you what your friends and family are doing

To do this manually, every time a build is made, could take some time; it is also very tedious and, from my experience, just doesn’t happen. What tends to happen in this scenario is that developers will wait and see what comes back when Testers finally do test the app. There could be some time between the change being made and the Testers finally being able to test it.

The longer this feedback loop is, the harder an issue is to fix, due to the overhead in understanding what went wrong and what changed to cause it. This is exacerbated when working with legacy code, especially when it was not written by the developer making the change.

By automating just the core journey the development team know very quickly that whatever was last committed hasn’t caused a catastrophic failure and that the app’s core feature is still functioning. If there is a failure then you can back out the change (or ideally fix it) and get back to a working state. This helps the whole team know that the app works and improves the team’s overall confidence that if they install this app it’s actually going to be worth their time. There is nothing more frustrating, especially in mobile development, than to get a build, find the device you want to test on and install it, only to find it can’t carry out its main job for the user or, worse, crashes on start.

When things fail so easily and obviously it does nothing to instil confidence in the development team, more so when your key stakeholders find the issues. Automating the core journey also allows you to start using your Testers for what they are really good at: testing, and not just checking your developers’ work.

Core Journey

We use the concept of PUMA to decide what our core journeys are and ultimately what we should and shouldn’t automate. A general rule of thumb is: if it’s not a core journey, can it be covered by a unit/integration test that doesn’t invoke the UI? If it still can’t, then why would automating it help? Who would do it? How often does it need to run and how quickly do we need feedback that it’s broken? Could we monitor the app stats to check if it is still working rather than automating it? If it does break, how badly would your users, or their perception of the app, be affected? Could it be controlled by a feature toggle that allows it to be switched off in the live environment?
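
One possible reading of this rule of thumb, written as a small helper, is sketched below. The `Scenario` fields and the wording of the suggestions are assumptions made for the example; it does not attempt to encode PUMA itself, and the final branch deliberately hands the decision back to the team.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    is_core_journey: bool
    coverable_below_ui: bool  # could a unit/integration test cover it?


def suggest_approach(scenario: Scenario) -> str:
    """Return a starting point for a team discussion, not a verdict."""
    if scenario.is_core_journey:
        return "automated UI check"
    if scenario.coverable_below_ui:
        return "unit/integration test"
    # Neither applies: ask the remaining questions before automating anything.
    return "discuss: monitoring or feature toggle"


print(suggest_approach(Scenario("play a video", True, False)))
print(suggest_approach(Scenario("sort search results", False, True)))
print(suggest_approach(Scenario("seasonal banner", False, False)))
```

The scenario names here are invented; the point is only that the UI-level check is reserved for the core journey and everything else is pushed down or questioned first.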

So the next time someone asks why you don’t just automate your testing, ask them “Why are we automating?” You might realise that the problem they perceive can easily be addressed by one simple automated check rather than hundreds of automated UI checks.