Test Automation: Don’t report the bugs it catches

Reading time: 3 minutes

Don’t report the bugs your test automation catches. Report the reduction in uncertainty that the system works.

When you report the bugs, you send the signal that test automation is there to catch bugs. But that’s not what it’s for. Test automation is there to tell you whether your system is still behaving as you intended it to.

What are automated tests for?

Each automated test should check some isolated aspect of the behaviour of the system. Collectively, these tests tell you that when you make a change, the system still behaves as you want it to. What automated tests do is reduce your uncertainty that the system still behaves as you expect.

Framing test automation as reducing uncertainty

Framing test automation as reducing uncertainty helps emphasise that there are always things we don’t know. Whereas if you frame it as increasing certainty, it can give the impression that we know more than we do.

(Diagrams: framing testing as increasing certainty vs framing testing as reducing uncertainty.)

What happens when a test passes or fails

When an automated test passes, it sends a signal that this specific behaviour still exists, reducing some of your uncertainty about whether the changes you made have affected this specific behaviour.

When a test fails, it signals that this expected behaviour didn’t occur, but that’s it. What it doesn’t tell you is whether it’s a bug or an intended result of the change to the system. Someone still needs to investigate the failure to tell you that.

So what we should report is to what extent our uncertainty has been reduced by these tests. But how do we do that?

How to frame test automation as reducing uncertainty

Well, a good place to start is to help people understand what behaviour is covered by the tests. For instance, you could categorise the behaviour of your system into three buckets: primary, secondary and tertiary.

Primary could be things that are core to your product’s existence. For a streaming service, for example, this could be video playback, playback controls and sign-up. Tests in this bucket must pass before a release can be made.

Secondary could be behaviour that supports the primary behaviours: if it didn’t exist it would be annoying at most, but the core features would still function. For example, searching for new content or advanced playback controls (think variable playback speeds). Tests in this bucket can fail, but failures should not render the application unusable. Issues discovered here can be fixed with a patch release.

Tertiary behaviours could be experiments, new features that haven’t yet been proven out or other less frequently used features that are not considered core. Tests in this bucket can also fail and don’t have to be fixed with patch releases.

But be careful of accessibility behaviours falling into the secondary and tertiary buckets. The people who rely on them might not be your biggest user group, but those features are critical for them to be able to use your systems.

Defining these categories is a team exercise involving all the main stakeholders, as it is key that they have a shared understanding of what the categories mean and which behaviours can fall into them.

Then when you report that your primary and secondary tests are passing, you signal that the core and supporting features are behaving as expected. This reduces the team’s uncertainty that the system behaves as intended. You can then decide what you want to do next.
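As a concrete sketch of that reporting, here is what grouping results by bucket and gating a release on the primary bucket could look like. The bucket names, test names and result format are illustrative assumptions, not from any particular framework:

```python
# Sketch: summarise test results per bucket and gate a release on them.
# Bucket names and the results format are illustrative assumptions.
from collections import defaultdict

def summarise(results):
    """results: list of (test_name, bucket, passed) tuples."""
    buckets = defaultdict(lambda: {"passed": 0, "failed": 0})
    for name, bucket, passed in results:
        buckets[bucket]["passed" if passed else "failed"] += 1
    return dict(buckets)

def can_release(summary):
    # Primary behaviours must all pass before a release can be made;
    # secondary failures can ship and be fixed in a patch release.
    return summary.get("primary", {"failed": 0})["failed"] == 0

results = [
    ("video_playback", "primary", True),
    ("sign_up", "primary", True),
    ("search_new_content", "secondary", False),
    ("variable_speed", "tertiary", True),
]
summary = summarise(results)
print(summary)
print("Release OK" if can_release(summary) else "Block release")
```

Reporting this per-bucket summary, rather than a raw bug count, tells stakeholders which uncertainty has been reduced and which hasn’t.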

Reducing Uncertainty in Software Delivery

I recently attended a half-day online event that InfoQ held on Reducing Uncertainty in Software Delivery. What made this event different was the underlying focus on testing without a single tester present in the talks or panel discussions. The majority of speakers were developers, and there were even a few engineering managers, product people and a CEO or two. It also appeared to me that none of them had come from a traditional testing background. However, they all made the points that a good tester would, and then some. The advantage they appear to have over testers is that they were able to incorporate the knowledge of their own discipline to give a much broader view than just focusing on the testing itself.

A key theme I’m seeing from these talks is that these organisations spend a lot of effort on learning from failure, either by analysing failures that have happened in production or by actively encouraging teams to cause them. It was only the more advanced organisations that were taking this approach, but the others were not far behind. Why? To make their systems even more resilient. Their approach appears to be using Site Reliability Engineers (SREs) to work alongside their engineering teams, helping them do the work but also enabling the teams to extract the learnings from it. This isn’t simply chaos testing to cause failures or postmortems for production failure analysis, but also helping teams with the people side of working with and handling failure productively.

The talks that caught my interest were Building in Reliability (SRE at Gremlin), User Simulation for Rapid Outage Mitigation (SRE at Uber), and a panel discussion on testing in production (with two CEOs, a product person and an engineering manager).

Now, this is a small sample: the speakers are very experienced, work or have worked at some of the best-known web-based organisations (Google, Uber etc.), and are US-focused too. But I’m seeing a lot of things that testers could advocate for being pursued and implemented by Site Reliability Engineers (SREs). For example:

  • testing in production,
  • building in observability,
  • pushing testing earlier in the process,
  • encouraging developers to test their own work 

The advantage SREs have is that they already have the technical ability and are now starting to build out the socio-technical skills they were previously lacking. These organisations have another advantage in that they are heavily focused on learning from their failures, so when they do get things wrong they work hard to extract as much value from that failure as possible. On top of that, some of these organisations are actively causing failures within their systems to further limit the catastrophic failures that could occur. Some of these organisations have never had a tester and, from the looks of things, never will. If you’re pursuing a true continuous improvement strategy, testers could look like a bottleneck in the process, slowing down information flow. How can testers enable the flow of information, and what can they add that makes this information even more valuable?

I’ve pulled together my summaries of the talks I found interesting below.

Talk: User Simulation for Rapid Outage Mitigation

Uber uses an alternative approach to end-to-end testing because their system is so big that no one person can ever fully understand it. Instead, they use composable tests: each team creates tests for their part of the system and mixes in pre and post steps built by the teams they depend on. These are then run in a simulation environment that lets them see how the system will perform when a change is deployed. To incentivise teams to build the tests, they use a mixture of pain (being woken up at 3AM by a production failure) and a mitigation support team (who hold their hands at 3AM): if you had these tests, you wouldn’t be awake at 3AM trying to mitigate the issue. They also don’t try to solve issues at 3AM but mitigate them, so others can also learn about the outages that affect their system.
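A rough sketch of the composable idea, with entirely hypothetical team and step names (this is not Uber’s actual framework): each team owns a step, and a test composes its own step with pre and post steps from the teams it depends on, sharing a context:

```python
# Sketch of composable tests. Team names and steps are made up for
# illustration; each function is a step owned by one team.

def payments_pre(ctx):
    # Pre-step built by the payments team the trips team depends on.
    ctx["payment_authorised"] = True

def trips_test(ctx):
    # The team under test only asserts its own behaviour...
    assert ctx.get("payment_authorised"), "needs the payments pre-step"
    ctx["trip_completed"] = True

def receipts_post(ctx):
    # ...and a dependent team's post-step checks the downstream effect.
    assert ctx.get("trip_completed")
    ctx["receipt_sent"] = True

def run_composed(steps):
    """Run a composed test: every step shares one context dict."""
    ctx = {}
    for step in steps:
        step(ctx)
    return ctx

result = run_composed([payments_pre, trips_test, receipts_post])
```

The point is that no single team needs to understand the whole journey; each composes its test from the pieces its neighbours maintain.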

Talk: Building in reliability

An interesting talk focusing on the availability of systems within organisations. The speaker walked through how you could go from 99% availability to 99.99% and how it is a learning journey, using a simple crawling, walking, running analogy to get your availability towards what makes sense for your organisation. Essentially: can you do it manually, can you script it, and can you automate it? I found the slide showing the outcomes at each stage, from 99% to 99.99%, a great way to help others understand the journey.

Panel: Measuring Value Realisation Through Testing in Production

I usually only see these types of conversation on tester-focused panels, but none of this panel were testers. Tester-focused panels typically focus on testers testing in production, but this was very much focused on learning from real users in production. The interesting thing from my perspective was that they made all the points I would expect a reasonably experienced tester to make. In some cases, because their roles sat outside testing, they focused on the costs and benefits beyond simply testing in production, e.g. the downsides of A/B testing, or the product management mindset shifts needed to embrace learning from users rather than whatever the roadmap says.

In some ways, testers testing in production almost act like middlemen for the learning that happens during testing. Could it be that, in some cases, testers are getting in the way of teams learning effectively from testing in production?

Building confidence with automation

To build people’s confidence in automation, you first need to understand why you’re doing it.

Why do we automate things?

If you look at automation in general, the reason to do it is that we have some repetitive manual task that we want to happen automatically. Doing so removes the inconsistency that can occur when doing it manually, and makes the output of the process reliable and repeatable as and when you need it. In short, automation can make processes consistent, reliable and repeatable.


This usually leads to other benefits too, such as the ability to scale up the automated process in terms of frequency and speed, all while reducing costs in some scenarios. Essentially, you can take advantage of economies of scale.


The benefit of consistency, reliability, repeatability and scalability is that they help the people associated with a process to have confidence in the output of that automation. They can either see the process happening again and again, or inspect the output to validate their confidence in the process. You could even take it a step further and automate the inspection too.

But what about test automation? 

The above works well for automating a physical process such as making a glass bottle: you can either watch the bottle being made or inspect the end product. But when it comes to test automation, you can’t “see” the test occurring (or any software process, for that matter), and the only output is likely to be a result: pass or fail.

The only way to gain confidence in the automation is to inspect the code (the process), or to base your confidence on the person doing the automation: you trust that they wouldn’t fake it or maliciously do anything wrong.

If that confidence is lacking, the only way to feel confident that the system being tested works is to test the system again. Any benefits gained from automation are then lost, as you are now duplicating the effort. The biggest loss is the economies of scale.

This is by far one of the biggest reasons why testers in teams have very little confidence in the automation: they don’t know what it covers, how it works, or even whether it’s being done to a high standard. If your job is to understand and raise risks within the team, this almost leaves you with no choice but to test it again.

Building confidence 

If you’re a developer or automation specialist, you have two options for improving people’s confidence in the output of the automation: help them “see” the process, or build their trust in you. Both will go some way to improving their confidence in the process.

Better yet, help them understand the principles behind the automation, which, if done with humility and compassion, will naturally lead to those people trusting you as well. By helping them understand the principles behind the automation, you enable them to work out for themselves what is and isn’t being automated, and to what standard. This then lets them see where the gaps are in the process, which they can raise as risks or work to plug.

I’ve written about building a team understanding of unit testing which details how you can document your principles in a way that is accessible. You can use this method to document any team principle not just unit testing.  


Want to read more about automation? Then check out my previous posts, UI Automation, what is it good for? and The unintended consequences of automated UI tests.


The unintended consequences of automated UI tests

Whenever I see people talking about automated testing, I always wonder what type of testing they actually mean. Eventually someone will mention the framework they are using, and all too often it’s a UI-based automation tool that allows tests to be written end-to-end (A-E2E-UI).
They are usually very good at articulating what they think these tests will give them: fast automated tests that they no longer need to run manually, amongst other things.

But what they fail to look at is the types of behaviours these A-E2E-UI tests encourage and discourage within teams. 

They have a tendency to encourage  

  • Writing more integrated tests with the full stack rather than isolated tests 
    • Isolated behaviour tests (e.g. unit, integration, contract tests etc.) run faster and help pinpoint where issues could be
    • An A-E2E-UI test will just indicate that a specific user journey is not working. While useful from an end-user perspective, someone still needs to investigate why. This can lead to just re-running it to see if it’s an intermittent error, which is only made worse by tests giving false negatives, something full-stack tests are more prone to because they have more moving parts 
  • Testing becomes someone else’s responsibility 
    • This is more apparent when the A-E2E-UI tests are written by somebody else in the team and not the pair developing the code 
    • Notice ‘pair’: if you’re not a one-person development army, why are you working alone? 
      • Pairs tend to produce better code of higher quality with instant feedback from a real person 
      • It might be slower at first but it’s worth it to go faster later 
      • This is really important for established businesses with paying customers 
      • A research paper called The Costs and Benefits of Pair Programming backs this up but it’s nearly 20 years old now so if you know of anything more recent let me know in the comments.
  • Pushing testing towards the end of the development life cycle 
    • The only way A-E2E-UI tests work is through a fully integrated system, so testing gets pushed later into the development cycle 
    • You could use test doubles for parts, but then it is not an end-to-end test.
  • Slower feedback loops for development teams 
    • Due to testing being pushed to the later stages of development, developers go longer without feedback on how their work is progressing 
    • This problem is increased further when the A-E2E-UI tools are not familiar to the developers, who then wait for the development pipeline to run their tests instead of running them locally
  • Duplication of testing 
    • As the A-E2E-UI test suites get bigger and bigger, it becomes harder and harder to see what is and isn’t covered by automation 
    • This leads to teams starting to test things at other levels (code and, most likely, exploratory testing), which all adds to the development time 

These are just some of the behaviours I’ve observed A-E2E-UI tests encourage, but they also discourage other behaviours which may be desirable. 

They can discourage development teams from

  • Building testability into the design of the systems 
    • Why would you, if you know you can “easily” test something end-to-end with an automation tool? 
  • Maintaining the code base
    • By limiting the opportunities to build a more testable design, you decrease the maintainability of the code through tests 
    • If you need to make a change it’s harder to see what the change in the code affects
    • By having more fine grained tests you can pinpoint where issues exist
    • A-E2E-UI tests just indicate that a journey has broken and how it could affect the end users
    • Not where the problem was actually introduced  
  • Building quality at the source 
    • You are deferring testing towards the end of the development pipeline, when everything has been integrated, instead of testing while you are actively developing the code.
    • Are you really going to go back and add in the tests especially if you know an end-to-end test is going to cover it?
  • Taking responsibility for testing your work 
    • With the “safety net” of the A-E2E-UI tests, you send the message that it’s OK if something slips through development 
    • If it affects anything, the A-E2E-UI tests will catch it
    • What we should be encouraging is that it’s the developer’s responsibility to build AND test their work
    • They should be confident that once they have finished that piece of code it can be shipped 
    • The A-E2E-UI tests should act as another layer that builds on your team’s confidence that nothing catastrophic will impact the end users. Think of them as a canary in the coal mine: if it stops chirping, something is really wrong…   
  • More granular feedback loops
    • By having A-E2E-UI tests you’re less likely to write unit and integration tests which give you fast feedback on how that part of the code behaves 
    • Remember code level tests should be testing behaviour not implementation details 
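To make that last point concrete, here is a minimal sketch of an isolated behaviour test. The `Basket` class is a made-up example; the point is that the test asserts what the code does (the observable behaviour), not how it does it, and runs in milliseconds with no UI or full stack:

```python
# Sketch: an isolated behaviour test. It asserts observable behaviour,
# not implementation details, and pinpoints exactly what broke.

class Basket:
    def __init__(self):
        self._items = []  # implementation detail: tests shouldn't peek here

    def add(self, name, price_pence, quantity=1):
        self._items.append((name, price_pence, quantity))

    def total_pence(self):
        return sum(price * qty for _, price, qty in self._items)

def test_total_reflects_quantities():
    basket = Basket()
    basket.add("tea", 250, quantity=2)
    basket.add("biscuits", 120)
    # Behaviour: the total is what the user would pay.
    assert basket.total_pence() == 620
    # Anti-pattern (don't do this): assert basket._items[0][1] == 250

test_total_reflects_quantities()
```

If `total_pence` breaks, this test names the exact behaviour at fault; an A-E2E-UI checkout test would only tell you the journey failed somewhere.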

If A-E2E-UI tests cause undesirable behaviours in teams, should we stop writing them? While they are valuable at demonstrating end-user journeys, we shouldn’t be putting so much of our confidence that the system works as intended into them. They should be another layer that helps build the team’s confidence that the system hangs together. 

If we put the vast majority of our effort and confidence into these automated end-to-end tests, then we risk losing one of a team’s greatest abilities: building testability into the design of our systems. But just like automated UI tests, building in testability takes conscious effort. It will take time, patience and experience for the whole team to understand and benefit from.

UI Automation, what is it good for? 

TL;DR: What automation at the UI level does and doesn’t give you.
UPDATE: I originally wrote this back in March 2015, lost it in my drafts and found it again recently, so I thought I’d get it out there. Don’t agree? Then let me know in the comments.

Automation fallacy

Every time I speak with different teams and organisations, a theme constantly comes up: UI automation and how it’s going to solve all their problems. The thinking goes that if we can automate more of our tests (read: test scripts), then the Testers no longer have to check those items anymore. This then frees them up to do more interesting things like exploratory testing, or means the Tester can be done away with altogether.

There is also a notion that automating all the regression checks will drop the regression test cycle from days to hours, supposedly allowing the team to move faster and release quicker than before.

What everyone seems to miss is that automated checks are generally built to check one thing, and will only tell you whether that thing is still there or behaving as the script has been programmed to expect. If anything happens that wasn’t programmed into the check, it fails or stops dead, relying on someone having to go and look at what went wrong.

A Tester, on the other hand, can look for workarounds, work out what may have caused the issue, or go and find other issues based on the information they’ve just learned.

So should we give up on UI automation and accept that we’ve got to do rounds and rounds of regression testing and hire more Testers? Well, no. What we need to do is ask ourselves:

Why are we automating?

It looks like a simple question, and most people (including myself in the past) would be able to give you a list of answers. But what we forget to ask is: by automating this check, what does it tell me when it passes or fails? If it passes, does that mean I no longer have to check that feature or scenario again? If it fails, what does that tell me? That I have to check that scenario manually?

When a check fails what do we expect the team to do? Stop everything and investigate the issue? Carry on as normal and hope someone else will check it? Ignore the issue altogether? Who is responsible for checking the issue? Developers, Testers, dedicated automation engineers?

There are a lot of reasons people give for why they want to automate their testing, such as

  • Reduce test/regression testing
    • The reason for regression testing is to see that the changes you’ve made to your code base haven’t broken anything existing.
    • Unless you have automated all your UI checks/regression suites, automation is not going to help you as much as you think it will
  • Spot issues/bugs faster
    • Automation doesn’t find new bugs; it only tells you that the check you’ve scripted has broken in some way. You need to tell the script that if action A doesn’t produce result B, then fail with an error message. What normally happens is that the check fails in a way you didn’t anticipate. Don’t forget, if you knew beforehand how something would break, you would probably have put in a fix. That’s why they are called defects: something behaving in a way you didn’t want or anticipate
  • Free up Testers
    • Potentially, but only if they trust the automation
  • Consistently check a feature the same way
    • This is one thing an automated check is very good at
  • Something that is laborious or difficult to setup and check
    • Another good candidate for automation. We use it for policy testing of our apps, as it’s time-consuming and error-prone to test manually
  • We’re doing Behaviour Driven Development (BDD)
    • BDD is not about automation but about collaborating to understand and create features. The automation is just one small part of it, and even then it’s not about testing the UI but the business logic, which could be tested at the unit level
    • If you ever hear a development team saying ’The BDD tests are failing’, it’s a good indicator that they are probably using BDD incorrectly
  • To release faster
    • Again, because you need to do less testing; see Reduce test/regression testing
  • It’s a part of continuous integration/delivery so we have to
    • No thought has gone into what you are automating other than that it’s what people say you have to do
  • A test manager or some other higher-up tells you to
    • Someone thinks that just telling a development team to automate their testing will help them; see above
  • People within the development team or key stakeholders don’t trust the developers’ work
    • The test team is being used as a safety net to check the developers’ work, which tends to become a self-fulfilling prophecy as the developers start using the test team as exactly that

What does an Automated Check actually do?

Let’s start with an example from a mobile app, though it could very easily be any platform of your choice:

A simple automated scenario could be when the home page is loaded and I’ve selected an option then I expect to see items X, Y and Z.

Things this scenario will need to do are:

     Start the application
     Wait for it to load up
     Select a menu option
     Wait for the new screen to load
     Then check that the expected items are on screen.
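As a sketch, the check above could look something like this. The `FakeDriver` stands in for whatever automation framework you use (Appium, Espresso etc.); its method names and the menu contents are assumptions made for illustration:

```python
# Sketch of the scripted check above, against a fake driver so it can run
# standalone. In real life the driver would be an automation framework.

class FakeDriver:
    def __init__(self, screens):
        self.screens = screens          # menu option -> items shown
        self.current_items = ["Home"]

    def launch_app(self):
        pass  # a real framework would start the app and wait for load

    def select_menu_option(self, option):
        self.current_items = self.screens[option]

    def items_on_screen(self):
        return self.current_items

def check_menu_shows_expected_items(driver):
    driver.launch_app()                      # start the app, wait for load
    driver.select_menu_option("Categories")  # select option, wait for screen
    items = driver.items_on_screen()
    # Passes as long as X, Y and Z are present; says nothing about
    # formatting, load times, ordering or anything else on screen.
    return all(item in items for item in ["X", "Y", "Z"])

driver = FakeDriver({"Categories": ["X", "Y", "Z", "a new item it ignores"]})
print(check_menu_shows_expected_items(driver))  # True
```

Note the extra item the check silently ignores: it passes regardless, which is exactly the kind of blind spot listed below.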

(Animated gif of the example automated check.)

So you can run this check over and over, and know that as long as the sequence doesn’t change and the items you are checking for are there, the test will pass. What it isn’t going to tell you, though, is

  • Formatting issues with any of the screens loading up
  • Pages are starting to take longer to load
  • The ordering of the menu options has changed
  • There are new menu options
  • The items you are checking can be seen by the automation framework but nothing is actually visible on screen
  • There are new items on screen that the check is not looking for

All of the above could also be scripted into the check, but that would likely take quite a bit of effort, and you can’t always predict how an app will behave and therefore can’t always script for it.

This is where a real Tester has the advantage. You don’t need to tell a Tester to look for these things: they will do it without being prompted, and generally a lot faster than an automated check. They can also tell you if something doesn’t feel right or doesn’t perform in a way that would be acceptable to end users, which can be very hard to quantify and therefore to automate. And they can take the information they’ve just learned and apply it to what else they can discover. An automated test isn’t going to do any of this, not with the tools we are using at the moment.

Where a Tester can’t match an automated check (or will find it very hard to) is in checking the same thing in the same way, consistently and quickly. As long as there are no physical moving parts, an automated check can normally carry out the scenario above in seconds, only delayed by waiting for things to install or load.

So should we stop automating our checks?

Before any team starts to think about automating their testing via the UI they should first, as a team, ask themselves:

Why are we automating?

It sounds like a simple question, but as I explained earlier, people tend to have differing views on what the automation is actually going to do for them. By talking about why they want to automate, they are more likely to come up with solutions that actually address the problems.

One of the main benefits I’ve seen from automation, especially at the UI level, is faster feedback that the app:

  • Can actually be installed on a real device/displayed in a browser
  • Can start without crashing
  • Can reach any endpoints that it relies on
  • Has its core feature, the one thing it is designed to do, actually working for your users e.g.
    • BBC iPlayer: Can video actually be played
    • Google maps: can provide directions to a destination
    • Amazon: allows you to buy products
    • Facebook: The feed shows you what your friends and family are doing

Doing this manually every time a build is made could take some time; it’s also very tedious and, from my experience, just doesn’t happen. What tends to happen instead is that developers wait and see what comes back when Testers finally do test the app, which could be some time after the change was made.

The longer this feedback loop, the harder the issue is to fix, due to the overhead in understanding what went wrong and what changed to cause it. This is exacerbated when working with legacy code, especially code not written by the developer making the change.

By automating just the core journey, the development team knows very quickly that whatever was last committed hasn’t caused a catastrophic failure and that the app’s core feature is still functioning. If there is a failure, you can back out the change (or, ideally, fix it) and get back to a working state. This helps the whole team know that the app works and improves the team’s overall confidence that installing the app is actually going to be worth their time. There is nothing more frustrating, especially in mobile development, than getting a build, finding the device you want to test on and installing it, only to find it can’t carry out its main job for the user or, worse, crashes on start.

When things fail this easily and obviously, it does nothing to instil confidence in the development team, more so when your key stakeholders find the issues. Automating the core journey also allows you to start using your Testers for what they are really good at, testing, and not just checking your developers’ work.

Core Journey

We use the concept of PUMA to decide what our core journeys are and, ultimately, what we should and shouldn’t automate. A general rule of thumb: if it’s not a core journey, can it be covered by a unit/integration test that doesn’t invoke the UI? If it still can’t, then why would automating it help? Who would do it? How often does it need to run, and how quickly do we need feedback that it’s broken? Could we monitor the app’s stats to check it is still working rather than automating it? If it does break, how badly would your users be affected, and how would it change their perception? Could it be controlled by a feature toggle that allows it to be switched off in the live environment?

So the next time someone asks “why don’t you just automate your testing?”, ask them “Why are we automating?” You might realise that the problem they perceive can easily be addressed by one simple automated check rather than hundreds of automated UI checks.

The Do’s and Don’ts of Mobile UI Automation

(…from my experience)

Originally posted on Medium

My name is Jitesh ‘Jit’ Gosai and I’m a Senior Developer in Test (DiT) at the BBC working in Mobile Platforms, Digital.

I originally wrote this post well over a year ago (2014!) and recently came across it again, so I thought I might as well get it out there. Most of the points still stand, so it should be useful for anyone getting into automation or already on that road. Got any other tips? Then let me know in the comments or on Twitter (@JitGo).

During our journey of automating our mobile user interface (UI) testing for iPlayer, we’ve learnt many things (good and bad) that I would like to share with you.
Below, I have compiled some do’s and don’ts to help you and hopefully save you from making the same mistakes we did.

Android: adb (Android Debug Bridge) is your friend

Learn all the little things you can do with adb, from simply listing connected devices to grabbing screenshots. Adb will let you do quite a lot with a connected device, so make sure you are comfortable with it. My suggestion is to begin with the Google developer docs and then start searching for the things you would like to do: for example, waiting for a connected device to start, restarting devices, or sending simple commands to control the UI (don’t try to automate your tests this way, you’ll be looking at a lot of pain).

Do learn to control your devices remotely

Learning to control your devices remotely will save you a lot of effort, both in time and flow, especially if you have devices connected to a build server that is not physically nearby (in our case, locked in a server room).
Being able to remotely view the device screen and restart the device is very useful and can save you countless journeys when your device ends up in a state you can’t identify. It is also great for returning your device to the home screen, and therefore a known state, before resuming testing.

Do ditch the Android SDK emulator

We tried to use it but found it too slow and unreliable, randomly crashing or disconnecting itself from adb. Intel HAXM was faster but proved too flaky and would also crash intermittently. Maybe it was something to do with our setup, but we went with connecting a real device to run the jobs instead, which proved a lot more reliable. Genymotion is another option, free for small teams, so it could work for you.

Do be patient!

When first starting out, don’t be tempted to keep tinkering with your test/build environment for little improvements. Let it settle and run. Know exactly what needs fixing and do just that: the essentials, not the nice-to-haves. Once you have stability, start slowly adding the things that may improve speed, but with an eye on reliability.

Do be available for pairing

Pair with devs and build tests together. This gets devs familiar with how to write tests, encourages them to write their own, and stops you being the bottleneck when tests break. Also show the UI tests to your Testers to get them familiar with what can and can’t be automated. I’ve found that walking Testers through an automated UI test vastly improves their knowledge of how the tests work and what they can miss.

Do test one thing and one thing only

Don’t be tempted to cover lots of things in one test: it makes it harder to tell what has broken when the test fails, and harder to debug.

Don’t use (or use very sparingly) canned steps

When using step definitions to automate tests it is very tempting to keep reusing steps you’ve already written, which at first works well. It can even allow non-technical members of the team to automate tests, but it can result in very verbose scenarios, which are harder to read and more difficult to amend at a later date. We always create methods for interactions with the app, e.g. “go to home” or “go to channels” in the case of iPlayer, and then use these in our step definitions. This way, if the path to the home or channels page changes, you just update the method and all tests pick up the change; there is no need to update every test. It also allows you to write tests faster in the long run.
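To sketch the idea: instead of chaining canned steps, each step definition calls a shared navigation method, so the path to a screen lives in exactly one place. The FakeDriver class below is a hypothetical stand-in for the real touch API, and the element names are made up.

```ruby
# Stand-in for the real app driver; records taps so the example is runnable.
class FakeDriver
  attr_reader :taps

  def initialize
    @taps = []
  end

  def tap(element)
    @taps << element
  end
end

# Shared interaction methods: the only place that knows how to reach a screen.
module Navigation
  def go_to_home(driver)
    driver.tap("menu button")
    driver.tap("home item")
  end

  def go_to_channels(driver)
    go_to_home(driver)          # reuse, rather than duplicating the path
    driver.tap("channels tab")
  end
end

# A Cucumber step definition then becomes a one-liner, e.g.:
#   Given('I am on the channels page') { go_to_channels(driver) }
```

If the route to the channels page changes, only `go_to_channels` needs editing.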

Do use the Page object pattern or similar

This will vastly improve the readability, maintainability and reusability of your tests and teach anyone who will be working with the code how to use it effectively.
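A minimal Ruby sketch of the pattern follows; the element queries and the StubDriver are hypothetical, and a real page object would delegate to the test framework instead.

```ruby
# Each screen gets a class; tests talk to pages, never to raw queries.
class HomePage
  def initialize(driver)
    @driver = driver
  end

  def displayed?
    @driver.visible?("home header")
  end

  # Navigation returns the next page object, keeping tests readable.
  def open_channels
    @driver.tap("channels tab")
    ChannelsPage.new(@driver)
  end
end

class ChannelsPage
  def initialize(driver)
    @driver = driver
  end

  def displayed?
    @driver.visible?("channels header")
  end
end

# Stand-in driver so the sketch runs without a device attached.
class StubDriver
  def visible?(_query)
    true
  end

  def tap(_query); end
end
```

A test then reads as a chain of intentions, e.g. `HomePage.new(driver).open_channels.displayed?`, with no element queries in sight.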

Do push your test framework (Calabash, Appium etc.) as far down the stack as possible

Don’t litter your test code with your framework’s commands; instead, delegate them to a module that you access via an interface.
This way, if a command changes in a new version you only need to update it in one place, and if you decide to switch frameworks the move can potentially be a lot easier.
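For example (a hypothetical sketch; a real adapter's method bodies would call the Calabash or Appium API):

```ruby
# Tests call AppDriver; only the adapter behind it knows which framework
# is in use, so a framework change touches one file.
module AppDriver
  class << self
    attr_accessor :adapter

    def tap(element)
      adapter.tap(element)
    end

    def visible?(element)
      adapter.visible?(element)
    end
  end
end

# One adapter per framework. This one just records calls so the sketch runs;
# a CalabashAdapter would translate each call into Calabash commands.
class RecordingAdapter
  attr_reader :calls

  def initialize
    @calls = []
  end

  def tap(element)
    @calls << [:tap, element]
  end

  def visible?(element)
    @calls << [:visible?, element]
    true
  end
end
```

Swapping frameworks then means writing one new adapter, not rewriting every test.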

Don’t fall in love (with your tools!)

They are just that: tools to help you in your task. If you find one is causing you more problems than it solves, and no matter how much searching you do no one can help, then dump it and switch to something else (if there is one).

Don’t automate everything under the sun

Automate the areas that are going to give the most valuable feedback, i.e. tell you that something has broken. We’ve found that UI tests are most useful in areas where the devs are actively working, so focus your UI tests there.

Do have the need for speed

Keep your tests as fast as possible. Running iOS tests through the simulator is faster than on devices, so we tend to stick with the simulator there. For Android we use devices, as they are more stable. You need to weigh up what you need, but stability always trumps speed. With that said, keep your tests as fast as possible: don’t use sleeps in your tests (or avoid them at all costs). Use waits that check repeatedly for something to be on or off screen before proceeding or raising an error.
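A polling wait in Ruby might look like this (a sketch; the `element_visible?` call in the usage comment is hypothetical):

```ruby
# Poll a condition until it holds, raising only when the timeout is exceeded.
# Unlike a fixed sleep, this proceeds as soon as the condition is true.
def wait_until(timeout: 10, interval: 0.5, message: "condition not met")
  deadline = Time.now + timeout
  until yield
    raise message if Time.now > deadline
    sleep interval
  end
end

# Usage (element_visible? would be a real framework query):
#   wait_until(timeout: 5) { element_visible?("play button") }
```

The test pays only as much time as the app actually needs, rather than a worst-case sleep on every run.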

Do stub it out!

Stub out the data your app uses. We use Charles Proxy with some Ruby scripts to automate launching and closing it and loading in config files. The main reason we chose this approach is that devs and testers were already familiar with it, so the learning curve was shallower; it was a quick solution until something more appropriate, such as REST-assured or WireMock, could be developed or re-purposed.
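Our setup was Charles-based, but the underlying idea can be sketched with nothing beyond Ruby’s standard library: a tiny server that answers every request with a canned payload. The payload and port handling here are purely illustrative.

```ruby
require "socket"
require "json"

# Start a throwaway HTTP stub that returns the given payload as JSON.
# Returns the port it is listening on and the server thread.
def start_stub(payload)
  server = TCPServer.new("127.0.0.1", 0) # port 0 = pick any free port
  thread = Thread.new do
    loop do
      client = server.accept
      # Drain the request line and headers before responding.
      while (line = client.gets) && line.strip != ""; end
      body = JSON.generate(payload)
      client.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Type: application/json\r\n" \
                   "Content-Length: #{body.bytesize}\r\n\r\n#{body}")
      client.close
    end
  end
  [server.addr[1], thread]
end
```

Pointing the app under test at the stub’s port gives it predictable data on every run, which is the whole point of stubbing.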

Do use Metrics

Stats, stats and more stats. Use data to back up your ideas, and collect as much of it as you can: stats such as test execution time, pass/fail rates, number of tests and number of runs. Then chart them. We use Dashing as it’s quick to set up and offers a very nice visual way to display data.

dashboard showing stats on TV stands

Dashboards displayed by the app development team. From the top left: app statistics via Grafana, and app usage. Bottom left: a Dashing board showing test status and code metrics using custom bubble graphs. Far right: a screen showing the current build status.

I’ve found the Rickshaw widget to be very useful. Some teams have also been experimenting with AtlasBoard, which looks promising.

Do learn your tools

If you are using Ruby (as we are), learn to use a REPL such as Pry. This will save you countless hours when debugging or even when creating tests. Watch this video by Conrad Irwin for a great introduction to Pry and REPL-driven development.

Do have multiple tests around any given area of functionality

This way, if a single test fails you know it could be flaky, but if all the tests in that area fail you know straight away that something is wrong.

Do read these posts!

This excellent post by Gojko Adzic is worth reading for anyone attempting to automate testing at the UI level.
If you are on the journey of UI automation, or plan to start, then I also highly recommend reading this post by Dan North, The Siren Song of Automated UI Testing. It gives a balanced view of automated UI testing: it is no silver bullet that removes manual testing altogether, nor a replacement for other (arguably better) forms of automated unit/component testing.

Got any tips on automated testing? Then let us know in the comments; you never know, it may make it into the list above.


Automating BBC iPlayer mobile testing part three: legacy vs new features

Originally posted on the BBC website 30 June 2014

This is the third and final part of my series of posts about how the BBC iPlayer Mobile team automates its user interface testing. You can find the first part here and the second part here.

Legacy features vs. new features

Native BBC iPlayer applications on mobile have been around since 2010, and in those early days we inevitably had a smaller range of automation options available to us. As a result, we have a large suite of manual regression tests that are executed prior to deploying new application versions.

Everyone agreed that going back and trying to retrofit automated tests to legacy features would be very time consuming, and that the team wouldn’t then get the benefit of having automated tests for the features they were currently building. We therefore decided to build automated tests only for features actively being developed.

To help address the backlog of manual tests, the DiTs on the team will, when free, pair up with Test Engineers to see which legacy areas of the app would benefit from automated tests. Using this approach we are slowly building up an automated regression suite to be run each night on our latest development build.

The Future

At present the automation tests are only executed on a small handful of iOS and Android devices plugged directly into our Continuous Integration (CI) server while we get our build processes stable and reporting fewer false positives. The long-term ambition is to run these automation tests on as many real, physical devices as we can. We’re working closely with our Test Tools team, who are currently developing a device testing platform for mobile, tablet and even smart TVs that will help us scale our testing efforts (look out for a blog post on this soon!).

With the team well underway with creating feature files collaboratively and automating features, we hope this will enable us to release new versions of the app more quickly and with more confidence in their stability and quality.

Since we started automating more of our testing it has raised some questions about its effect on the development process and whether it really is the right way forward. What do you think? How would you move forward?

50-70 tests are quite easy to manage, but what happens when you start to get to 100, 200 or even 500 tests?

UI tests tend to be flakier than other types of automated testing because they exercise the system as a whole rather than in focused units. This leads to a greater number of points of failure, so how do you limit the number of places the tests could fail?

Our tests currently use a cross-platform framework (Calabash), but would a more platform-specific tool be better, e.g. iOS Instruments and Android’s Espresso? The DiT team intend to spend some time evaluating the alternatives.

UI automation tests only tell you that a specific thing is still working e.g. you can still add an item to your download queue.

However, it will not tell you that the item added is incorrect (perhaps the episode of EastEnders you selected and added is actually Top Gear when you play it back). This is something an automated test would never be able to verify, and shows where manual testing effort continues to be required and to add value.

Automated tests don’t prevent bugs; they just tell you that a bug exists (admittedly sooner than typical manual testing would). We consider them a small part of the bigger picture of software development, which should also include the best practices of unit testing, test-driven development, pair programming and excellent manual testing techniques.

What would you do? How would you move forward with the automation testing? Comment below and we can explore your ideas in future blog posts.

We’ve learned a lot and still have a lot more to learn, but I will be posting again with lessons learned and best practices on how to automate testing for mobile.

If there are other testing topics that interest you, do let me know in the comments!

Originally posted on the BBC website 15 August 2014

Automating testing for BBC iPlayer mobile part two: automation

Originally posted on the BBC website 30 June 2014

This is the second part of a three-post series exploring how the BBC iPlayer Mobile testing team has integrated automated user interface (UI) testing into their development practice.

This post will deal with automation.

Having created collaborative feature files through the ‘3 Amigos’ sessions, and set up a robust system for creating and disseminating them, the natural next step was to begin automating them to increase productivity and quality.

To make the tests as easy as possible to write we implemented the page object pattern, so that the developers were clear about how to write more maintainable and less flaky tests. This also meant that tests were written more consistently, and allowed for more code reuse.

In addition to the page object pattern, we created helper modules containing all the commands developers would need to drive the app, making it easier to quickly look up what is available, and we demonstrated how to use the built-in debug tools to query the app and find screen elements.

Although we explored many different options, we decided to use Calabash and Ruby as the predominant tools to automate our tests, as they worked cleanly with Cucumber (our test runner) and Calabash supported both iOS and Android. To help everyone get to grips with the new systems, internal workshops are held to step developers through real-life examples: organising the feature folders, creating page objects, and learning the types of Calabash commands available to drive the app. By providing step-by-step guidance, everyone gets a strong understanding of the process and where they come into it.

Initially, creating the automated UI tests is a slow process, as you need to create a fair amount of support code (including page objects, working out how to access elements on screen, and working around timing issues in the app), but once these foundations are in place, automating tests gets faster and faster.

If a developer ever gets into difficulty, Developers in Test are available to pair up and help iron out any problems.

There are many advantages to developers writing the automation tests. Ownership creates a sense of responsibility and a smoother process for delivering and testing the products. It also drives the developers to look at the results and take advantage of the benefits of faster feedback.

With developers using the feature files to write their tests, the product is built as intended rather than based on assumptions, which speeds up the development process. Everyone takes mutual responsibility for automation, and testing is not pushed back to manual when a DiT is absent or unavailable, which keeps the process moving smoothly and effectively.

Running Android tests

Another benefit of using Calabash is that it uses accessibility labels to access on-screen elements. If the developers build the tests, they have to enable these labels, which helps make the app more accessible. For more information on accessibility practices, see Senior Accessibility Specialist Henny Swan’s blog posts.

You may be wondering what the DiTs are doing if the developers are creating all the automation code?

DiTs remain embedded in the team and available for pairing, to help automate tests that are not straightforward. They build up tools to aid automation, e.g. worker methods to carry out complex interactions, or work out how a feature could be automated when it is not immediately obvious. They keep the Continuous Integration (CI) jobs running and investigate brittle tests. DiTs also tend to be the experts in the automation frameworks, so they advise on whether a feature is worth automating or better tested manually.

Once the feature file has been automated, the tests are pushed into the main build pipeline. They run approximately four times a day, with a subset run on each check-in of code. We have our build job status displayed on large screens (one of the advantages of working near the TV platforms team is that they have a lot of reference TVs we can use when they are not being tested on), so if anything fails the whole team knows straight away.

Build monitors

In the final post of this series I’ll tell you how we handle legacy and new features, and what the future holds for our team.

Originally posted on the BBC website 06 August 2014

 

Automating testing for BBC iPlayer mobile part one: 3 Amigos

Originally posted on the BBC website 30 June 2014

In this three-part series of blog posts I will explore how the BBC iPlayer Mobile team has integrated automated user interface (UI) testing into their development practice.

I’m a Senior Developer in Test (DiT) working in Mobile Platforms, BBC Future Media. I work with the BBC iPlayer Mobile team to help them automate their testing, investigating new tools, advising on how best to use them in everyday work, and sharing this with other teams across the BBC. In the 16 months I have been with the BBC I have seen a great deal of change in development practice, which I will share in this series of posts.

I was initially brought onto the team to identify how to automate a greater number of tests in order to increase the speed of release without risking the quality of the end product.

When I first joined the team it was apparent that the developers had all individually started to automate some of the tests; however, it became clear there was no continuity to the test scripts, with each developer using their own style. Inevitably, when a script broke, if it wasn’t investigated by the developer who wrote it, it would take a long time to identify and repair the problem. This usually resulted in a simple patch being added to keep it running, or in the test being disabled. Because of these issues, the team began to lose confidence in automation and reverted to manual testing.

The lack of a system within the process was problematic in itself, with some features having a lot of automated testing and others receiving little or none, and no one taking responsibility for ensuring the testing was happening. Each test was insular, with only the designated developer having access to the results.

From the outset, it was decided to take things slowly and begin with the area that would give the most value for the least effort. The team understood that feature files are a great way to describe how the system should work, and that a collaborative approach was needed for successful implementation. It was here that we decided to use the idea of the ‘3 Amigos’ to write the features.

3 Amigos

To set up the ‘3 Amigos’ we needed to recruit a developer from each platform (iOS and Android), a tester, a product owner/business analyst and a DiT. This is obviously more than three “amigos”, but we needed a representative from each area of the process, plus the DiT to lead the sessions until everyone felt comfortable with the process and able to run them independently.

The advantage of having a DiT, or anyone experienced in writing feature files, is that they can act as chair and mentor. They guide the team to write concise scenarios and ensure the conversation stays on track. They also make sure that everyone in the meeting contributes and is comfortable with what the features are specifying.

Ordinarily, the process starts with the user story, created earlier by the Business Analyst (BA) working with the Product Owner. From this the group identifies the scenarios needed to cover the feature, only going into the given/when/then steps if it isn’t immediately clear how a scenario would play out, or if there is confusion in the team. Once the session is over, the DiT or BA fleshes out the remaining given/when/then steps and attaches them to the user story in Jira.
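For illustration, a scenario from such a session might look like the following (a made-up example, not a real iPlayer feature file):

```gherkin
Feature: Download queue
  As a viewer I want to queue programmes for download
  so that I can watch them offline

  Scenario: Adding a programme to the download queue
    Given I am viewing an episode page
    When I tap the download button
    Then the episode appears in my download queue
```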

3 Amigos gather around a BBC iPlayer screen

Because BBC iPlayer is available on iOS and Android, we only ever had one feature file that both products would use. This made sure we kept feature parity and helped us start delivering features on both platforms at the same time.

The ‘3 Amigos’ helped everyone involved develop a strong understanding of a feature and how it might need to be altered to work on each platform. It also fostered a more collaborative approach to creating feature files, and a better understanding of what the Product Owners wanted, without prescribing the solution to the team: they decide how it should work.

Anyone not involved in a 3 Amigos session can read the feature file, or speak with any of the developers or testers present, to get a heads-up. We try to make sure that different developers and testers attend the ‘3 Amigos’ so that anyone can run a session and no particular person becomes a bottleneck.

Once a developer has picked up the ticket, they submit the feature file into our source control system and strip the Jira ticket back to just the user story and acceptance criteria, leaving only a link to the feature file’s location for future access. This ensures there is only ever one version of the truth, and if any changes are required there is an audit trail to identify who made the alteration.

In my next post I will expand on how we use the feature files to automate our testing.

Originally posted on the BBC website 30 June 2014