Sola Virtus Invicta.

This is a software testing blog I wrote sporadically for a few years between September 2013 and January 2017. The formatting here is a bit funky, with the images missing; the original posts live on my old free Wordpress site.

Teaching, Testing

Insanity and Illusion: Testing Test Cases

An experiment highlighting the insanity of test cases.

The cliché goes that the definition of insanity is to repeat the same action over and over again and to expect a different result.

In the context of testing though, I would posit a different definition. In testing, insanity is to instruct different people to repeat the same actions and to expect the same result.

Yesterday Katrina Clokie, Aaron Hodder and I tested this theory. As part of some training we are preparing for new graduate testers, we developed an exercise to highlight the inattentional blindness that can occur when executing scripted test cases.

The concept was simple. We wrote a set of eight detailed test cases which instruct the tester to check that the sort function on a particular auction-based shopping website worked as it should. Each tester (or pair of testers) had the exact same set of eight test cases, identical in every detail. We gave them 20 minutes to execute them and, upon completion, asked them for the number of bugs they had found, as well as the number of tests which had passed, failed or did not get done.

These were our results:

As you can see, when we asked different people to follow the same test cases we got very different results.

Some groups found no bugs. One group found five. Most found two bugs, though based on the subsequent discussion these were not necessarily the same two bugs. The same phenomenon occurred when the testers were asked to assign a black-or-white pass/fail verdict to each of their tests. In some cases, the presence of bugs appeared to result in the test case failing. In others, the tester identified a bug but still felt that the test had passed.

To me, this highlights two really important things.

The first is the inattentional blindness that can occur when executing test cases, which is what we hoped the exercise would show. This is the phenomenon whereby focusing intently on one specific thing makes you more likely to miss other things going on right in front of your eyes. The most famous example of this is the "monkey business illusion".

In our exercise, and in the execution of scripted testing generally, inattentional blindness means that because the documented procedure focuses your attention on a specific line of enquiry, you are liable to miss bugs or interesting behaviours that occur around the functions the script specifically highlights. In teaching our graduates, we want to highlight this so that they are aware of the fallibility of the method and remember to defocus regularly when executing scripted tests, lest they miss the bigger picture.

The second crucial takeaway from this exercise, and one we didn't initially intend it to demonstrate, is the inherent danger of test cases and the implied level of equivalence that they represent. A great many testers still happily report on their testing by counting the number of tests which passed and the number which failed. This is an incredibly reductive way of presenting the information that testing ought to provide, because it treats all test cases as equal.

If we were to try to report on the above set of test results in this manner, what on earth would we do? To say that we had 48 test cases, of which 15 passed and 13 failed, would be criminally misleading. Clearly the actual testing which occurred from group to group here was unique. Each group brought different insights, made different judgements and elicited different information about the sort functionality on the website under test. To reduce these unique activities to a numeric score out of eight would be absurd and provide zero useful information to anyone.

As a further side note, you'll notice that when asked to report on their findings one group (3rd row) had contrived to lose a test case, while another (5th row) had gained one: they reported seven and nine "results" respectively. In both cases they genuinely had the same eight test cases as everyone else, which further highlights the inevitable human error involved in such a practice, even with relatively small volumes.

As such, the concept of a test case is a dangerous one. There is no weighting to the idea of a test case. It does not truly reflect the cognitive and temporal effort of the testing that occurred, instead reducing that effort - however great or small - to a contrived and equivalent rank. The only way to truly represent the outcomes of the testing that was done in our exercise was to talk to the testers about what they did and what they had found. The above numbers are meaningless.

Well, perhaps not entirely meaningless.

These numbers have helped us to highlight the fallibility of the test case. The exercise itself was effective in allowing our attendees to experience the inattentional blindness that comes from over-focusing, but the collation of the groups' results refuted one of the oft-cited benefits of scripted test cases: that they provide an opportunity for repeatable, consistent and unambiguous test results.
