A few weeks ago, Steffi spoke at the Berlin Ladies that UX meet-up about the different types of usability tests.
Here is a summary of the talk:
As user experience designers, we constantly run into situations where our team needs a shared understanding of terms and definitions, as well as shared expectations of outcomes. Otherwise, chances are high that misunderstandings will happen, and these can lead to worst-case scenarios just because people got their wires crossed.
This is also true when it comes to usability testing/evaluations.
Yay! Let’s do a usability evaluation!
You may have experienced the following situation:
At your company, someone has the great idea to invite some users to evaluate a product, AKA run a usability test. Yes, that is a great idea! Ten seconds later, the discussion about how many participants should be involved is in full swing, and nobody is talking about the question that matters most. The discussion should not be about the number of participants but about what you want to learn from the test and what its overall goal is.
What is the question you are trying to answer with this test?
The answers to this question differ depending on the project stage during which you test. If you hear such a discussion, you might be witnessing people talking about two very different things.
In fact, we can split evaluations into two broad categories, summative and formative tests, depending on the question you are trying to answer:
Summative tests evaluate the usability – defined as efficiency – of a completed product. With summative evaluation/testing, your main interest is in the statistics of participants’ behaviour.
Summative testing can be done with comparative tasks.
Questions that comparative evaluations can answer include: “Is design A better than design B?” (Is our product better than our competitors’? Is the new version better than the old one? And so on.)
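To make the comparative question a bit more concrete, here is a minimal sketch of one way to compare task times from two designs. The exact permutation test, the function name `permutation_p_value` and the checkout times are all assumptions made for illustration; they do not come from the talk, and a statistician may well recommend a different method for your study.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(times_a, times_b):
    """Two-sided exact permutation test on the difference of mean task times.

    Returns the share of all possible regroupings of the pooled times whose
    absolute mean difference is at least as large as the observed one.
    """
    pooled = list(times_a) + list(times_b)
    n_a = len(times_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(times_a) - mean(times_b))

    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point noise does not drop exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical checkout times in seconds (made-up data for illustration).
design_a = [41, 38, 45, 50, 39, 44]
design_b = [30, 35, 28, 33, 36, 31]

p_value = permutation_p_value(design_a, design_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests that the difference between the two designs is unlikely to be chance alone. Note that it still says nothing about why one design was faster.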
Summative testing can also ask if the solution meets the performance requirements. The question here is “Does our design meet a specific benchmark?” (e.g. “our users can accomplish checkout in less than x seconds”, or “95% of users succeeded in accomplishing task y”).
Benchmark-driven tests are most often seen in performance-critical domains such as healthcare, industry and gaming.
Summative tests are driven by statistics. This means that they require statistical training, and that they will not explain why something has happened: the data you get out of summative testing is mostly quantitative.
This also means that the number of participants you need can vary, because it depends strongly on the statistical methods you will use to calculate the outcome. Summative tests often require 10 to 20 participants (sometimes a lot more), but please ask your statistician of choice.
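One way to see why the participant count matters for a benchmark question is to look at how uncertain a measured success rate is. The sketch below uses the Wilson score interval with z = 1.96 (roughly 95% confidence). That choice, the function name `wilson_interval` and the sample numbers are assumptions for illustration, not something prescribed in the talk.

```python
from math import sqrt


def wilson_interval(successes, participants, z=1.96):
    """Wilson score interval for a task success rate.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    p = successes / participants
    denom = 1 + z**2 / participants
    center = (p + z**2 / (2 * participants)) / denom
    half_width = (
        z
        * sqrt(p * (1 - p) / participants + z**2 / (4 * participants**2))
        / denom
    )
    return center - half_width, center + half_width


# Hypothetical results: the same 90% observed success rate at three sample sizes.
for successes, participants in [(9, 10), (18, 20), (90, 100)]:
    low, high = wilson_interval(successes, participants)
    print(f"{successes}/{participants} succeeded: "
          f"plausible true rate between {low:.0%} and {high:.0%}")
```

With 10 participants, the plausible range around an observed 90% is very wide. It narrows as the number of participants grows, which is why a claim like “95% of users succeed” cannot be judged from a handful of sessions.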
As we can see, summative testing may be hard to conduct and calculate for people without stats knowledge, so you will need an expert for it. For this reason and some others, like the increased number of participants, summative tests are rarely used in user experience research, except for website optimization (online A/B testing).
Summative evaluations/testing can be compared to school exams. Every exam is a summative evaluation in the sense that it will never explain why you didn’t pass; it only shows the outcome: that you did or did not pass the exam. In our case, the outcome is whether the product met a pre-specified criterion or hypothesis.
Much more common in UX research are formative evaluations.
These are tests mostly used as part of an iterative design process, where we answer questions like “How do people experience our product?” or “What are the biggest problems with our product that we need to fix?”. Formative tests are performed when the goal is to identify problems. They help to “form” the design of a product or service.
Formative tests give us qualitative insight by answering the “whys”. They show how people experience the design, and where and why they might get stuck, because you observe participants directly and hear what they say using the thinking-aloud method.
In the best case, these types of evaluations/tests should be conducted and repeated throughout the whole design process to identify problems at an early stage of system design.
In contrast to summative tests, where we aim for an outcome like “40% of our users were able to accomplish tasks in under 30 seconds”, an outcome of a formative test might be: “people struggled to complete checkout because the buttons labelled OK / CANCEL seemed confusing to them”. You can spot the “why” here.
Formative usability tests typically need 5 to 7 users, and the data you get is mostly qualitative.
Cognitive walkthroughs and heuristic evaluations, for example, also count as formative evaluations; you do them yourself, without external participants.
Why you need to know evaluation categories and methods
To sum up, it is important to know that there is more than one kind of evaluation or testing method. Even if you never do summative testing yourself, it is important to know it exists. You need to know how summative testing differs from formative testing, and be able to explain to your team or client why you would choose one over the other.
As mentioned in the first lines of this post, you must build a shared understanding of outcomes with your team. Sometimes the expectations of an outcome will vary depending on the roles people have and their point of view towards the product. As a user experience designer, you should be able to explain the goals of your test and why you chose the method.
https://flic.kr/p/dboC2t (CC BY-SA 2.0)