
Differences in formative and summative evaluations (and why they matter for UX designers)

A few weeks ago Steffi spoke at the Berlin Ladies that UX meetup “An evening about user research” about the different types of usability tests. Here is a brief summary of the talk:

As user experience designers, we constantly run into situations where it is essential that we and our team members share an understanding of terms and definitions, as well as expectations about outcomes. Otherwise, chances are high that misunderstandings will happen sooner or later, and these can lead to worst-case scenarios just because people got their wires crossed.
This is also true when it comes to usability testing/evaluations.

Yay! Let’s do a usability evaluation!

You may have experienced the following situation:
At your company, someone has a great idea: invite some users to evaluate a product, aka run a usability test. Yes, that really is a great idea! Ten seconds later, the discussion about how many participants should be involved is in full swing, and nobody is talking about the question that matters most. The discussion should not be about the number of participants; it should be about what you want to get out of the test, what you want to learn, and the overall goal of the test.

What is the question you are trying to answer with this test?

The answers to this question differ depending on the stage of the project during which you test. If you hear such a discussion, you might be witnessing people talking about two very different things.

In fact, we can split evaluations into two broad categories, summative and formative tests, depending on which questions you are trying to answer:

Summative evaluations/tests

Summative tests are used when you’re trying to evaluate the usability (defined here as efficiency) of a completed product. With summative evaluation/testing, your main interest is in the statistics of participants’ behavior.

Summative testing can be done with comparative tasks.
Questions that comparative evaluations can answer include: “Is design A better than design B?” (Is our product better than our competitors’? Is the new version better than the old one? etc.)
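To make the "Is design A better than design B?" question a bit more concrete, here is a minimal sketch of one possible analysis: an exact permutation test on mean task completion times. This is not a method from the talk, just one common way to compare two small groups. The function name and the timing data are hypothetical, and a real study should have its analysis plan checked by someone with statistical training.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(times_a, times_b):
    """Two-sided exact permutation test on the difference of mean task times.

    Enumerates every way of splitting the pooled measurements into two groups
    of the original sizes and counts how often the absolute difference of the
    group means is at least as large as the one actually observed.
    """
    pooled = list(times_a) + list(times_b)
    n_a = len(times_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(times_a) - mean(times_b))
    total = sum(pooled)

    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point noise does not hide ties.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical checkout times in seconds, six participants per design.
design_a = [41.0, 38.5, 45.2, 39.9, 43.1, 40.4]
design_b = [33.2, 35.0, 31.8, 36.1, 34.4, 32.7]

p_value = permutation_p_value(design_a, design_b)
print(f"p-value: {p_value:.4f}")
```

In this made-up data every participant on design B was faster than every participant on design A, so only 2 of the 924 possible splits are as extreme as the observed one and the p-value is about 0.002. With messier, more realistic data the answer is rarely this clear, which is exactly why the number of participants and the choice of method matter.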

Summative testing can also ask whether the solution meets the performance requirements.
The question here is: “Does our design meet a specific benchmark?” (e.g. “our users are able to accomplish checkout in less than x seconds”, or “95% of users succeeded in accomplishing task y”).
Benchmark-driven tests are most often seen in performance-critical domains such as healthcare, industry and gaming.
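A benchmark like "95% of users succeed at task y" sounds simple, but a small sample only gives a rough estimate of the true success rate. The sketch below uses the Wilson score interval, one standard way to put a confidence interval around a proportion. It is an illustration under assumptions, not the procedure from the talk: the function name, the participant counts and the 95% confidence level (z = 1.96) are all chosen for the example.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a success rate.

    successes: number of participants who completed the task
    n:         number of participants tested
    z:         normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical result: 19 of 20 participants completed the task (95% observed).
low, high = wilson_interval(19, 20)
print(f"observed 95%, plausible range {low:.2f} to {high:.2f}")
```

With 19 of 20 successes the interval runs from roughly 0.76 to 0.99, so the observed 95% does not yet show that the real success rate meets a 95% benchmark. This is the kind of reasoning behind the participant numbers for summative tests.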

Summative tests are driven by statistics, which means that they require statistical training. They tell you what happened but not why, and the data you get out of them is mostly quantitative.
This also means that the number of participants you need can vary, because it depends strongly on the statistical methods you will use to calculate the outcome. Summative tests often require 10 to 20 participants (sometimes a lot more), but please ask your statistician of choice. 🙂

As we can see, summative tests may be hard to conduct and analyze for people without stats knowledge, so you will need an expert for this. For this reason and some others, like the larger number of participants, summative tests are fairly rarely used in user experience research, except for website optimization (online A/B testing).
You can compare summative evaluations/testing to school exams. Every exam is a summative evaluation in this sense because it will never explain why you didn’t pass; it only shows the outcome: you did or did not pass the exam or, in our case, the product did or did not meet a pre-specified criterion/hypothesis.

Formative evaluations/tests

Much more common in UX research are formative evaluations.
These are tests that are mostly used as part of an iterative design process, where we are answering questions like “How do people experience our product?” or “What are the biggest problems with our product that we need to fix?”. Formative tests are performed when the goal is to identify problems. They help to “form” the design of a product or service.

Formative tests give us qualitative insight by answering the “whys”. They can show how people actually experience the design, and where and why they might get stuck, because you observe participants directly and hear what they say using the thinking-aloud method.
In the best case, these types of evaluations/tests are conducted and repeated throughout the whole design process so that problems are identified at an early stage of system design.
In contrast to summative tests, where we aim for an outcome like “40% of our users were able to accomplish tasks in under 30 seconds”, an outcome of a formative test might be: “people struggled to complete checkout because the buttons labeled OK / CANCEL seemed confusing to them”. You can clearly spot the “why” here.
Formative usability tests typically need 5 to 7 users, and the data you get is mostly qualitative.
Cognitive walkthroughs and heuristic evaluations, for example, also count as formative evaluations (you carry them out yourself, without external participants).
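The rule of thumb of 5 to 7 users for formative tests is often motivated with a simple problem-discovery model: if each participant has a probability p of running into a given problem, the share of problems seen at least once after n participants is 1 - (1 - p)^n. The sketch below is an illustration of that model, not something derived in the talk. The function name is hypothetical, and the default p = 0.31 is an assumed average taken from the often-cited Nielsen and Landauer work; in practice p varies a lot between products and tasks.

```python
def share_of_problems_found(n_users, p_detect=0.31):
    """Expected share of usability problems observed at least once.

    n_users:  number of test participants
    p_detect: assumed probability that a single participant hits a given
              problem (0.31 is an often-quoted average, not a constant)
    """
    if n_users < 0:
        raise ValueError("n_users must not be negative")
    if not 0 <= p_detect <= 1:
        raise ValueError("p_detect must be between 0 and 1")
    return 1 - (1 - p_detect) ** n_users


for n in (1, 3, 5, 7, 10):
    print(f"{n:2d} users: {share_of_problems_found(n):.0%} of problems expected")
```

Under this assumption, five users already surface about 84% of the problems and seven about 93%, after which each extra participant adds less and less. If problems are harder to hit (a smaller p), the curve flattens and more participants or more test rounds are needed.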

Why this is important

To sum it up, it is important to know that there is more than one kind of evaluation or testing method. Even if you will probably never do summative testing by yourself, it is important to know that these tests exist. You need to know how they differ from formative testing and be able to explain to your team or client why you would choose one type of test over another.
As mentioned at the beginning, it is important that you build a shared understanding of outcomes with your team. Expectations of an outcome will sometimes vary depending on people’s roles and their point of view on the product. As a user experience designer, you should then be able to explain what the goals of your test are and why you chose the method that you did.

Img source: (CC BY-SA 2.0)