Multivariate Testing vs A/B Testing: Which Works Better?

When optimizing a website’s conversion rate, A/B testing usually comes up first: create two versions and see which performs better. Sometimes you need to test more alternatives at once, and that is where multivariate testing comes in.

So which should you choose, multivariate testing or A/B testing? How do they differ? When does one work better? The following sections describe both test methods, walk through a case study for each, and compare them to answer these questions.

What is A/B testing?

A/B testing serves well for major design decisions, or at the end of the design process when you need to refine the layout of a given page.

The aim of A/B testing (aka split testing) is to compare two different versions of a design to see which one performs better. Each user sees just one design (A or B), even if they reload the page. With random assignment, roughly the same number of people view each version, so you can measure which version worked better.
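One common way to keep each visitor on the same version across visits is to derive the version from a hash of a stable user ID instead of rolling the dice on every page load. The sketch below is only an illustration under that assumption: the function name `assign_variant` and the experiment label are made up, and real testing tools may do this differently.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to one variant of an experiment.

    The same user_id and experiment name always give the same variant,
    so a returning visitor keeps seeing the design they saw first.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as a large integer, then pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

# Hypothetical usage: the IDs and the experiment name are placeholders.
for visitor in ["user-1", "user-2", "user-3"]:
    print(visitor, assign_variant(visitor, "category-page-layout"))
```

Because the hash spreads IDs evenly, the split ends up close to 50/50 on a large enough audience, without storing any assignment table.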

You can test many things with a split test: different images, calls to action, layouts or copy.

It’s important to make sure there is just one design element that differs in the two variants, so you will know what caused the better results. To make it more scientific: we have only one variable here, compared to multivariate tests, where we can change many things at once.

Running an A/B test might look easy, but you have to plan it carefully to get results you can trust. So we collected the five things you should know when you prepare your tests.

5 things to pay attention to when you plan your A/B test

Run the different versions simultaneously. Even within a few weeks, seasonality can have a huge effect on the results. If it rains one week, it will show in the traffic and in people’s behavior. The last week of August is completely different from the first week of September. To avoid seasonal effects, test the versions at the same time, not one after the other.
Change only one thing at a time between the versions. Otherwise we cannot tell why one version worked better or which modification had the greater impact. (Multivariate testing can solve this, and we will introduce it soon.)
Work with a large sample. As opposed to qualitative tests, here it’s important to get a statistically significant result. Most A/B test tools signal when the result has reached significance. You can also check it yourself with an online significance calculator, such as the ones from Kissmetrics or VWO.
Choose appropriate metrics. If you have a longer flow consisting of several steps, and you change something at a specific point of the process (e.g. on the first screen out of three), then you should not measure only how many visitors move on from the given screen. The aim is to improve the effectiveness of the whole process, so this is what we need to focus on. In other words, measure how many visitors complete the entire process as a result of the modification.
Define a control version. If you run multiple A/B tests, you need a baseline to compare the others against, for instance the current version of the app, webpage, or interface.
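To make the significance check less of a black box, here is a minimal sketch of a two-proportion z-test, one standard way to compare two conversion rates. It is not necessarily what any of the calculators mentioned above use. The function name and the visitor counts are invented for illustration, and "conversions" here means visitors who completed the whole flow.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two rates.

    conv_a, conv_b: visitors who completed the whole flow in each version
    n_a, n_b:       visitors who were shown each version
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both versions perform the same.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: 5,000 visitors per version.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("significant at 5%" if p < 0.05 else "not significant yet")
```

A p-value below the threshold you chose in advance (often 0.05) suggests the difference is unlikely to be pure chance. Checking repeatedly and stopping at the first significant reading inflates false positives, so fix the sample size before you start.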

How can you set up a test?

Several tools run A/B tests for you by randomly showing the two versions to visitors. You can use Optimizely, Unbounce, Visual Website Optimizer, or Google Analytics Experiments. Some of them have powerful visual editors, so you don’t even need to code to run experiments.

If you don’t know how big your sample size should be, use Optimizely’s sample size calculator.
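If you want a feel for where such numbers come from, the sketch below uses the textbook fixed-sample formula for comparing two proportions. This is an assumption on our part: online calculators, including the one mentioned above, may use different statistical methods and give different figures. The function name and default values (5% significance, 80% power) are illustrative choices.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Visitors needed in EACH version to detect the given relative lift.

    baseline_rate: current conversion rate, e.g. 0.04 for 4%
    relative_lift: smallest improvement worth detecting, e.g. 0.30 for +30%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Made-up scenario: 4% baseline, and we care about a 30% relative lift.
print(sample_size_per_variant(0.04, 0.30))
```

The key takeaway is the shape of the relationship: the smaller the lift you want to detect, the more visitors you need, and the requirement grows quickly.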

A/B testing case study

The Smartwool webshop tested their category page. They had two design versions. One of them was more visual, with a nice picture at the top of the page, presenting individual products playfully, in different sizes. The other followed traditional e-commerce conventions, presenting the products in a rectangular grid, all in a similar size, with the page starting with the products right away.

During the test, they wanted to know which version generated more revenue (in dollars) per visitor. The traditional and slightly boring version won: average revenue per visitor increased by 17.1%, not a bad result considering only the category page changed.

Note that they measured the end result of the whole shopping process. Perhaps the large photos on the category page led to more click-throughs to individual products, but the harder-to-browse page let fewer people find what they wanted, so they purchased fewer products.

At least this is what we would assume, but as we only see the numbers, we don’t know for sure why this design won. Besides the quantitative A/B tests it is useful to run some qualitative research as well (like user tests) to understand why certain designs work better.

It’s also worth noting that this experiment was not perfect, as two different things were changed simultaneously. So we don’t know whether the header image or the product list layout caused the difference in the results.

Read more about how these UX research methods fit into the product design process in our e-book: Product Managers’ Guide to UX Design.

What is multivariate testing?

Multivariate testing requires a bit more statistical knowledge from the research team, but it is a valuable tool when you are at an early stage and have many different ideas to try.

As per Optimizely’s definition, “multivariate testing is a technique for testing a hypothesis in which multiple variables are modified.”

Although it takes longer to collect enough data for a statistically significant result, once you have the results, it’s easier to implement multiple changes simultaneously based on the data.

Multivariate testing looks at different combinations of at least two variables, so you can measure several design elements in all their possible combinations. These can be, for example, smaller changes on a website (e.g. an image, button, or text change).

This testing method can also grant insights into how the different variables interact with each other. You can examine which copy and design combinations help more in reaching the website’s goals. You can read more about UX copytesting here.

In this case, the control group plays an even bigger role when analyzing the results, so don’t forget to define it!
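To show what "interaction" and "comparing against the control" can mean in practice, here is a small sketch. All conversion rates in it are made up for illustration and are not results from any test described in this article. The factor names and helper functions are likewise hypothetical.

```python
# Made-up conversion rates for a 2 x 3 multivariate test (illustration only).
rates = {
    ("logo", "copy 1"): 0.020,  # control: the current page
    ("logo", "copy 2"): 0.024,
    ("logo", "copy 3"): 0.022,
    ("no logo", "copy 1"): 0.030,
    ("no logo", "copy 2"): 0.031,
    ("no logo", "copy 3"): 0.040,
}
CONTROL = ("logo", "copy 1")

def lift_vs_control(rates, control):
    """Relative lift of every combination compared with the control."""
    base = rates[control]
    return {combo: rate / base - 1 for combo, rate in rates.items()}

def main_effect(rates, factor_index):
    """Average rate per level of one factor (assumes equal traffic per cell)."""
    by_level = {}
    for combo, rate in rates.items():
        by_level.setdefault(combo[factor_index], []).append(rate)
    return {level: sum(vals) / len(vals) for level, vals in by_level.items()}

def logo_effect_by_copy(rates):
    """How much removing the logo changes the rate, separately for each copy."""
    copies = sorted({copy for _, copy in rates})
    return {c: rates[("no logo", c)] - rates[("logo", c)] for c in copies}

print(lift_vs_control(rates, CONTROL))
print(main_effect(rates, 0))       # logo vs no logo, averaged over copies
print(logo_effect_by_copy(rates))  # unequal values hint at an interaction
```

If removing the logo helped by the same amount for every copy, the two variables would act independently. When the gain differs from copy to copy, as in these invented numbers, the variables interact, and only a test that covers the combinations can reveal it.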

How can you set up a test?

You can set it up the same way as an A/B test: the webpage shows each version to roughly the same number of visitors.

Multivariate testing case study

Our Product Design book covers many case studies, including the following example. In this multivariate test, we changed two things on the search result page: the call-to-action button copy, which had three versions, and the travel agencies’ logos.

At times we displayed the colorful logo, and at others we just put the company name in plain text. We created all the possible combinations to measure the impact of both changes. One change had three versions and the other had two, making a total of six.

Consequently, both the version with the logo and the one without it came with each of the three call-to-action texts. See them below:

Seeing them all side by side immediately shows the problem with multivariate tests: testing six versions simultaneously, and getting results quickly, requires a lot of visitors.

Low-traffic sites do not work well with multivariate tests. There, we can either test only two versions at a time or wait a long time for the results, and either way it takes too long.
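The arithmetic behind this is simple enough to sketch. The snippet below lists every combination of the two variables from the case study and estimates how long a test would run. The call-to-action labels are placeholders (the actual texts are not reproduced here), and the per-version sample size and daily traffic are made-up numbers.

```python
import math
from itertools import product

logo_styles = ["colorful logo", "plain-text name"]
cta_copies = ["CTA copy 1", "CTA copy 2", "CTA copy 3"]  # placeholder labels

# Every logo style paired with every call-to-action copy: 2 x 3 = 6 versions.
variants = list(product(logo_styles, cta_copies))
for logo, cta in variants:
    print(f"{logo} + {cta}")

def days_to_finish(per_variant: int, n_variants: int, daily_visitors: int) -> int:
    """Days needed if traffic is split evenly across all versions."""
    total_needed = per_variant * n_variants
    return math.ceil(total_needed / daily_visitors)

# Assumed figures: 5,000 visitors needed per version, 1,000 visitors a day.
print(days_to_finish(5000, len(variants), 1000))  # six versions
print(days_to_finish(5000, 2, 1000))              # a plain A/B test
```

Every extra variable multiplies the number of versions, and with it the traffic or time you need. That is why a plain A/B test is often the more realistic choice on low-traffic pages.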

The following diagram shows the test’s result:

The winning version generated five times more leads, quite a good number. This version has the same call-to-action text as the original, so the copy (or at least the new versions) did not have a huge impact on the result. In contrast, taking away the logo increased leads at least threefold.

Later we suspected this might have happened because people do not visit the travel agency’s own website and connect with them there if their logo does not appear. This remained only a hypothesis, since a quantitative test shows which version wins but does not answer the ‘why?’ questions. Moderated in-person user tests can.

Multivariate testing vs A/B testing

Here at our UX agency, we run both A/B and multivariate tests often. We collected the most important differences and present them in the table below.

Why use A/B or multivariate testing?

A/B and multivariate tests really shine in showing which solution works best in a live environment, during real use. Most product development teams run A/B tests continuously. These can tell us exactly which solution works best, but they don’t reveal why.

For this reason, neither A/B tests nor multivariate tests suffice on their own. Always conduct qualitative research as well, e.g. in-person, moderated user tests.

This UX research case study tried both A/B testing and in-person testing. See how they performed.

Have you used any of these methods? What do you think about multivariate testing vs A/B testing? We would love to hear about your experiences. Please share them with us in the comment section below.