Let's take a look at what A/B testing is, why you need it, how to conduct it, how long a test takes, and how to interpret the results.
Article source: iConText Group blog. Author: Stanislav Kirov, head of the web analytics group at iConText Group.
Let's say the site has a page with a target button that has a low click-through conversion rate. We think about how to improve it and put forward a hypothesis: if the button is colored rather than black and white, as it is now, users will click it more often and the conversion rate will increase. But we don't actually know whether the hypothesis will be confirmed.
To find out, we create a second version of the page: option A keeps the black button, option B is the new version with the colored button. And then we run the test.
Users who come to the pages are divided in a 50/50 ratio:
- 50% see option A, with a black button;
- 50% see option B, with a colored button.
At the end of the test, we measure the button's conversion rate: if option B converts better, we roll out that version of the page to the site.
Suppose you have already optimized all advertising campaigns and are 100% sure you are bringing targeted traffic to the site: there are no ineffective keywords, bid adjustments are configured correctly everywhere, etc. At the same time, the CPA (Cost Per Action) is high: you spent a lot of money on traffic, but there are few conversions. Another example is the problem of abandoned carts, when users leave without completing a purchase, or a high abandonment rate, when visitors do not move further down the funnel on the site.
And the point here may not be that advertising campaigns bring unsuitable traffic. Often the problems lie within the site itself. Customers may leave at different stages of the funnel due to broken interaction logic or a confusing interface. A/B testing helps you deal with these issues.
- Text (such as the title and description of the product).
- Appearance of conversion buttons or their location.
- Dimensions, appearance and location of conversion forms.
- Page layout and design.
- USP (price, product image, characteristics, etc.).

## How to conduct an A/B test
First of all, we research the site. During a UX audit, the analyst is guided by general rules and a subjective opinion about the usability of the resource. But what is convenient for an analyst is not necessarily convenient for all site users. A/B tests are carried out to remove this subjective factor.
At this stage, we put forward hypotheses: for example, that certain changes on the site can improve a metric. But they may not improve it, and may even make it worse. This is exactly what an A/B test will show.
We create different variants and test them: A - the old version, B - the new one.
Next, we run the test. Since the hypothesis may fail and hurt performance, we do not test on all traffic but only on a portion of it, for example, 10% of users. Within that 10%, we show the two variants in a 50/50 ratio.
Let's say 1,000 people visit the site per day. We randomly select only 10% of them for the test, i.e. 100 users, and then divide them into two equal groups: 50 users see option A, and the remaining 50 see option B.
If things go badly, this limits the drop in overall conversion.
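As a sketch of how such a split is usually implemented, traffic is bucketed deterministically so that a returning visitor always sees the same variant. A minimal illustration in Python (the function name and the 10% / 50-50 figures follow the example above; this is not any specific tool's implementation):

```python
import hashlib

def assign_variant(user_id: str, test_fraction: float = 0.10) -> str:
    """Deterministically bucket a visitor: ~90% see the unchanged site,
    the remaining ~10% are split 50/50 between options A and B."""
    # Hash a stable identifier (e.g. a cookie value) into [0, 1)
    digest = hashlib.md5(user_id.encode()).hexdigest()
    r = int(digest, 16) % 10_000 / 10_000
    if r >= test_fraction:
        return "control"                          # not in the experiment
    return "A" if r < test_fraction / 2 else "B"  # the tested 10%, split 50/50

# The same user always lands in the same bucket
print(assign_variant("user-42") == assign_variant("user-42"))  # True
```

Hashing (rather than a fresh random draw on every visit) matters because a user who bounces between variants would contaminate both samples.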
The test duration and the number of users in the two samples are calculated before the test is launched. After the specified time has passed and enough visitors have accumulated in both samples, we collect the statistics and analyze the results.
If the hypothesis is confirmed and the variant with the changes made wins, then we write recommendations for their implementation on the site.
Google Optimize has pre-set goals for session duration, revenue, bounce rate, page views, and transactions.
In addition to the predefined goals, you can use goals from Google Analytics. For example, if you need to track a click on a button, you set up a special goal in Google Analytics that records it, and only then select it in Google Optimize.
Google Optimize has an audience setting for which we will show the test. These may be users who came from Google Ads. The audience can be segmented by UTM tags, device categories (mobile only or desktop only), behavior (for example, select only new users).
This is very important, because visitors who have already interacted with the site may be put off by unfamiliar changes: new landing pages or interfaces. It will be difficult for them to find where things are located. As a result, the bounce rate may increase.
Therefore, before launching, you need to think about which audience to run the test for. It may be better to target new users who have not previously interacted with the site.
You can also target by geodata (for example, set up a test for a specific region), browsers, and operating systems.
If you have a paid version of Google Analytics 360, then Google Optimize 360 will be available to you. Thanks to this, you will be able to further target audiences created in Google Analytics.
Before you start an A/B test, make sure:
- The results are not affected by anomalies and outliers in the population;
- the traffic division tool works flawlessly;
- data is sent to analytics systems correctly.

To check all this, conduct an A/A test. It is similar to an A/B test, except that both groups are shown the same version of the page, not different ones.
If the traffic split and the testing tool did not fail, there will be no difference in performance. The logic is this: we show the same page to two different groups, and they should have the same conversion rate. In that case, we can say the data is collected correctly and the audience is homogeneous.
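The "no difference" check in an A/A test can be formalized with a standard two-proportion z-test. A minimal sketch (the numbers are illustrative, not taken from the case below):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided z-test for the difference between two conversion rates.
    In an A/A test, a large p-value (no significant difference) suggests
    the traffic split and tracking are working correctly."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Both A/A groups saw the same page: 20.0% vs 21.0% conversion
print(two_proportion_p_value(200, 1000, 210, 1000))   # ≈ 0.58, no real difference
# A 2-3x gap between groups, by contrast, yields a p-value near zero
print(two_proportion_p_value(200, 1000, 70, 1000))
```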
We had a case in practice. There was an order form on the site where you had to enter your date of birth. The form had a bug: the date-of-birth field rejected anyone under 21, yet people aged 18 to 21 also came to the page. They couldn't fill out the form, and the bounce rate increased.
During the test, we noticed that conversion between options A and B differed by 2-3 times. To understand why, we looked at session recordings in Webvisor. We identified a bug that strongly skewed the final result and had to redo the test. If we had initially run an A/A test on this landing page, we would have discovered the bug in advance: one of the groups would have had a very different conversion rate.
There are online calculators that help you quickly calculate these figures without wrestling with statistical formulas. The screenshot below shows one of them. We enter the baseline conversion rate (say, 20%) and the minimum effect we want to detect, for example, 5%. Five percent relative to 20% is about ±1 percentage point.
The calculator shows that detecting such a change requires 25,255 users per variant. To run the test, multiply 25,255 by the number of variants (we have 2), which gives 50,510 users. Then look at current traffic: if, say, 2,000 users visit the site per day on average, divide 50,510 by 2,000. The test will take about 25 days.
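The calculator's arithmetic can be reproduced with the textbook sample-size formula for comparing two proportions. A sketch under the same inputs (20% baseline, 5% relative effect), plus the conventional 5% significance level and 80% power, which the article does not state explicitly, so treat those as assumptions:

```python
from math import ceil, sqrt
from statistics import NormalDist

def users_per_variant(p_base: float, mde_rel: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per variant to detect a relative lift of mde_rel
    over a baseline conversion rate p_base with a two-sided z-test."""
    p1, p2 = p_base, p_base * (1 + mde_rel)       # e.g. 20% -> 21%
    z_a = NormalDist().inv_cdf(1 - alpha / 2)     # significance threshold
    z_b = NormalDist().inv_cdf(power)             # power threshold
    p_bar = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return ceil(n)

n = users_per_variant(0.20, 0.05)
print(n)                 # about 25,600; online calculators give ≈25,255
print(2 * n / 2000)      # ≈ 25-26 days at 2,000 users per day
```

The small gap versus the calculator's 25,255 comes from slightly different approximations; the order of magnitude and the resulting test duration are the same.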
When determining the test duration, it is important to account for bursts of audience activity tied to:
- weekdays or weekends;
- holidays (increase in demand for "gift" goods);
- sales, promotions, marketing activities (discounts increase the activity of the audience, changing the purchasing behavior);
- special events (for example, shopping for school supplies in August);
- seasonality of the product (for example, heaters);
- competitor activity (competitors have lowered product prices and your user activity has decreased);
- a change in the political and economic environment (a crisis, rising prices, a ban on trading certain goods, or price increases due to additional duties).
An example from the automotive sector: there was a period when car prices rose sharply. If you ran an A/B test during that period, the results could be unreliable because of the price increases.
Or, for example, your client is a toy store and you want to start testing a hypothesis. You calculated in an online calculator that the test will take 4 weeks, and today is, say, December 15. The best solution would be to postpone the launch until the end of January, because buying activity changes dramatically during the New Year holidays (a test launched on December 15 and lasting 4 weeks will definitely capture them). Hypotheses confirmed under those conditions may show very different results, for example, in spring, when buying behavior changes.
In addition to bursts of activity, you should understand the cycle of the metric being measured. Most often it is tied to the purchase decision cycle: the time from the first thought of buying a product to placing the order. The decision cycle for buying an apartment is obviously much longer than for ordinary online-store goods.
Let's take an example to understand how to interpret the results. For one client, we tested two landing pages that differed in the application form: on the original version (labeled "original" in the screenshots below), you could apply for a loan paid out to a card; on the second variant, paid out either to a card or in cash.
The screenshot below shows statistics from the Google Optimize web interface. The variants table contains various metrics. The probability of superiority is the percentage probability that one variant is better than the other at a given point in time.
In our example, the original version turned out to be better.
The next metric is a modeled (simulated) conversion rate, based on Google Optimize's own model, which indicates the bounds within which each variant's conversion rate is expected to fall over the long term.
When calculating this indicator, only those sessions that participated in the test, as well as the conversions associated with them, are taken into account.
Having studied the graphs of cumulative, or accumulated data, one can see the following picture. For example, the test lasted 14 days. If you build a graph based on cumulative data, at the point of the first day there will be metrics for this day, at the point of the second day - a set of metrics for two days, at the point of the third - a set for three.
Option 2 is consistently worse than the original in terms of accumulated metrics. At the beginning of the test, conversion fluctuated in both groups; the graph stabilized only toward the end.
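Building such a cumulative graph is straightforward: each day's point is the total conversions so far divided by the total visitors so far. A small sketch with made-up daily numbers for one variant:

```python
from itertools import accumulate

# Illustrative daily data for one variant over a 14-day test
daily_conversions = [12, 8, 15, 11, 9, 14, 10, 13, 12, 11, 9, 15, 12, 10]
daily_visitors    = [60, 55, 70, 62, 58, 66, 59, 64, 61, 60, 57, 68, 63, 59]

cum_conversions = list(accumulate(daily_conversions))
cum_visitors = list(accumulate(daily_visitors))
cumulative_rate = [c / v for c, v in zip(cum_conversions, cum_visitors)]

# Early points swing with each day's noise; later points settle down
# as the accumulated sample grows, which is why the curves stabilize
# only toward the end of the test.
print([round(r, 3) for r in cumulative_rate])
```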
The modeled improvement shows the lower and upper bounds of the change expected for a variant. So if you roll out option 2 to the site, conversion will decrease by 40% in the worst case and by 23% in the best case, relative to the original's conversion rate.
Based on the results of the experiment, we can say that changes to the landing page will lead to a decrease in the overall goal conversion.
Yes, it happens that the proposed variant does not work and worsens performance. Recall that the test is not run on 100% of site traffic but only on a small portion of it, so option 2's poor result did not significantly affect the overall conversion rate.
Thus, A/B testing helps you accurately measure the effect of a change. So even if the hypothesis is not confirmed, don't despair: just test new options to improve your target metrics.