Choosing the right statistical method for A/B testing

Introduction

Making decisions with a controlled level of risk is at the heart of A/B testing. However, the difference between statistical methods such as "Frequentist" and "Bayesian" can be confusing. This article will help you choose the right tool for your A/B tests.

Main available paths

When interpreting your A/B test results, you can use a Frequentist, Bayesian, or sequential approach. Each relies on a distinct methodology and has its own advantages and disadvantages.

Fixed Sample Frequentist

The Fixed Sample Frequentist method provides a rigid framework that requires you to define your hypothesis and sample size clearly before launching a test. If you stick to the constraints you set, without making any changes based on initial results, you get optimal control over statistical risk.

Pros

Maximal statistical power.
Most widely used method for randomized controlled trials.

Cons

Very rigid.

Target

Mature experimentation teams who understand the risk they take when they don't follow the method by the book.
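
As a rough illustration of what "by the book" means here, the sketch below computes a sample size before launch and runs a single significance test once that sample is reached. It assumes a two-sided two-proportion z-test with a normal approximation; the function names, the 5% significance level, and the 80% power are illustrative assumptions, not values taken from Kameleoon.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Visitors needed per variation (hypothetical helper, normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # rate expected under the lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Plan first: 5% baseline conversion rate, 10% relative lift to detect.
planned_n = sample_size_per_variation(0.05, 0.10)

# Analyze once, only after each variation has reached planned_n visitors.
p_value = two_proportion_p_value(500, 10000, 600, 10000)
```

The key design point is the order of operations: the sample size is fixed before any data is seen, and the p-value is read a single time at the end.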

Bayesian

The Bayesian method offers more flexibility by incorporating existing knowledge into the analysis through prior probabilities. This approach works best when ample prior data is available to inform the posterior probability.

Pros

Allows you to leverage your existing knowledge.
Gives you access to key metrics derived from the estimated distribution, such as the probability that a variation outperforms the original.

Cons

Can lead you in the wrong direction if prior data is incorrect.

Target

Experimentation teams that are used to working with Bayesian techniques. This method is geared toward advanced users who understand Bayesian methods and probability distributions.
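
To make the role of the prior concrete, here is a minimal sketch that assumes conversions follow a Beta-Binomial model and estimates the probability that variation B beats A by sampling from each posterior. The function name, the uniform Beta(1, 1) default prior, and the number of draws are assumptions chosen for illustration; this is not a description of Kameleoon's implementation.

```python
import random


def probability_b_beats_a(conv_a, n_a, conv_b, n_b,
                          prior_alpha=1.0, prior_beta=1.0,
                          draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior = Beta(prior_alpha + conversions, prior_beta + non-conversions)
        rate_a = rng.betavariate(prior_alpha + conv_a, prior_beta + n_a - conv_a)
        rate_b = rng.betavariate(prior_alpha + conv_b, prior_beta + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Uniform prior: the result is driven almost entirely by the observed data.
chance_uniform = probability_b_beats_a(500, 10000, 600, 10000)

# Informative prior equivalent to 1,000 earlier visitors converting at 5%.
chance_informed = probability_b_beats_a(500, 10000, 600, 10000,
                                        prior_alpha=50, prior_beta=950)
```

Changing `prior_alpha` and `prior_beta` shows how an informative prior pulls both estimates toward past performance, which is also why an incorrect prior can mislead.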

Sequential

Sequential testing addresses the risk of stopping an experiment prematurely because interim results deviate from expectations. Unlike traditional fixed-sample-size tests, sequential testing lets you make decisions as data accumulates. This flexibility can lead to conclusions sooner, but it may produce a less precise effect size estimate.

Pros

Very flexible, with no need to estimate the sample size in advance.
Provides a valid confidence interval at any point during the experiment's runtime.

Cons

Lower statistical power.

Target

Fast-moving teams that feel constrained by the rigidity of the fixed sample framework and are ready to trade some statistical power for more flexibility.
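
The trade-off between flexibility and power can be illustrated with a small simulation. The sketch below runs A/A tests (no real difference) and checks the results at several interim looks. It compares naive peeking at a fixed-sample threshold with a deliberately simple correction that splits the significance level evenly across looks (a Bonferroni-style rule). This rule is a stand-in chosen for brevity, not the sequential method Kameleoon uses, and all names and parameter values are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(per_look_alpha, looks=5, visitors_per_look=200,
                        rate=0.1, simulations=1000, seed=0):
    """Share of A/A tests declared significant at any of the interim looks."""
    rng = random.Random(seed)
    false_positives = 0
    for _sim in range(simulations):
        conv_a = conv_b = n = 0
        for _look in range(looks):
            # Both variations share the same true conversion rate.
            conv_a += sum(rng.random() < rate for _v in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _v in range(visitors_per_look))
            n += visitors_per_look
            if p_value(conv_a, n, conv_b, n) < per_look_alpha:
                false_positives += 1
                break  # the experiment would have been stopped here
    return false_positives / simulations
```

Calling `false_positive_rate(0.05)` shows how often naive peeking at five looks stops on a false positive, while `false_positive_rate(0.05 / 5)` applies the stricter per-look threshold. The stricter threshold keeps the overall error rate under control but makes each individual look harder to pass, which is where the loss of statistical power comes from.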

Note

To optimize experimentation, consider combining sequential and fixed sample size methodologies. Employing sequential testing allows for early termination based on significant results, while fixed sample sizes provide statistical rigor. By setting thresholds for early stopping and validating findings with traditional methods, you can accelerate insights without compromising reliability.

Other tools

Multiple testing correction

To maximize the efficiency of experimentation, consider testing multiple variations simultaneously. This approach can accelerate the overall pace of experimentation. However, to maintain statistical integrity, implement appropriate correction methods to mitigate the risk of false positive results.
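
As an example of such a correction, the sketch below adjusts the p-values obtained when several variations are each compared against the original. It shows two standard procedures, Bonferroni and Holm-Bonferroni; the source does not say which correction Kameleoon applies, so the choice of methods and the function names are assumptions.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons (capped at 1)."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must never decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Three variations, each compared against the original.
raw = [0.01, 0.04, 0.03]
significant = [p < 0.05 for p in holm_adjust(raw)]
```

Holm's procedure controls the same family-wise error rate as Bonferroni but is never more conservative, so it is usually the better default of the two.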

CUPED

CUPED (Controlled-experiment Using Pre-Experiment Data) is a proven variance-reduction method that lowers the sample size an experiment requires. By leveraging pre-experiment data, CUPED improves statistical power and accelerates experimentation. To maximize its benefits, consider combining CUPED with traditional fixed sample size methodologies. This hybrid approach lets you detect significant results with fewer visitors while maintaining statistical power. CUPED is most effective in the following circumstances:

Your experiment includes returning visitors.
You've run many experiments in Kameleoon.
There is a correlation between the goal conversions before the start of the experiment and during the live experiment.
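
The role of that correlation can be seen in a minimal sketch of the standard CUPED adjustment, in which each visitor's in-experiment metric is corrected using the same metric measured before the experiment. The function name and the sample data are hypothetical, and the sketch assumes one pre-experiment value per visitor; it is not a description of Kameleoon's internal computation.

```python
from statistics import fmean, pvariance


def cuped_adjust(pre_values, post_values):
    """Return CUPED-adjusted metric values: y - theta * (x - mean(x))."""
    mean_pre = fmean(pre_values)
    mean_post = fmean(post_values)
    covariance = fmean([(x - mean_pre) * (y - mean_post)
                        for x, y in zip(pre_values, post_values)])
    var_pre = pvariance(pre_values, mu=mean_pre)
    # theta is the regression slope of the in-experiment metric on the
    # pre-experiment metric; in practice it is estimated on all variations pooled.
    theta = covariance / var_pre if var_pre > 0 else 0.0
    return [y - theta * (x - mean_pre) for x, y in zip(pre_values, post_values)]


# Hypothetical per-visitor conversions before and during the experiment.
pre = [1, 2, 3, 4, 5]
post = [2.1, 3.9, 6.2, 7.8, 10.0]
adjusted = cuped_adjust(pre, post)
```

The adjustment leaves the mean unchanged but shrinks the variance in proportion to the squared correlation between the two periods. With no returning visitors or no correlation, `theta` is close to zero and CUPED brings little benefit.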
