Non-Inferiority Designs in A/B Testing
Author
Georgi Z. Georgiev
Abstract
Most, if not all, of the current statistical literature on online randomized controlled experiments (commonly referred to as “A/B tests”) focuses on superiority designs. That is, the error of the first kind is defined as incorrectly rejecting a composite null hypothesis that the treatment has no effect or a negative effect. This error is then controlled via a statistical significance threshold or confidence intervals, or via posterior probabilities and credible intervals in Bayesian approaches.
However, there is no reason to limit all A/B testing practice to tests for superiority. The current paper argues that there are many cases where testing for non-inferiority is both more appropriate and more powerful in the statistical sense, resulting in better decision-making and, in some cases, significantly faster tests. Non-inferiority tests are appropriate when one cares about the treatment being at least as good as the current solution, with “as good as” defined by a specified non-inferiority margin (sometimes referred to as an “equivalence margin”). Certain non-inferiority designs can result in faster testing than a comparable superiority test.
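For concreteness, writing \delta for the true difference between treatment and control (e.g. in conversion rates) and M > 0 for the non-inferiority margin, the two one-sided designs can be contrasted as follows (the notation here is illustrative rather than the paper’s own):

H_0: \delta \le 0 \quad \text{vs.} \quad H_1: \delta > 0 \qquad \text{(superiority)}

H_0: \delta \le -M \quad \text{vs.} \quad H_1: \delta > -M \qquad \text{(non-inferiority)}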
The paper introduces two separate approaches for designing non-inferiority A/B tests: tests planned for a true difference of zero or more, and tests planned for a positive true difference. It provides several examples of applying both approaches to cases from conversion rate optimization. Sample size calculations are provided for both approaches, and comparisons are made both between the two approaches and between non-inferiority and superiority tests.
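As a minimal sketch of such a calculation, assuming the usual normal-approximation formula for a one-sided non-inferiority z-test of two proportions (the function name, parameters, and example figures below are illustrative and are not taken from the paper):

from scipy.stats import norm

def noninferiority_sample_size(p_control, true_lift, margin, alpha=0.05, power=0.80):
    # Approximate per-group sample size for a one-sided non-inferiority
    # z-test of two proportions (normal approximation).
    #   p_control: baseline conversion rate of the current solution
    #   true_lift: planned true absolute difference (0 for the "zero or more"
    #              approach, > 0 when planning for a positive true difference)
    #   margin:    non-inferiority margin M > 0 (largest tolerable decline)
    p_treatment = p_control + true_lift
    z_alpha = norm.ppf(1 - alpha)  # one-sided significance threshold
    z_beta = norm.ppf(power)       # required statistical power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    distance = true_lift + margin  # distance from the planned truth to the H0 boundary (-M)
    return (z_alpha + z_beta) ** 2 * variance / distance ** 2

# Planned for a true difference of zero vs. a positive true difference,
# with a 10% baseline and a 1 percentage point margin:
n_zero = noninferiority_sample_size(0.10, true_lift=0.00, margin=0.01)
n_positive = noninferiority_sample_size(0.10, true_lift=0.01, margin=0.01)

Under these illustrative inputs, planning for a positive true difference yields a markedly smaller per-group sample than planning for a true difference of zero, which is one way of seeing why certain non-inferiority designs can be significantly faster.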
Finally, drawbacks specific to non-inferiority tests are discussed, with guidance on how to limit or control them in practice.
Keywords
non-inferiority, non-inferiority testing, null hypothesis, A/B testing, split testing, conversion rate optimization, landing page optimization, statistical design of online experiments