## What is Statistical Design?

The statistical design of an online controlled experiment (a.k.a. A/B test) is the result of the **translation of a substantive business question of interest into an experiment with a well-defined statistical model that allows the use of data in a decision-making process in the presence of uncertainty**. The design should fully describe the experiment in terms of the decisions relevant to its statistical model.

A statistical design can include many elements, such as: well-defined hypotheses (see hypothesis), the number and allocation of test groups, one or more primary KPIs and potentially secondary KPIs, the choice of a proper statistic (e.g. absolute difference vs. percent change) and statistical test (e.g. z-test or t-test) as well as any corrections (e.g. p-value adjustments) necessary due to multiple testing, the minimum effect of interest, test duration and sample size (after power analysis and risk-reward analysis), the significance threshold required to reject the null hypothesis or the confidence level of an estimator, the choice between a fixed sample size and a sequential testing design, and so on. If a sequential design is chosen, then the choice of an error-spending function, the number and timing of interim analyses, and the use of binding or non-binding boundaries are also part of the design. An adaptive sequential design requires further decisions.
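To illustrate how several of these elements interact, the following is a minimal sketch of a fixed-sample power calculation for comparing two conversion rates. It assumes a two-sided z-test for the difference of two proportions, equal allocation between two groups, and the usual normal approximation. The function name `sample_size_per_group` and the example values (baseline rate, minimum effect of interest, significance threshold, power) are hypothetical and chosen only for illustration; a real design may call for a one-sided test, unequal allocation, or a sequential procedure instead.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, min_effect_abs, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion z-test.

    baseline_rate  -- assumed conversion rate of the control group
    min_effect_abs -- minimum effect of interest, as an absolute difference
    alpha          -- significance threshold (type I error rate)
    power          -- desired probability of detecting the minimum effect
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_effect_abs
    p_bar = (p1 + p2) / 2  # pooled rate under the null hypothesis

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical inputs: 10% baseline, 2 percentage points minimum effect.
n = sample_size_per_group(baseline_rate=0.10, min_effect_abs=0.02)
print(f"Users required per group: {n}")
```

Halving the minimum effect of interest roughly quadruples the required sample size, which is why this value is typically fixed during the design stage together with the test duration.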

The experimental design should also consider the predictive value of the data by taking measures against threats to generalizability.

A statistical design can also include the misspecification tests to be used after the data is gathered to ensure the adequacy of the statistical model vis-a-vis the data at hand. Certain tests regarding the representativeness of the results are also possible, with some caveats. Procedures to ensure basic data quality can also be included in the design.
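As one possible example of a data-quality procedure that can be specified in advance, the sketch below checks whether the observed split of users between two groups is consistent with the planned allocation (often called a sample ratio mismatch check). This particular check is an illustrative assumption, not something the entry prescribes. It uses a chi-square goodness-of-fit test with one degree of freedom, so it applies to exactly two groups; the function name `srm_p_value` and the user counts are hypothetical.

```python
import math


def srm_p_value(observed_a, observed_b, expected_share_a=0.5):
    """P-value of a chi-square goodness-of-fit test for a two-group allocation.

    observed_a, observed_b -- users actually recorded in each group
    expected_share_a       -- planned share of traffic for group A
    """
    total = observed_a + observed_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a

    chi2 = (
        (observed_a - expected_a) ** 2 / expected_a
        + (observed_b - expected_b) ** 2 / expected_b
    )
    # With 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts from a test planned with a 50/50 allocation.
p = srm_p_value(observed_a=5000, observed_b=5300)
print(f"Allocation check p-value: {p:.4f}")
```

A very small p-value suggests the randomization or tracking did not work as designed, in which case the outcome data should be investigated before any inference is drawn from it. The threshold used for this check is itself a design decision.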

**A proper statistical design is crucial** for making sure the data that enters the decision-making process is of sufficient quality, so that the resulting causal inference and/or estimation is valid. A poor design may lead to business losses, the extent of which depends on how important the affected decision was. In some instances a faulty statistical design can be worse than not performing any A/B test at all, since the data will carry a false "aura" of certainty, making it the last place anyone searches for a cause once the results turn out not to be what was expected.

Like this glossary entry? For an in-depth and comprehensive reading on A/B testing stats, check out the book "Statistical Methods in Online A/B Testing" by the author of this glossary, Georgi Georgiev.