What does "Statistical Significance" mean?

Definition of Statistical Significance in the context of A/B testing (online controlled experiments).

What is Statistical Significance?

Aliases: statistically significant, significant

For the result of an A/B test to be statistically significant, it has to cross the significance threshold predefined when designing the test. The threshold is usually expressed in terms of a p-value: observing a p-value lower than the threshold results in rejection of the relevant null hypothesis. For example, with a threshold of 0.05, a p-value of 0.02 is statistically significant, so the null hypothesis can be rejected at that significance level (0.05).
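The decision rule can be sketched in a few lines of Python. This is a minimal sketch assuming a two-sided, pooled two-proportion z-test on conversion counts, which is one common choice for conversion-rate tests but not the only valid one. The function names `two_proportion_p_value` and `is_significant` and the conversion counts are hypothetical.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation).

    Assumption: this test is chosen for illustration only; other tests
    (one-sided, sequential, etc.) yield different p-values.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p_value: float, alpha: float) -> bool:
    """Reject the null hypothesis when the observed p-value is below the threshold."""
    return p_value < alpha


# Hypothetical data: 500/10,000 conversions in control, 570/10,000 in the variant
p = two_proportion_p_value(500, 10_000, 570, 10_000)
print(p, is_significant(p, alpha=0.05))

# The example from the text: threshold 0.05, observed p-value 0.02
print(is_significant(0.02, alpha=0.05))  # True
```

The threshold `alpha` is passed in rather than derived from the data, reflecting that it must be fixed when the test is designed, before any p-value is observed.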

Furthermore, the null hypothesis could also be rejected at any threshold higher than the observed significance level; with the p-value of 0.02 above, it would be rejected at 0.03 or 0.10 as well.

If significance is instead expressed through its complement, the confidence level, as is often the case in the Conversion Rate Optimization industry for historical reasons, a test is statistically significant if it achieves a higher confidence level than the required threshold. For example, with a threshold of 90%, a test with an observed significance level of 0.02 corresponds to an observed confidence level of 98% (1 − 0.02). Since 98% is higher than 90%, the result is statistically significant.
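The equivalence between the two framings can be sketched as follows, assuming the observed confidence level is simply defined as one minus the observed p-value, as in the example above. The function names are hypothetical.

```python
def observed_confidence_level(p_value: float) -> float:
    """Express an observed p-value on the confidence-level scale, 1 - p."""
    return 1.0 - p_value


def is_significant_by_confidence(p_value: float, required_confidence: float) -> bool:
    """Significant when the observed confidence level exceeds the required one.

    This is the same rule as p_value < 1 - required_confidence,
    i.e. comparing the p-value with the significance threshold.
    """
    return observed_confidence_level(p_value) > required_confidence


# Example from the text: required confidence 90%, observed significance level 0.02
print(f"{observed_confidence_level(0.02):.0%}")     # 98%
print(is_significant_by_confidence(0.02, 0.90))     # True
```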

Observing a significant outcome can logically lead to one of three conclusions: (1) a rare outcome was observed, with its rarity given by the observed p-value; (2) the null hypothesis can be rejected; (3) the statistical model is inadequate (it does not reflect reality, or its assumptions do not hold).


Like this glossary entry? For an in-depth and comprehensive reading on A/B testing stats, check out the book "Statistical Methods in Online A/B Testing" by the author of this glossary, Georgi Georgiev.

