## What is Statistical Significance?

Aliases: *statistically significant, significant*

For a result of an A/B test to be **statistically significant**, it has to cross the significance threshold predefined when designing the test. The threshold is usually expressed as a p-value: observing a p-value lower than the threshold results in the rejection of the relevant null hypothesis. For example, with a threshold of 0.05, a p-value of 0.02 is statistically significant, and thus the null hypothesis can be rejected at that significance level (0.05).
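The decision rule above can be sketched in code. This is a minimal illustration, not a prescribed procedure: it assumes a one-sided pooled z-test for the difference of two proportions, and the conversion counts and sample sizes are made up for the example.

```python
import math

def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """One-sided p-value for the lift of B over A (pooled z-test for proportions)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal survival function P(Z > z) via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

alpha = 0.05  # predefined significance threshold
p = z_test_p_value(200, 10_000, 260, 10_000)  # hypothetical A/B test data
print(p < alpha)  # the result is statistically significant if the p-value is below the threshold
```

The comparison itself is the whole rule: the p-value is computed from the data, while the threshold must be fixed before the test is run.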

Furthermore, the null hypothesis could be rejected at any threshold higher than the observed p-value.

The threshold may instead be defined via its complement, the confidence level, as is often the case for historical reasons in the Conversion Rate Optimization industry. In that formulation, a test is statistically significant if its observed confidence level exceeds the required threshold. For example, with a threshold of 90%, a test with an observed p-value of 0.02 corresponds to a confidence level of 98%, and since 98% is larger than 90%, the result is statistically significant.
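The two formulations are equivalent, differing only in which side of the complement is compared. A minimal sketch of the conversion, assuming confidence level is simply (1 − p-value) expressed as a percentage, as in the example above:

```python
def is_significant_by_confidence(p_value, required_confidence_pct):
    """Translate a p-value into a confidence level and compare to the threshold.

    Assumes confidence level = (1 - p_value) * 100, per the example in the text.
    """
    observed_confidence = (1 - p_value) * 100
    return observed_confidence > required_confidence_pct

print(is_significant_by_confidence(0.02, 90))  # 98% > 90% -> True
print(is_significant_by_confidence(0.20, 90))  # 80% < 90% -> False
```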

Observing a statistically significant outcome logically supports one of three conclusions: (1) a rare event occurred, with its probability under the null hypothesis being no greater than the observed p-value; (2) the null hypothesis is false and can be rejected; (3) the statistical model is inadequate (its assumptions do not hold, so it does not reflect reality).