What does "Generalizability" mean?

Definition of Generalizability in the context of A/B testing (online controlled experiments).

What is Generalizability?

Aliases: external validity, representativeness

Generalizability of an online controlled experiment refers to the predictive value of its outcomes: how well the results generalize to time periods and populations other than the test duration and the population that experienced the treatment. "Representativeness" is a term with the same meaning, as is "external validity" in more scientific contexts.

It should not be confused with the statistical validity of the test (the adequacy of its statistical model), nor with the types of errors controlled by statistical methods, such as type I and type II errors. These, when viewed in the context of the primary KPI, apply only to the internal validity of a test.

The generalizability of the outcome of an A/B test can be threatened by many factors external to the test itself. There are three main types of such factors: time-related, population-change-related, and novelty/learning-related. We have examined some of these in separate glossary entries: seasonality, learning effects, novelty effects, cookie churn, survivorship bias, and selection bias.

Ways to improve generalizability include managing the test duration so that the data is balanced across important known factors (acquiring a "representative sample"), checking for strong trends within the test period, checking for trends that persist after the test has ended and traffic has been switched back to control vs. control (A/A test), and others. None of these is flawless, and all of them have statistical premises which may need testing on their own.
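Two of these measures, balancing the test duration and checking for trends within the test period, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this entry: that the day of the week is one of the important known factors (so the duration is rounded up to whole weeks), and that a simple least-squares slope of the daily absolute lift is a usable first look at a trend. The function names and the input figures are hypothetical, and the slope is a descriptive check only, not a statistical test.

```python
import math


def whole_week_duration(required_users, daily_users):
    """Days needed to reach the sample size, rounded up to full weeks.

    Assumption: day of week is an important known factor, so each
    weekday should be represented equally often in the test data.
    """
    days = math.ceil(required_users / daily_users)
    return math.ceil(days / 7) * 7


def daily_lift_slope(control_rates, variant_rates):
    """Least-squares slope of the daily absolute lift over the day index.

    The lift on each day is variant rate minus control rate. A slope far
    from zero hints at a trend within the test period (for example a
    fading novelty effect). It is a descriptive check, not a test.
    """
    if len(control_rates) != len(variant_rates):
        raise ValueError("both series must cover the same days")
    lifts = [v - c for c, v in zip(control_rates, variant_rates)]
    n = len(lifts)
    if n < 2:
        raise ValueError("at least two days of data are needed")
    mean_x = (n - 1) / 2
    mean_y = sum(lifts) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(lifts))
    return sxy / sxx


# Hypothetical figures, for illustration only.
duration = whole_week_duration(required_users=25000, daily_users=1500)
control = [0.10, 0.10, 0.10, 0.10, 0.10]
variant = [0.14, 0.13, 0.12, 0.11, 0.10]
slope = daily_lift_slope(control, variant)
print(f"run for {duration} days; daily lift slope = {slope:.4f}")
```

In this made-up data the lift shrinks every day, so the slope is negative, which would be a reason to look more closely before extrapolating the observed effect to later periods. Real daily rates are noisy, so a slope computed this way would still need a proper statistical assessment.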

Generalizability is an unsolvable problem in the long term due to the adaptive nature of human behavior as well as the ever-changing technological and competitive context. The measures described above can help alleviate concerns about short- and mid-term generalizability, as well as generalizability across populations.

