What does "Severity" mean?

Definition of Severity in the context of A/B testing (online controlled experiments).

What is Severity?

Severity is a principle for assessing the error probabilities of a test with respect to specific claims about a parameter of interest. It is also the name of the measure of how capable a test was of detecting errors in a given (statistical) hypothesis subjected to it. The strong severity principle states: "We have evidence for a claim C just to the extent it survives a stringent scrutiny. If C passes a test that was highly capable of finding flaws or discrepancies from C, and yet none or few are found, then the passing result, x, is evidence for C." [1].

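Mathematically, severity has a lot in common with p-values and confidence intervals. A formal expression is SEV(T, x0, H), which translates to "the severity with which claim H passes test T with outcome x0". For a one-sided test in which larger values of the test statistic d(X) indicate larger values of μ, it follows that SEV(μ > μ1) = P(d(X) ≤ d(x0); μ = μ1), that is, the probability of observing a test statistic no larger than the one actually observed if μ were equal to μ1.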
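The expression above can be illustrated with a minimal sketch. It assumes a one-sided z-test on a sample mean with known standard deviation, in which case P(d(X) ≤ d(x0); μ = μ1) reduces to the standard normal CDF evaluated at (x̄ - μ1) / (σ / √n). The function name severity_greater and all input values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def severity_greater(x_bar, mu1, sigma, n):
    """Severity of the claim mu > mu1 given an observed sample mean x_bar.

    Assumes a one-sided z-test with known standard deviation sigma, so that
    P(d(X) <= d(x0); mu = mu1) reduces to Phi((x_bar - mu1) / (sigma / sqrt(n))).
    """
    standard_error = sigma / sqrt(n)
    return NormalDist().cdf((x_bar - mu1) / standard_error)


# Hypothetical outcome: observed mean 0.4, sigma 2.0, n = 100 (standard error 0.2).
for mu1 in (0.0, 0.2, 0.4):
    print(f"SEV(mu > {mu1}) = {severity_greater(0.4, mu1, 2.0, 100):.4f}")
```

Under these assumptions the claim μ > 0 passes with severity of about 0.977, while the claim μ > 0.4 (the observed value itself) passes with severity of only 0.5: the bolder the claim relative to the data, the lower the severity.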

The main benefit of using severity logic and presentation is that it offers a coherent measure of the evidential support for a specified statistical hypothesis. For example, observing SEV(δ > 0.02) = 0.99 means that the testing procedure would have produced a result as extreme as the observed one, or more extreme, with probability of at most 1 - 0.99 = 0.01 (1%) if in fact δ ≤ 0.02. Severity can be assessed for different claims about a parameter of interest: severity curves, which plot severity against a range of candidate values for the claim, are especially helpful if one wants to assess the test's capacities at a glance.
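A severity curve can be sketched by evaluating SEV(δ > δ1) over a grid of candidate values δ1. The example below assumes a normal approximation for the observed difference; the observed lift of 0.05 and the standard error of 0.0129 are hypothetical numbers picked so that the claim δ > 0.02 passes with severity close to 0.99. The helper names severity_curve and bound_at_severity are illustrative, not part of any library.

```python
from statistics import NormalDist


def severity_curve(observed_diff, standard_error, candidates):
    """Return (delta1, SEV(delta > delta1)) pairs for each candidate delta1."""
    phi = NormalDist().cdf
    return [(d1, phi((observed_diff - d1) / standard_error)) for d1 in candidates]


def bound_at_severity(observed_diff, standard_error, target):
    """Largest delta1 for which SEV(delta > delta1) reaches the target level."""
    return observed_diff - NormalDist().inv_cdf(target) * standard_error


# Hypothetical A/B test outcome: observed lift 0.05 with standard error 0.0129.
candidates = [i / 100 for i in range(0, 9)]  # 0.00, 0.01, ..., 0.08
for delta1, sev in severity_curve(0.05, 0.0129, candidates):
    print(f"SEV(delta > {delta1:.2f}) = {sev:.3f}")

print(f"delta1 with severity 0.99: {bound_at_severity(0.05, 0.0129, 0.99):.4f}")
```

Under this normal approximation, bound_at_severity returns the same number as the lower limit of a one-sided confidence interval at the target level, which is one way to see the kinship between severity and confidence intervals.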

Severity is useful in combating fallacies of rejection (misguided interpretations of a rejection of the null hypothesis) as well as fallacies of acceptance (misguided interpretations of the failure to reject the null hypothesis) when communicating the outcomes of an A/B test to stakeholders.
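As an illustration of guarding against a fallacy of acceptance, the sketch below evaluates claims of the form δ ≤ δ1 after a non-significant result. It assumes the complementary expression SEV(δ ≤ δ1) = P(d(X) > d(x0); δ = δ1) and a normal approximation; neither the formula nor the numbers appear in the text above, and the function name severity_at_most is hypothetical.

```python
from statistics import NormalDist


def severity_at_most(observed_diff, standard_error, delta1):
    """Severity of the claim delta <= delta1 after a non-significant result.

    Assumed form: P(d(X) > d(x0); delta = delta1) under a normal approximation.
    """
    return 1.0 - NormalDist().cdf((observed_diff - delta1) / standard_error)


# Hypothetical non-significant outcome: observed lift 0.005, standard error 0.01.
for delta1 in (0.0, 0.01, 0.03):
    sev = severity_at_most(0.005, 0.01, delta1)
    print(f"SEV(delta <= {delta1:.2f}) = {sev:.3f}")
```

With these inputs the claim "no lift at all" (δ ≤ 0) passes with severity of only about 0.31, so the failure to reject the null is poor evidence of no effect, whereas the weaker claim δ ≤ 0.03 passes with severity above 0.99.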

References:
[1] Mayo, D. G. (2018) "Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars", Cambridge University Press

