## What is Severity?

Severity is a principle for assessing the error probabilities of tests with respect to claims about a parameter of interest. It is also the name of a measure of the error-detection capability of a test to which a given statistical hypothesis was subjected. The strong severity principle states: "We have evidence for a claim C just to the extent it survives a stringent scrutiny. If C passes a test that was highly capable of finding flaws or discrepancies from C, and yet none or few are found, then the passing result, x, is evidence for C." ^{[1]}

Mathematically, severity has much in common with p-values and confidence intervals. A formal expression is SEV(T, x_{0}, H), which reads "the severity with which claim H passes test T with outcome x_{0}". For a claim of the form μ > μ_{1} following a statistically significant outcome, this becomes SEV(μ > μ_{1}) = P(d(X) ≤ d(x_{0}); μ = μ_{1}).
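As a concrete illustration of the expression above, consider a one-sided z-test of H0: μ ≤ μ_{0} against H1: μ > μ_{0} with known σ, where the severity of the claim μ > μ_{1} reduces to Φ((x̄ − μ_{1})√n / σ). The following is a minimal sketch under that assumed setting (the test setup and all numbers are illustrative, not from the text):

```python
from math import erf, sqrt

def severity(x_bar, mu1, sigma, n):
    """SEV(mu > mu1) = P(d(X) <= d(x0); mu = mu1) for a one-sided z-test
    of H0: mu <= mu0 vs H1: mu > mu0 with known sigma (illustrative sketch)."""
    z = (x_bar - mu1) * sqrt(n) / sigma  # standardized distance of x_bar from mu1
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF evaluated at z

# Hypothetical outcome: observed mean 0.4, sigma = 1, n = 100
print(severity(0.4, 0.2, 1.0, 100))  # high severity for the claim mu > 0.2
```

Note that when μ_{1} equals the observed mean the severity is exactly 0.5, reflecting that the data provide no grounds for claiming a discrepancy that large.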

The main benefit of severity logic and presentation is that it offers a coherent measure of the evidential support for a specified statistical hypothesis. For example, observing SEV(δ > 0.02) = 0.99 means that the testing procedure would have produced such an extreme result, or a more extreme one, with probability of only 1 − 0.99 = 0.01 (1%) if in fact δ ≤ 0.02. Severity can be assessed for different claims about a parameter of interest: severity curves are especially helpful if one wants to assess a test's capabilities at a glance.
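A severity curve can be traced by evaluating SEV over a grid of candidate discrepancies. The sketch below assumes a one-sided z-test on a mean with known σ (the formula and all numbers are illustrative assumptions, not taken from the text):

```python
from math import erf, sqrt

def severity(x_bar, mu1, sigma, n):
    # SEV(mu > mu1) for a one-sided z-test with known sigma (illustrative)
    z = (x_bar - mu1) * sqrt(n) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF

# Trace a severity curve for a hypothetical outcome: x_bar = 0.4, sigma = 1, n = 100
x_bar, sigma, n = 0.4, 1.0, 100
for mu1 in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"SEV(mu > {mu1:.1f}) = {severity(x_bar, mu1, sigma, n):.3f}")
```

The curve falls as the claimed discrepancy grows: claims well below the observed mean pass with high severity, while claims at or above it do not.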

Severity is useful in combating fallacies of rejection (misguided interpretations of a rejection of the null hypothesis) as well as fallacies of acceptance (misguided interpretations of the failure to reject the null hypothesis) when communicating the outcomes of an A/B test to stakeholders.

References:

[1] Mayo, D. (2018) "Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars", Cambridge University Press

## Related A/B Testing terms

Like this glossary entry? For an in-depth and comprehensive reading on A/B testing stats, check out the book "Statistical Methods in Online A/B Testing" by the author of this glossary, Georgi Georgiev.