What does "Standard Deviation" mean?

Definition of Standard Deviation in the context of A/B testing (online controlled experiments).

What is Standard Deviation?

Aliases: SD, sigma

Standard deviation is a measure of dispersion in a numerical data set: how far from the “normal” (average) are the data points of interest. It can also be said to be a measure of central tendency: the smaller the standard deviation is, the more "clustered" the data is around its center (the mean). The larger it is, the more spread the values are. It is usually denoted by the Greek small letter sigma (σ) when speaking of the SD of the sample and by the Greek capital letter sigma Σ when we refer to the population standard deviation.

Standard deviation is calculated as the square root of the variance, while the variance itself is the average of the squared differences from the arithmetic mean. Squaring is done to punish larger errors much more severely, and also to make sure the sign of the errors does not interfere. The reason we square the differences is so that larger departures from the mean are punished more severely. Squaring also results in treating departures in both direction (positive and negative) equally.

The standard deviation is usually preferred when communicating dispersion since it is expressed in the same unit as the data itself, making interpretation easier. E.g. if the mean is $9.00 then 1 SD can be equal to $2.00 while the variance will be 4.

The "standard" in "standard deviation" stems from the fact that it is standardized, or expressed in standard units, meaning that it is known what proportion of the values falls within n standard deviations from the mean. For example, for any distribution the number of values between -1 SD and +1 SD will be exactly 68.27% of the total, the number of values between -1.96 SD and +1.96 SD will be exactly 95% of all, while the values below +1.644 standard deviations from the mean are also 95% of total.

If the above numbers sound familiar it is because most statistics such as the z score and the t score - the results of the commonly used Z-Test and T-Test, are in fact standardized scores. A significance test is thus mathematically the calculation of the standardized distance of an observed score from an assumed mean. A p-value from a One-Tailed Test is simply the cumulative probability density function of the whole distribution up to an observed standardized score. For a Two-Tailed Test it is between an observed score and it's value multiplied by -1.

Consequently, the larger the standard deviation of your data is, the larger sample sizes you will require to observe results with low levels of uncertainty (low p-value) and high estimate precision (narrow confidence interval).

Related A/B Testing terms



Articles on Standard Deviation

Like this glossary entry? For an in-depth and comprehensive reading on A/B testing stats, check out the book "Statistical Methods in Online A/B Testing" by the author of this glossary, Georgi Georgiev.

Purchase Statistical Methods in Online A/B Testing

Glossary Index by Letter


Select a letter to see all A/B testing terms starting with that letter or visit the Glossary homepage to see all.