## What is a Hypothesis?

A hypothesis in A/B testing can refer to: (1) a **broad claim** about the mechanism by which a given intervention will affect user behavior, (2) a **precise claim** about the size and direction of the effect of a given treatment/intervention on a specific key performance indicator (KPI) or a composite KPI, or (3) a **statistical hypothesis** expressed as a specified statistical model of which the observed data are a typical realization.

An example of (1) is the claim "adding trust signals to the cart will improve our business metrics due to persuasion technique X" or "splitting the sign up process in two smaller steps will improve our signup rate due to the sunk cost fallacy". While the first part of each statement (the predicted improvement) can be subjected to a hypothesis test, the second part (the proposed mechanism) usually cannot be tested in the context of conversion rate optimization: even if there is an improvement, the mechanism itself was not tested, since any observed effect could be due to a myriad of known and unknown reasons.

An example of (2) is "adding an up-sell to our cart page will improve average order value while not hurting our purchase rate and thus increase our average revenue per user". It can be made more concrete by adding a number such as "by at least 5%". This is an example of a hypothesis about a set of metrics that can be tested using an online controlled experiment, in which it appears as the alternative hypothesis while the opposite claim, the null hypothesis, is what is actually tested.

An example of (3) can be given by translating the above example (for 2) into a statistical hypothesis (model), where PR is the purchase rate, AOV the average order value and ARPU the average revenue per user. It would be something like: assuming a null hypothesis model of δ_{PR} ≤ 0 AND δ_{AOV} ≤ 0 AND δ_{ARPU} ≤ 0, with PR, AOV and ARPU being normally distributed, independent and identically distributed random variables, we would expect to see a discrepancy in PR of more than x, in AOV of more than y and in ARPU of more than z in no more than 1 out of 100 tests with N samples in each of two groups (control and treatment). Mis-specification (M-S) tests will be performed to test the assumptions, and an A/A test may be performed as a safeguard.
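To make the statistical hypothesis more tangible, here is a minimal Python sketch of a one-sided test for a single metric, the purchase rate. It is not the joint model described above: it covers only the null δ_{PR} ≤ 0 against the alternative δ_{PR} > 0, relies on the normal approximation for a difference of two proportions, and uses a 0.01 significance threshold to mirror the "1 out of 100 tests" bound. The function name and all counts are hypothetical and chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_two_proportion_z_test(conv_control, n_control,
                                    conv_treatment, n_treatment):
    """Test H0: delta <= 0 against H1: delta > 0,
    where delta = purchase rate (treatment) - purchase rate (control)."""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    # Pooled rate, evaluated at the boundary of the null (delta = 0).
    p_pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / std_err
    # One-sided p-value: probability of a z at least this large under the null.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical data: 500 of 10,000 control users and 580 of 10,000
# treatment users made a purchase.
z, p_value = one_sided_two_proportion_z_test(500, 10_000, 580, 10_000)

alpha = 0.01  # assumed threshold, matching "no more than 1 out of 100 tests"
reject_null = p_value < alpha
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Rejecting the null here supports only the statistical claim about the purchase rate under the stated model assumptions. A full treatment of the example would also need to cover AOV and ARPU and to check those assumptions.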

As you can see, while in common parlance the hypothesis amounts to a fairly simple claim about the effect of a given intervention on a parameter or set of parameters, translating it into a proper statistical hypothesis is considerably more involved. Due to the hidden premises involved, Duhemian problems may prevent the conclusion of a statistical hypothesis test from translating directly into a substantive claim such as (2). A battery of M-S tests helps ensure this is less likely to happen, but it will not guard against every such problem. For example, one such assumption is that the data is accurate: issues with data collection (e.g. unknown missing data, faulty data, etc.) may still invalidate the inference even if it is statistically sound. Another assumption concerns the external validity (generalizability) of the test data: violations here may render an otherwise perfect A/B test perfectly meaningless, or even worse, misleading.