What is statistical significance?
Statistical significance cannot be discussed without hypothesis testing. When you launch an experiment on Kameleoon’s platform, you want to know whether the variation you are testing actually improves a given metric compared to the original version of your site. In statistical terms, you formulate a hypothesis and then try to refute it with the data you observe.
To help you make a decision, Kameleoon uses the frequentist statistical framework by default. It assesses how likely the observed data would be if the “null” hypothesis were true. This is measured by the “p-value”: the probability, under the null hypothesis, of observing data at least as extreme as the data actually observed, where “extreme” means a wider gap in the metric of interest.
Statistical significance and reliability
An experiment is said to be statistically significant if the sample data we collected is sufficiently inconsistent with what we would expect to observe under the null hypothesis.
To help you assess whether an experiment result is statistically significant, we do not display the test’s p-value directly. Instead, we display the reliability, which is simply computed as:
“reliability” = 1 – “p-value”
Let’s take the example below:
- Reference variation: 142,000 unique visitors tested, 3.52% conversion rate (5,000 sales)
- Variation 1 – tested: 216,000 unique visitors tested, 3.64% conversion rate (1,850 sales)
In this example, we are testing whether variation 1 is more efficient than the reference variation. We are trying to reject the null hypothesis, which is that variation 1 is equally or less efficient than the reference. We accept wrongly rejecting the null hypothesis in at most 5% of cases, so we set our significance level at 0.05.
We then compute the reliability. Variation 1 is more efficient than the reference variation and the reliability is 96%, which is above the 95% threshold (1 – 0.05): the outcome of the experiment is statistically significant at our 5% significance level.
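The reasoning above can be sketched in a few lines of Python. This is only an illustration, not Kameleoon’s actual engine: it assumes a one-sided, pooled two-proportion z-test, and the function name and visitor counts are hypothetical.

```python
from math import erfc, sqrt


def reliability(visitors_ref, conversions_ref, visitors_var, conversions_var):
    """Return (p_value, reliability) for 'the variation beats the reference'.

    Assumption: a one-sided pooled two-proportion z-test. The statistical
    test actually used by the platform may differ.
    """
    rate_ref = conversions_ref / visitors_ref
    rate_var = conversions_var / visitors_var
    # Conversion rate of both groups combined, as expected under the null hypothesis
    pooled = (conversions_ref + conversions_var) / (visitors_ref + visitors_var)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_ref + 1 / visitors_var))
    z = (rate_var - rate_ref) / std_err
    # Upper-tail probability of a standard normal: P(Z >= z)
    p_value = 0.5 * erfc(z / sqrt(2))
    return p_value, 1 - p_value


# Hypothetical counts: 5.0% vs 5.6% conversion rate on 10,000 visitors each
p_value, rel = reliability(10_000, 500, 10_000, 560)
significance_level = 0.05
print(f"reliability: {rel:.1%}")
print("significant" if p_value < significance_level else "not significant")
```

With these hypothetical counts the reliability comes out at roughly 97%, so the result would be considered significant at the 5% level.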
Checking statistical significance
If you use Google Analytics or Kameleoon as your reporting tool, Kameleoon automatically calculates the statistical significance of your experiment, so you can check whether a variation is more or less efficient than the reference.
To know how to access the results page of a campaign, please read this article.
The results page allows you to see the performance of your campaigns. For each variation and each goal, the results tables (at the bottom of the page) show the reliability rate, that is, the statistical significance.
In the example above, the reliability rate is >99%.
Evolution and reliability
A campaign is only reliable if it has been applied to a sufficiently large number of visitors. If the number of tested visitors is too low, its results cannot be trusted.
Your campaign is not considered complete until this reliability rate stabilizes over time. This rate is considered stable if it remains within a range of +/- 5 points. A visual indicator reflects the stabilization of the rate: if the 3 boxes light up, your rate is stable!
- Not stable: 0/3 boxes
- Stable for 1 to 3 days: 1/3 boxes
- Stable for 4 to 6 days: 2/3 boxes
- Stable for 7 or more days: 3/3 boxes
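The mapping from stable days to boxes can be sketched as follows. The function names are hypothetical, and the way stability is measured here (consecutive recent days within 5 points of the latest value) is only one assumed reading of the “+/- 5 points” rule, not a description of how Kameleoon computes it.

```python
def trailing_stable_days(daily_reliability, tolerance=5.0):
    """Count the most recent consecutive days (latest day included) whose
    reliability stays within `tolerance` points of the latest value.

    Assumption: this is one possible interpretation of "+/- 5 points".
    """
    if not daily_reliability:
        return 0
    latest = daily_reliability[-1]
    days = 0
    for rate in reversed(daily_reliability):
        if abs(rate - latest) > tolerance:
            break
        days += 1
    return days


def stability_boxes(stable_days):
    """Map a number of stable days to the 0-3 boxes of the indicator."""
    if stable_days <= 0:
        return 0
    if stable_days <= 3:
        return 1
    if stable_days <= 6:
        return 2
    return 3


# Hypothetical daily reliability rates, in percent
history = [60, 80, 93, 95, 96, 94, 95]
print(stability_boxes(trailing_stable_days(history)))
```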
In this example, the variation has three full boxes: its reliability rate has stabilized and its results are reliable.
This variation, on the other hand, has no full box. This means that you have to wait until its reliability rate stabilizes.
You can also view a graph of the reliability over time by selecting “Confidence rate” in the “Graphs” section of the block.
You can do this in the “Graphs” section of the page or by displaying the graph in a specific goal table.
When the curve flattens, the results of your campaign have stabilized and you can use this data with confidence.
What can you do if sufficient reliability is not achieved?
If your reliability rate is stable but insufficient, this can be due to several factors, including:
- the traffic on the page is not sufficient;
- the difference between the performance of the original and the variation is too small to draw conclusions (for example, the modification you have made has a very small impact on the behavior of your visitors).
However, you can still draw conclusions for your website from a reliability rate that has stabilized at 75%. If the traffic on the page is not sufficient, your reliability rate will probably not reach 95%. Even so, the Kameleoon results page offers a wide variety of data and indicators that will help you better understand your audience.
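To get a feel for why low traffic or a small difference can keep the reliability rate below 95%, here is a rough sample-size sketch. It relies on the standard normal-approximation formula for comparing two proportions with a one-sided test. The 80% power target, the function name and the conversion rates are assumptions chosen for illustration and do not come from Kameleoon.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(base_rate, variation_rate, significance=0.05, power=0.80):
    """Approximate number of visitors needed in each variation to detect the
    gap between two conversion rates (one-sided test, normal approximation).

    Assumption: `power` (chance of detecting a real gap) is set to 80%.
    """
    if base_rate == variation_rate:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - significance)
    z_beta = NormalDist().inv_cdf(power)
    variance = base_rate * (1 - base_rate) + variation_rate * (1 - variation_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / (variation_rate - base_rate) ** 2)


# Hypothetical rates: a large gap needs far fewer visitors than a small one
print(visitors_per_variation(0.05, 0.060))  # 5.0% -> 6.0%
print(visitors_per_variation(0.05, 0.051))  # 5.0% -> 5.1%
```

The required traffic grows with the inverse square of the gap, so a modification with a very small impact may need far more visitors than the page receives.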
Note: You can change the minimum reliability rate required for Kameleoon to consider that a variation is winning. To do so, go to the “Administrate” > “Sites” page in the Kameleoon App. In the tab dedicated to Experiments, you will be able to modify this minimum reliability rate.
If you would like more details on how Kameleoon’s statistics engine works, you can read our Statistical paper.