What is Sample Ratio Mismatch (SRM)?
Sample Ratio Mismatch means that there is a significant difference between the expected traffic allocation across your experiment’s variations and the observed one. You are faced with SRM when one variant in your test receives notably more, or less, traffic than expected. In other words, traffic ends up distributed differently from the allocation you configured, even if you split your traffic evenly (e.g., 50-50).
Let’s look at an example: You launched a campaign with a traffic allocation of 50% toward the original and 50% toward the variation. Your data shows you have 50,000 visitors on the original and 48,900 on the variation. Kameleoon’s native, in-app SRM detection test will flag these results as positive for Sample Ratio Mismatch. At Kameleoon, we detect SRM by running a “Chi-Square Goodness of Fit test,” which helps us understand whether a variable is likely to come from a specified distribution. We use a threshold of 0.001 for the p-value of the test, meaning that an experiment will be positive for the SRM test if the test’s p-value is lower than this threshold.
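To make the example above concrete, here is a minimal sketch of a Chi-Square Goodness of Fit test for an experiment with two variations, using only the Python standard library. It is an illustration under stated assumptions, not Kameleoon’s actual implementation: the function name `srm_p_value` and the constant `SRM_THRESHOLD` are hypothetical, and the closed-form p-value used here only applies when there are exactly two variations (one degree of freedom).

```python
import math


def srm_p_value(observed_a, observed_b, expected_share_a=0.5):
    """Return (chi2, p_value) for a two-variation goodness-of-fit test.

    Hypothetical helper for illustration; not Kameleoon's implementation.
    """
    total = observed_a + observed_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # With two variations there is one degree of freedom, and the
    # chi-square survival function reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


SRM_THRESHOLD = 0.001  # p-value threshold quoted in this article

# Visitor counts from the example: 50-50 allocation, 50,000 vs. 48,900.
chi2, p_value = srm_p_value(50_000, 48_900)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.5f}")
print("SRM detected" if p_value < SRM_THRESHOLD else "No SRM detected")
```

With these counts, the statistic is roughly 12.2 and the p-value falls below 0.001, which is why the example is flagged. For experiments with more than two variations, you would need a general chi-square distribution function, such as the one provided by a statistics library.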
Why is SRM relevant?
SRM implies that there is bias in the experiment’s traffic assignment mechanism. This bias can invalidate the experiment’s results because it violates one of the core assumptions on which the statistical computations for A/B tests are based: that visitors are assigned to variations at random, according to the configured allocation.
This problem is especially relevant if you have large traffic on your sites because even small discrepancies can have major effects on experiment results and lead to poor decision-making.
Kameleoon’s in-app SRM alerting helps you make decisions based on an actual statistical test, rather than on intuition.
What to do when your experiment is positive for SRM?
You don’t immediately have to throw out experiment results when you detect an SRM. If you manage to identify the origin of the SRM, and the discrepancy evens out after filtering out problematic visitors, you can still draw valid conclusions from your experiment results.
A common cause of SRM is a browser misclassifying the visitor because of a variation code incompatibility. Consequently, the visitor sees the reference instead of the variation and no exposure data is collected. You can troubleshoot this issue in your reporting dashboard:
- Open the Result of your experiment. (We recommend that you pause your experiment first.)
- Scroll to any of your goals and click on the Breakdown button. (It doesn’t matter which goal you’re using because you’ll apply the breakdown to all goals.)
- Apply a breakdown filter to all of your goals.
- In the breakdown reports, check whether traffic allocation to your variations is impacted by the variable of the breakdown filter. We recommend checking for Device type, Browser, Operating system, and any custom data you might have. If your reports show the discrepancy, you’ve found the variable responsible for the Sample Ratio Mismatch. (If not, keep applying breakdown filters until you identify the culprit.)
- Once you know the cause of the SRM, apply a Filter to your experiment results to exclude visitors impacted by the incorrect allocation. With your results filtered, you can determine the winning variation of your experiment.
- After you have addressed the issue that caused the SRM, you can relaunch your experiment, confident that your traffic allocation will no longer be biased.
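If you export the breakdown figures, the same check can be sketched in a few lines of Python. The snippet below is only an illustration: the per-browser counts are invented, the helper names (`srm_p_value`, `flag_segments`) are hypothetical, and it assumes two variations with a 50-50 allocation. It runs the goodness-of-fit test on each segment, then re-tests the totals once the flagged segment is excluded.

```python
import math


def srm_p_value(count_a, count_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit p-value for two variations (1 d.o.f.)."""
    total = count_a + count_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    return math.erfc(math.sqrt(chi2 / 2))


def flag_segments(breakdown, threshold=0.001):
    """Return the segments whose traffic split fails the SRM test."""
    return [segment for segment, (count_a, count_b) in breakdown.items()
            if srm_p_value(count_a, count_b) < threshold]


# Invented visitor counts per browser: (original, variation).
breakdown = {
    "Chrome": (30_000, 29_950),
    "Safari": (12_000, 10_950),
    "Firefox": (8_000, 8_000),
}

flagged = flag_segments(breakdown)
print("Segments with SRM:", flagged)

# Re-run the test on the totals after excluding the flagged segments.
kept = [counts for name, counts in breakdown.items() if name not in flagged]
original_total = sum(count_a for count_a, _ in kept)
variation_total = sum(count_b for _, count_b in kept)
filtered_p = srm_p_value(original_total, variation_total)
print(f"p-value after filtering: {filtered_p:.3f}")
```

In this made-up data set, only one browser is flagged, and the remaining traffic no longer shows a mismatch once that browser is excluded. Keep in mind that checking many segments increases the chance that one of them is flagged by chance alone, so treat a flagged segment as a lead to investigate rather than as proof.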
What causes SRM?
The root causes of SRM can greatly vary between experiments and the organizations or users who are running the experiments.
Based on our experience, we created a checklist for troubleshooting when your experiment is positive for SRM:
- Make sure you did not use a weak unique identifier for your visitors. By default, Kameleoon uses a unique visitor ID (Kameleoon Visitorcode) for each visitor. This code is then hashed by our algorithm, which randomly selects a variation for this visitor. If you do not use our default visitor code, or if you use custom data to map your own user ID to our visitor ID to run cross-device experiments, your own ID must be random and consist of a string of at least 16 random characters (lower case letters and numerals).
- Check that there is no delay affecting any variation execution that could cause discrepancies, e.g., if your variation is redirecting toward a different page on which Kameleoon is not installed. Based on our experience, experiments with URL redirects are more likely to have an SRM, either because some visitors redirected to variant B fail to see the page or because data collection only occurs after page B loads. This causes some data loss in variant B that you would not have on the original page.
- Check whether your experiment uses custom variation assignment scripts. A custom variation assignment script can cause SRM because it bypasses our assignment algorithm.
- If you updated the traffic allocation, deleted a variation, or reallocated traffic while the experiment was running, your experiment might be flagged as positive for SRM.
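For the first item in the checklist, the sketch below shows one way to generate an identifier that meets the stated format (at least 16 random characters, lower case letters and numerals) with Python’s standard `secrets` module. The function names are hypothetical, and how you pass the resulting ID to Kameleoon is not covered here.

```python
import secrets
import string

# Allowed characters: lower case letters and numerals.
ALPHABET = string.ascii_lowercase + string.digits
MIN_LENGTH = 16


def generate_visitor_id(length=MIN_LENGTH):
    """Generate a random ID from a cryptographically strong source."""
    if length < MIN_LENGTH:
        raise ValueError(f"length must be at least {MIN_LENGTH}")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def has_expected_format(value):
    """Check length and character set only; this cannot prove randomness."""
    return len(value) >= MIN_LENGTH and all(ch in ALPHABET for ch in value)


visitor_id = generate_visitor_id()
print(visitor_id, has_expected_format(visitor_id))
print("user-42", has_expected_format("user-42"))  # too short, contains "-"
```

A format check like `has_expected_format` only catches obviously weak identifiers, such as short or sequential account numbers. It does not verify that the value was generated randomly, so the generation step itself matters most.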
It’s good practice to understand the origin of the mismatch and account for it in future experiments. Make sure you analyze your SRM-positive experiments with different breakdown filters because they can help you identify the root cause(s) of traffic assignment differences faster.
For help diagnosing potential causes, you can check out this great and extensive taxonomy of causes.
How frequent is SRM?
Our SRM test is a powerful tool because it offers a single test for detecting a broad range of potential issues. However, because this range is so wide, SRM tests may come out positive more often than expected. On average, 5.5% of experiments run on the Kameleoon platform test positive for SRM.
To give you more insight, Microsoft shared the following about SRM in September 2020:
“Recent research contributions from companies such as LinkedIn and Yahoo, as well as our own research confirm that SRMs happen relatively frequently. How frequently? At LinkedIn, about 10% of their zoomed-in A/B tests (A/B tests that trigger users in the analysis if they satisfy some condition) used to suffer from this bias. At Microsoft, a recent analysis showed that about 6% of A/B tests have an SRM.”