Handling outliers with Winsorization in A/B Testing

Written by Julie Trenque

Updated on 08/01/2024

In this article, we’ll explore how our A/B Testing platform uses Winsorization to manage outliers. Whether you’re a seasoned data analyst or just getting started, we’ve got you covered.

What is Winsorization?

Winsorization is a statistical technique that limits the impact of outliers by replacing any value beyond a chosen percentile with the value at that percentile. Outliers are data points that differ significantly from other observations and can skew the results of your A/B tests. Winsorizing your data makes your results more robust and reliable.

Why Winsorization Matters in A/B Testing

In A/B testing, we compare two or more variations to determine which performs better. Outliers, often caused by a handful of very high-spending “whale users”, can distort the true performance of these variations and lead to misleading conclusions. By applying Winsorization, we mitigate the effect of these extreme values, providing you with more accurate and actionable insights.

Winsorization is particularly useful when:

  • Your data contains extreme values that are not errors but are still significantly different from other observations.
  • You are looking for a simple and effective method to handle outliers without resorting to more complex techniques.
  • You need to maintain a balance between data integrity and managing outliers effectively.

Benefits of using Winsorization

Improved Accuracy

Outliers, or extreme values, can skew the results of A/B tests, leading to inaccurate conclusions. These outliers might stem from data entry errors, unusual user behavior, or rare events. Winsorization addresses this by capping extreme values at a specific percentile, thereby reducing their impact. This method ensures that the data’s central tendency more accurately reflects typical user experiences, providing a clearer view of performance metrics.

Accurate data representation enhances the reliability of conclusions drawn from A/B tests in several ways:

  • Reduced Noise: Outliers introduce noise, making it hard to detect true differences between variations. Winsorization minimizes this noise, clarifying performance signals.
  • Stability of Metrics: Winsorized data leads to stable and consistent performance metrics, allowing for confident identification of the better variation.
  • Enhanced Comparability: Applying Winsorization uniformly across all variations ensures fair comparisons, free from distortion by extreme values.

Enhanced Robustness

Winsorized results are more robust as they are less likely to be skewed by extreme values. This robustness is essential for making informed decisions that are not disproportionately influenced by anomalies.

Simplicity

Winsorization is straightforward and easy to implement, making it accessible for users with varying levels of statistical expertise. It offers an efficient way to handle outliers without the need for complex algorithms or advanced statistical knowledge.

Risks and good practices with Winsorization

While Winsorization is a valuable tool, it is not without its risks:

  • Loss of data integrity: Excessive Winsorization can lead to a significant alteration of your data, potentially masking important variations and patterns.
  • Over-simplification: By modifying outliers, you may oversimplify your data, which can result in an incomplete understanding of your dataset.
  • Bias introduction: Inappropriate Winsorization thresholds can introduce bias, skewing your results and leading to incorrect conclusions. Asymmetric thresholds (for example, capping only the top 5%) shift the mean, and even symmetric thresholds (such as the 5th and 95th percentiles) shift it when the data itself is skewed.

To mitigate these risks, follow these good practices:

  • Understand your data: Before applying Winsorization, thoroughly understand the nature and distribution of your data. This helps in setting appropriate thresholds for identifying outliers.
  • Set appropriate thresholds: Use industry standards or data-specific insights to set your Winsorization thresholds. Common choices range from capping the top and bottom 0.1% to capping the top and bottom 5% of your data, but these should be adjusted based on your specific use case.
  • Evaluate impact: After applying Winsorization, evaluate its impact on your data and test results. Compare the Winsorized data with the original to ensure that important information is not lost.
  • Document your process: Keep a detailed record of your Winsorization process, including the rationale for chosen thresholds and the impact on your data. This transparency aids in reproducibility and understanding.
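
The "evaluate impact" step above can be sketched in a few lines of Python. This is only an illustration, not Kameleoon's implementation: the function name `winsorization_report`, the simulated lognormal revenue, and the three threshold pairs are all assumptions chosen for the example.

```python
import numpy as np

def winsorization_report(values, lower_pct, upper_pct):
    """Compare summary statistics before and after percentile clipping."""
    arr = np.asarray(values, dtype=float)
    low, high = np.percentile(arr, [lower_pct, upper_pct])
    clipped = np.clip(arr, low, high)
    return {
        "mean_raw": float(arr.mean()),
        "mean_winsorized": float(clipped.mean()),
        "std_raw": float(arr.std(ddof=1)),
        "std_winsorized": float(clipped.std(ddof=1)),
        # Fraction of data points whose value was changed by the clipping.
        "share_modified": float(np.mean(clipped != arr)),
    }

# Simulated, right-skewed "revenue per user" data (an assumption for the demo).
rng = np.random.default_rng(0)
revenue = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)

reports = {}
for lower_pct, upper_pct in [(0.1, 99.9), (1, 99), (5, 95)]:
    reports[(lower_pct, upper_pct)] = winsorization_report(revenue, lower_pct, upper_pct)
    print((lower_pct, upper_pct), reports[(lower_pct, upper_pct)])
```

On right-skewed data like this, tighter thresholds lower both the standard deviation and the mean, even though the percentiles are symmetric. Recording this kind of comparison alongside the chosen thresholds is one way to document the process.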

How we implement Winsorization at Kameleoon

1. Create a custom goal

First, you need to create a custom goal that you want to apply Winsorization to.

2. Set limits

Next, you need to set the limits that will replace the outliers. For example, with 95% Winsorization, any data points below the 2.5th percentile are set to the value at the 2.5th percentile, and any data points above the 97.5th percentile are set to the value at the 97.5th percentile.

These bounds can be found in the Advanced Settings of your custom goal. Winsorization is applied to the goal's revenue, if the goal has revenue.

Kameleoon then applies Winsorization to your custom goal and identifies the outliers in your data: the values that fall outside the range you set, often the top and bottom 1% or 5% of your data.
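
To make the 2.5th / 97.5th percentile rule concrete, here is a minimal NumPy sketch. It assumes NumPy's default (linear interpolation) percentile definition, which may differ from the one Kameleoon uses; the function name `winsorize` and the sample data are illustrative only.

```python
import numpy as np

def winsorize(values, lower_pct=2.5, upper_pct=97.5):
    """Clip values to the [lower_pct, upper_pct] percentile range.

    Returns the clipped copy and the (low, high) bounds that were used.
    """
    arr = np.asarray(values, dtype=float)
    low, high = np.percentile(arr, [lower_pct, upper_pct])
    return np.clip(arr, low, high), (float(low), float(high))

# Illustrative data: the integers 1 to 100.
revenue = np.arange(1, 101)
clipped, (low, high) = winsorize(revenue)

print("bounds:", low, high)
print("values changed:", int(np.sum(clipped != revenue)))
```

Every value below the lower bound is raised to that bound and every value above the upper bound is lowered to it. The number of data points stays the same.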

3. Read your results

Once Winsorization is applied to your goal, you can read your adjusted results on any results page that contains this goal. The goal container displays a badge indicating that outliers are handled for this goal. Hovering over the badge shows the parameters you have set.

Example of impact

Consider an A/B test comparing two landing pages (A and B). Without Winsorization, a few high-value outliers (e.g., purchases made by a few very high-spending users) could make one page appear significantly more effective, even if the typical user behavior does not support this conclusion.

  • Original Data (Metric: Revenue per User):
    • Page A: [10, 12, 14, 15, 16, 18, 100]
    • Page B: [11, 13, 15, 15, 17, 19, 110]
  • Winsorized Data:
    • Page A: [10, 12, 14, 15, 16, 18, 18]
    • Page B: [11, 13, 15, 15, 17, 19, 19]

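In this example, the extreme values (100 and 110) are capped at the next-highest value on each page (18 and 19), providing a more accurate comparison of typical user revenue between the two pages. Only the upper tail is capped in this illustration; the lowest values are left unchanged.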
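
The numbers above can be reproduced with a short Python sketch. The helper `cap_top_value` is a hypothetical name, and it uses the simplest possible rule (replace the single largest value with the second-largest) to match this illustration rather than a percentile computation.

```python
from statistics import mean

def cap_top_value(values):
    """Replace the largest value with the second-largest one."""
    cap = sorted(values)[-2]
    return [min(v, cap) for v in values]

page_a = [10, 12, 14, 15, 16, 18, 100]
page_b = [11, 13, 15, 15, 17, 19, 110]

winsorized_a = cap_top_value(page_a)
winsorized_b = cap_top_value(page_b)

for name, raw, capped in [("Page A", page_a, winsorized_a),
                          ("Page B", page_b, winsorized_b)]:
    print(f"{name}: raw mean {mean(raw):.2f}, winsorized mean {mean(capped):.2f}")
# Page A: raw mean 26.43, winsorized mean 14.71
# Page B: raw mean 28.57, winsorized mean 15.57
```

With the raw data, the gap between the two pages' mean revenue per user is about 2.14, and most of each mean comes from a single purchase. After capping, the means sit close to the typical values and the gap shrinks to about 0.86.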

Technical considerations

  • When you first apply this method to your custom goal, we compute and store the values corresponding to the percentiles you have set, and use them on the results pages to adapt your data.
  • The percentile values used for clipping the outliers are recomputed once a day, at 2 AM, for all your goals that have the method enabled, and are then used for the rest of the day. These values are re-evaluated instantly if you change the thresholds in the goal settings.
  • The raw data is not changed. You can still find it by requesting a raw export, or by disabling the feature and looking at the results page.
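
The idea of storing the bounds and applying them at read time, while leaving the raw data untouched, can be sketched as follows. This is a conceptual illustration only: `StoredBounds`, `compute_bounds`, and `adjusted_view` are hypothetical names, and nothing here describes Kameleoon's actual code or storage.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class StoredBounds:
    """Clipping bounds computed once and reused until the next refresh."""
    low: float
    high: float

def compute_bounds(raw_values, lower_pct, upper_pct):
    """Compute the percentile bounds from the data available right now."""
    arr = np.asarray(raw_values, dtype=float)
    low, high = np.percentile(arr, [lower_pct, upper_pct])
    return StoredBounds(float(low), float(high))

def adjusted_view(raw_values, bounds):
    """Return a clipped copy for display; the raw values are not modified."""
    return np.clip(np.asarray(raw_values, dtype=float), bounds.low, bounds.high)

raw = np.array([10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 100.0])
bounds = compute_bounds(raw, 2.5, 97.5)  # e.g. at the daily refresh

# A new, extreme data point arrives later in the day.
later_raw = np.append(raw, 500.0)
view = adjusted_view(later_raw, bounds)  # clipped with the stored bounds

print("stored bounds:", bounds)
print("raw max:", later_raw.max(), "| adjusted max:", view.max())
```

Because the view is a clipped copy, a raw export would still return the original values, including the extreme ones.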