STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments (2407.16337v1)

Published 23 Jul 2024 in cs.LG

Abstract: Online controlled experiments play a crucial role in enabling data-driven decisions across a wide range of companies. Variance reduction is an effective technique to improve the sensitivity of experiments, achieving higher statistical power while using fewer samples and shorter experimental periods. However, typical variance reduction methods (e.g., regression-adjusted estimators) are built upon the intuitive assumption of Gaussian distributions and cannot properly characterize real business metrics with heavy-tailed distributions. Furthermore, outliers diminish the correlation between pre-experiment covariates and outcome metrics, greatly limiting the effectiveness of variance reduction. In this paper, we develop a novel framework that integrates the Student's t-distribution with machine learning tools to fit heavy-tailed metrics and construct a robust average treatment effect (ATE) estimator in online controlled experiments, which we call STATE. By adopting a variational EM method to optimize the log-likelihood function, we can infer a robust solution that largely eliminates the negative impact of outliers and achieves significant variance reduction. Moreover, we extend the STATE method from count metrics to ratio metrics by utilizing a linear transformation that preserves unbiased estimation; variance reduction for ratio metrics is more complex and less investigated in existing work. Finally, both simulations on synthetic data and long-term empirical results on the Meituan experiment platform demonstrate the effectiveness of our method. Compared with the state-of-the-art estimators (CUPAC/MLRATE), STATE achieves over 50% variance reduction, indicating that it can reach the same statistical power with only half of the observations, or half the experimental duration.
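The abstract's core idea, replacing the Gaussian error model of regression adjustment with a Student's t error model so that outliers are down-weighted, can be illustrated with a classical EM fit of a t-regression, in the spirit of the EM algorithms for the t distribution in Liu and Rubin (1995). This is a minimal sketch under stated assumptions, not the paper's variational EM: it assumes a fixed, known degrees-of-freedom parameter `nu`, a single pre-experiment covariate, and that the ATE is read off as the coefficient on a 0/1 treatment indicator. The function name `t_regression_em` and the simulated data are hypothetical.

```python
import numpy as np


def t_regression_em(X, y, nu=3.0, n_iter=500, tol=1e-10):
    """Fit y = X @ beta + eps, eps ~ Student-t(nu) with scale sigma, by plain EM.

    Assumption: nu is fixed and known (this is NOT the paper's variational EM).
    Returns the coefficients, the squared scale, and the last E-step weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    # Initialise from ordinary least squares (the Gaussian fit).
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    w = np.ones(n)
    for _ in range(n_iter):
        # E-step: expected latent precision of each unit given the current fit;
        # units with large standardised residuals (outliers) get small weights.
        w = (nu + 1.0) / (nu + resid**2 / sigma2)
        # M-step: weighted least squares for beta ...
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        resid = y - X @ beta_new
        # ... and the weighted mean square for the scale.
        sigma2 = np.sum(w * resid**2) / n
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, sigma2, w


# Hypothetical simulated experiment: heavy-tailed pre-period metric x,
# 50/50 random assignment t, true treatment effect 0.5, t(2) noise.
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_t(df=2, size=n)
t = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * t + 2.0 * x + rng.standard_t(df=2, size=n)

# Design matrix: intercept, treatment indicator, pre-experiment covariate.
X = np.column_stack([np.ones(n), t, x])
beta, sigma2, w = t_regression_em(X, y, nu=2.0)
ate_hat = beta[1]  # coefficient on the treatment indicator
print(f"estimated ATE: {ate_hat:.3f}")
```

Each E-step weight is (ν + 1)/(ν + r²/σ²), so a unit whose residual lies many scale units from the fit contributes little to the next M-step; this is how a t error model limits the influence of outliers that would dominate a least-squares fit. The sketch does not compute standard errors for the treatment coefficient.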
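The closing claim, that over 50% variance reduction reaches the same power with half the observations, follows from the standard two-sample sample-size formula, in which the required size per arm is proportional to the outcome variance: n = 2(z_{1−α/2} + z_{1−β})² σ² / δ². The short sketch below assumes a normal approximation, equal arm sizes, and a common variance; the function name `n_per_arm` and the chosen α, power, and effect size δ are illustrative assumptions, not values from the paper.

```python
from statistics import NormalDist


def n_per_arm(sigma2, delta, alpha=0.05, power=0.8):
    """Units per arm for a two-sided two-sample z-test (normal approximation).

    Illustrative assumption: equal arm sizes and a common variance sigma2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile giving the target power
    return 2 * (z_alpha + z_beta) ** 2 * sigma2 / delta**2


base = n_per_arm(sigma2=1.0, delta=0.1)
reduced = n_per_arm(sigma2=0.5, delta=0.1)  # after a 50% variance reduction
print(f"baseline: {base:.0f} per arm, reduced: {reduced:.0f} per arm")
```

Because the required size is linear in σ², halving the variance halves the units needed per arm, which at a constant traffic rate corresponds to roughly half the experimental duration.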

References (28)
  1. Robust Probabilistic Projections. In ACM International Conference on Machine Learning (ICML). 33–40.
  2. Peter M Aronow and Joel A Middleton. 2013. A Class of Unbiased Estimators of the Average Treatment Effect in Randomized Experiments. Journal of Causal Inference 1, 1 (2013), 135–154.
  3. Eytan Bakshy and Dean Eckles. 2013. Uncertainty in Online Experiments with Dependent Data: An Evaluation of Bootstrap Methods. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 1303–1311.
  4. Consistent Transformation of Ratio Metrics for Efficient Online Controlled Experiments. In ACM International Conference on Web Search and Data Mining (WSDM). 55–63.
  5. Pauline Burke et al. 2019. Measuring Average Treatment Effect from Heavy-tailed Data. arXiv preprint arXiv:1905.09252 (2019).
  6. Double/Debiased Machine Learning for Treatment and Structural Parameters. The Econometrics Journal 21, 1 (2018), C1–C68.
  7. Anirban DasGupta. 2008. Asymptotic Theory of Statistics and Probability. Vol. 180. Springer.
  8. Variance Reduction Using In-Experiment Data: Efficient and Targeted Online Measurement for Sparse and Delayed Outcomes. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 3937–3946.
  9. Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data. In ACM International Conference on Web Search and Data Mining (WSDM). 123–132.
  10. Zero to Hero: Exploiting Null Effects to Achieve Variance Reduction in Experiments with One-sided Triggering. In ACM International Conference on Web Search and Data Mining (WSDM). 823–831.
  11. Wilfrid J Dixon. 1960. Simplified estimation from censored normal samples. The Annals of Mathematical Statistics (1960), 385–391.
  12. David A Freedman. 2008. On Regression Adjustments to Experimental Data. Advances in Applied Mathematics 40, 2 (2008), 180–193.
  13. Machine Learning for Variance Reduction in Online Experiments. In Conference on Neural Information Processing Systems (NIPS). 8637–8648.
  14. Focusing on the Long-term: It’s Good for Users and Business. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 1849–1858.
  15. Ying Jin and Shan Ba. 2023. Toward Optimal Variance Reduction in Online Controlled Experiments. Technometrics 65, 2 (2023), 231–242.
  16. Optimization Transfer Using Surrogate Objective Functions. Journal of Computational and Graphical Statistics 9, 1 (2000), 1–20.
  17. Winston Lin. 2013. Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique. The Annals of Applied Statistics 7, 1 (2013), 295–318.
  18. Chuanhai Liu and Donald B Rubin. 1995. ML Estimation of the T Distribution using EM and Its Extensions, ECM and ECME. Statistica Sinica 5, 1 (1995), 19–39.
  19. Generalized Majorization-Minimization. In International Conference on Machine Learning (ICML). 5022–5031.
  20. David Peel and Geoffrey J McLachlan. 2000. Robust Mixture Modelling Using the T Distribution. Statistics and Computing 10 (2000), 339–348.
  21. Jasjeet S Sekhon. 2008. The Neyman-Rubin Model of Causal Inference and Estimation via Matching Methods. The Oxford Handbook of Political Methodology 2 (2008), 1–32.
  22. Markus Svensén and Christopher M Bishop. 2005. Robust Bayesian Mixture Modelling. Neurocomputing 64 (2005), 235–252.
  23. Control Using Predictions as Covariates in Switchback Experiments. (2020).
  24. High-Dimensional Regression Adjustments in Randomized Experiments. Proceedings of the National Academy of Sciences 113, 45 (2016), 12673–12678.
  25. Edward Wu and Johann A Gagnon-Bartsch. 2018. The LOOP Estimator: Adjusting for Covariates in Randomized Experiments. Evaluation Review 42, 4 (2018), 458–488.
  26. Ya Xu and Nanyu Chen. 2016. Evaluating Mobile Apps with A/B and Quasi A/B Tests. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 313–322.
  27. Li Yang and Anastasios A Tsiatis. 2001. Efficiency Study of Estimators for a Treatment Effect in a Pretest–Posttest Trial. The American Statistician 55, 4 (2001), 314–321.
  28. Wenjing Zheng and Mark J van der Laan. 2011. Cross-Validated Targeted Minimum-Loss-Based Estimation. Targeted Learning: Causal Inference for Observational and Experimental Data (2011), 459–474.
