ForTune: Running Offline Scenarios to Estimate Impact on Business Metrics (2403.00133v2)

Published 29 Feb 2024 in cs.CE and stat.AP

Abstract: Making ideal decisions as a product leader in a web-facing company is extremely difficult. In addition to navigating the ambiguity of customer satisfaction and achieving business goals, one must also pave a path forward for one's products and services to remain relevant, desirable, and profitable. Data and experimentation to test product hypotheses are key to informing product decisions. Online controlled experiments (A/B tests) may provide the best data to support such decisions with high confidence, but they can be time-consuming and expensive, especially when one wants to understand the impact on key business metrics such as retention or long-term value. Offline experimentation allows one to iterate and test rapidly, but it often cannot provide the same level of confidence and cannot easily shed light on the impact on business metrics. We introduce a novel, lightweight, and flexible approach to investigating hypotheses, called scenario analysis, that aims to support product leaders' decisions using data about users and estimates of business metrics. Its strengths are that it can provide guidance on the trade-offs incurred by growing or shifting consumption, estimate trends in long-term outcomes such as retention and other important business metrics, and generate hypotheses about relationships between metrics at scale.
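The abstract does not describe how scenario analysis computes its estimates. Purely as an illustration of what "estimating a long-term outcome under a shift in consumption" can mean offline, here is a minimal Python sketch that assumes a toy post-stratification approach: users are binned by a hypothetical consumption share `share_a`, observed retention is measured per bin, and the scenario moves every user's share by `shift` before averaging the bin-level rates. The names `User`, `share_a`, `retained`, and `scenario_retention` are hypothetical, and this is not the ForTune method. It also makes visible the confidence gap the abstract mentions: the estimate is only meaningful if the observed association between bin and retention would hold under the shift.

```python
from dataclasses import dataclass


@dataclass
class User:
    share_a: float  # hypothetical: fraction of the user's consumption in category A, in [0, 1]
    retained: bool  # hypothetical: whether the user was retained at the chosen horizon


def stratum(share: float, n_bins: int) -> int:
    """Map a share in [0, 1] to one of n_bins equal-width bins (1.0 goes to the last bin)."""
    return min(int(share * n_bins), n_bins - 1)


def retention_by_stratum(users, n_bins):
    """Observed retention rate per bin; None marks bins with no observed users."""
    counts = [0] * n_bins
    retained = [0] * n_bins
    for u in users:
        b = stratum(u.share_a, n_bins)
        counts[b] += 1
        retained[b] += u.retained  # bool counts as 0 or 1
    return [r / c if c else None for r, c in zip(retained, counts)]


def scenario_retention(users, shift, n_bins=5):
    """Estimate overall retention if every user's share_a moved by `shift`.

    Strong assumption (illustrative only): retention depends on share_a only
    through its bin, and the observed bin-level rates would hold under the scenario.
    """
    rates = retention_by_stratum(users, n_bins)
    total, n = 0.0, 0
    for u in users:
        new_share = min(max(u.share_a + shift, 0.0), 1.0)  # clamp to [0, 1]
        rate = rates[stratum(new_share, n_bins)]
        if rate is None:
            continue  # no observed users in that bin, so no basis to extrapolate
        total += rate
        n += 1
    return total / n if n else float("nan")


users = [User(0.1, False), User(0.15, False), User(0.9, True), User(0.95, True)]
print(scenario_retention(users, 0.0))  # baseline: reproduces the observed retention rate
print(scenario_retention(users, 0.8))  # scenario: everyone shifts toward category A
```

With `shift=0.0` every user stays in an observed bin, so the estimate equals the observed overall retention rate; this makes a useful sanity check before reading any scenario. Skipping users who land in empty bins is a deliberate choice to avoid extrapolating beyond the data.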
