RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation (2312.14095v1)

Published 21 Dec 2023 in stat.AP, cs.AI, cs.LG, and econ.EM

Abstract: Significant research effort has been devoted in recent years to developing personalized pricing, promotions, and product recommendation algorithms that can leverage rich customer data to learn and earn. Systematic benchmarking and evaluation of these causal learning systems remains a critical challenge, due to the lack of suitable datasets and simulation environments. In this work, we propose a multi-stage model for simulating customer shopping behavior that captures important sources of heterogeneity, including price sensitivity and past experiences. We embedded this model into a working simulation environment, RetailSynth, which was carefully calibrated on publicly available grocery data to create realistic synthetic shopping transactions. Multiple pricing policies were implemented within the simulator and analyzed for impact on revenue, category penetration, and customer retention. Applied researchers can use RetailSynth to validate causal demand models for multi-category retail and to incorporate realistic price sensitivity into emerging benchmarking suites for personalized pricing, promotions, and product recommendations.
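The abstract names a multi-stage customer model and three policy metrics but does not list the stages. As an illustration only, the Python sketch below is a toy simulator built on assumed stages: store visit, category purchase incidence, multinomial-logit product choice, and quantity. Price sensitivity varies across customers, and a smoothed "past experience" term feeds back into the visit probability. This is not RetailSynth's model or code. The catalogue, the lognormal price-sensitivity distribution, the smoothing weights, and the `simulate`/`policy` names are hypothetical. The metric definitions are also assumptions: category penetration is the share of customers who ever bought in a category, and retention is the share who visited in the last four weeks.

```python
import math
import random
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


@dataclass
class Customer:
    price_sensitivity: float  # beta_i > 0, varies across customers (assumed lognormal)
    experience: float = 0.0   # smoothed value of past shopping trips (hypothetical)


# Hypothetical catalogue: category -> [(base_utility, list_price), ...]
CATALOGUE = {
    "dairy": [(1.0, 2.5), (0.6, 1.8)],
    "bakery": [(0.8, 3.0), (0.5, 2.0)],
    "produce": [(1.2, 4.0), (0.9, 2.2)],
}


def simulate(policy, n_customers=500, n_weeks=20, seed=0):
    """Run weekly shopping trips; `policy(category, list_price)` returns the shelf price."""
    rng = random.Random(seed)
    customers = [Customer(rng.lognormvariate(-0.5, 0.4)) for _ in range(n_customers)]
    revenue = 0.0
    buyers = {cat: set() for cat in CATALOGUE}  # who ever bought in each category
    last_visit = [-1] * n_customers

    for week in range(n_weeks):
        for i, cust in enumerate(customers):
            # Stage 1: store visit; better past experience raises the probability.
            if rng.random() >= sigmoid(-0.5 + cust.experience):
                continue
            last_visit[i] = week
            trip_value = 0.0
            for cat, products in CATALOGUE.items():
                prices = [policy(cat, p) for _, p in products]
                utils = [u - cust.price_sensitivity * q
                         for (u, _), q in zip(products, prices)]
                # Log-sum-exp "inclusive value" summarizing the category's appeal.
                m = max(utils)
                inclusive = m + math.log(sum(math.exp(v - m) for v in utils))
                # Stage 2: category purchase incidence.
                if rng.random() >= sigmoid(inclusive):
                    continue
                # Stage 3: multinomial-logit choice among the category's products.
                weights = [math.exp(v - m) for v in utils]
                k = rng.choices(range(len(products)), weights=weights)[0]
                # Stage 4: quantity, 1 unit plus an extra unit with probability 0.3.
                qty = 1 + int(rng.random() < 0.3)
                revenue += qty * prices[k]
                buyers[cat].add(i)
                trip_value += inclusive
            # Past experience: exponential smoothing of this trip's value.
            cust.experience = 0.7 * cust.experience + 0.3 * trip_value

    penetration = {cat: len(s) / n_customers for cat, s in buyers.items()}
    retention = sum(w >= n_weeks - 4 for w in last_visit) / n_customers
    return revenue, penetration, retention


if __name__ == "__main__":
    policies = {"list price": lambda cat, p: p, "20% off": lambda cat, p: 0.8 * p}
    for name, policy in policies.items():
        revenue, penetration, retention = simulate(policy)
        print(name, round(revenue, 1), penetration, round(retention, 3))
```

Because both runs share a seed, the customers' price sensitivities are drawn identically before the two policies diverge. Differences in revenue, penetration, and retention therefore reflect the pricing rule plus simulation noise rather than a different customer base.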

