RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation (2312.14095v1)
Abstract: In recent years, significant research effort has gone into personalized pricing, promotion, and product recommendation algorithms that leverage rich customer data to learn and earn. Systematically benchmarking and evaluating these causal learning systems remains a critical challenge because suitable datasets and simulation environments are lacking. In this work, we propose a multi-stage model of customer shopping behavior that captures important sources of heterogeneity, including price sensitivity and past experiences. We embed this model in a working simulation environment, RetailSynth, which we calibrate on publicly available grocery data to generate realistic synthetic shopping transactions. We implement multiple pricing policies within the simulator and analyze their impact on revenue, category penetration, and customer retention. Applied researchers can use RetailSynth to validate causal demand models for multi-category retail and to incorporate realistic price sensitivity into emerging benchmarking suites for personalized pricing, promotions, and product recommendations.
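To make the idea of a multi-stage shopping simulator concrete, here is a minimal sketch in Python with NumPy. The abstract does not spell out the stages. This sketch therefore *assumes* a common decomposition: store visit, category purchase incidence, product choice (multinomial logit), and quantity. It also assumes customer-level price sensitivity `gamma` and a hypothetical `experience` state that feeds back into visit probability. All dimensions, parameter values, the `simulate` function, and the simplified retention metric are illustrative assumptions. They are not RetailSynth's calibrated model or its API.

```python
import numpy as np

# Hypothetical sizes and parameters; none are taken from RetailSynth.
N_CUSTOMERS, N_CATEGORIES, N_PRODUCTS = 200, 4, 5  # products per category


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def simulate(prices, n_weeks=20, seed=0):
    """Simulate weekly trips under a fixed (N_CATEGORIES, N_PRODUCTS) price matrix.

    Returns total revenue, per-category penetration, and a simple retention rate.
    """
    rng = np.random.default_rng(seed)
    # Heterogeneity (assumed): price sensitivity per customer and category.
    gamma = rng.lognormal(mean=0.0, sigma=0.5, size=(N_CUSTOMERS, N_CATEGORIES))
    beta = rng.normal(1.0, 0.5, size=(N_CATEGORIES, N_PRODUCTS))  # product appeal
    cat_alpha = rng.normal(-1.0, 0.5, size=N_CATEGORIES)  # category intercepts
    visit_alpha = -0.5
    experience = np.zeros(N_CUSTOMERS)  # hypothetical smoothed past-trip utility

    revenue = 0.0
    bought_cat = np.zeros((N_CUSTOMERS, N_CATEGORIES), dtype=bool)
    visited = np.zeros((n_weeks, N_CUSTOMERS), dtype=bool)

    for week in range(n_weeks):
        # Product utilities v[c, k, j] = beta[k, j] - gamma[c, k] * price[k, j].
        v = beta[None] - gamma[:, :, None] * prices[None]
        # Inclusive value (log-sum-exp) summarizes each category's attractiveness.
        iv = np.log(np.exp(v).sum(axis=2))
        # Stage 1: store visit, driven by attractiveness and past experience.
        p_visit = 1 / (1 + np.exp(-(visit_alpha + 0.3 * iv.mean(axis=1) + experience)))
        visit = rng.random(N_CUSTOMERS) < p_visit
        visited[week] = visit
        # Stage 2: category purchase incidence, only for customers who visit.
        p_cat = 1 / (1 + np.exp(-(cat_alpha[None] + iv)))
        buy_cat = visit[:, None] & (rng.random((N_CUSTOMERS, N_CATEGORIES)) < p_cat)
        # Stage 3: multinomial-logit product choice within each purchased category.
        probs = softmax(v, axis=2)
        trip_utility = np.zeros(N_CUSTOMERS)
        for c, k in zip(*np.nonzero(buy_cat)):
            j = rng.choice(N_PRODUCTS, p=probs[c, k])
            qty = 1 + rng.poisson(0.5)  # Stage 4: quantity (assumed 1 + Poisson)
            revenue += qty * prices[k, j]
            trip_utility[c] += v[c, k, j]
        bought_cat |= buy_cat
        # Past experience: exponential smoothing of realized utility, visitors only.
        experience = np.where(
            visit, 0.8 * experience + 0.2 * np.tanh(trip_utility), experience
        )

    penetration = bought_cat.mean(axis=0)  # share of customers buying each category
    # Simplified retention: first-week visitors who also visit in the last week.
    first, last = visited[0], visited[-1]
    retention = (first & last).sum() / max(first.sum(), 1)
    return revenue, penetration, retention


# Compare two hypothetical pricing policies on the same synthetic population.
base_prices = np.random.default_rng(42).uniform(1.0, 3.0, size=(N_CATEGORIES, N_PRODUCTS))
policies = {"baseline": base_prices, "10% off": 0.9 * base_prices}
for name, p in policies.items():
    rev, pen, ret = simulate(p, seed=1)
    print(f"{name}: revenue={rev:.1f}, penetration={np.round(pen, 2)}, retention={ret:.2f}")
```

Passing the same `seed` to each policy run keeps the synthetic customer population (`gamma`, `beta`, `cat_alpha`) identical across policies. The behavioral draws still diverge once choices differ, so in practice one would average over many seeds before comparing revenue, penetration, or retention.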