Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions (2405.10469v1)

Published 16 May 2024 in cs.AI, cs.LG, econ.EM, and stat.ML

Abstract: The development of open benchmarking platforms could greatly accelerate the adoption of AI agents in retail. This paper presents comprehensive simulations of customer shopping behaviors for the purpose of benchmarking reinforcement learning (RL) agents that optimize coupon targeting. The difficulty of this learning problem is largely driven by the sparsity of customer purchase events. We trained agents using offline batch data comprising summarized customer purchase histories to help mitigate this effect. Our experiments revealed that contextual bandit and deep RL methods that are less prone to over-fitting the sparse reward distributions significantly outperform static policies. This study offers a practical framework for simulating AI agents that optimize the entire retail customer journey. It aims to inspire the further development of simulation tools for retail AI systems.
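The setup the abstract describes (sparse purchase events, offline batch data, a contextual bandit compared against static coupon policies) can be illustrated with a toy sketch. This is not the paper's simulator or its agents: the two customer segments, the coupon levels, the purchase probabilities, and every function name below are hypothetical and chosen only for illustration. The learner is a simple tabular greedy policy fitted to logged data, standing in for a contextual bandit.

```python
import random

# All values below are illustrative assumptions, not taken from the paper.
PRICE = 10.0
DISCOUNTS = [0.0, 0.1, 0.3]   # hypothetical coupon levels
SEGMENTS = [0, 1]             # 0 = price-insensitive, 1 = price-sensitive


def purchase_prob(segment, discount):
    """Toy purchase model: purchases are rare; only segment 1 responds to coupons."""
    if segment == 0:
        return 0.05
    return 0.02 + 0.3 * discount


def expected_revenue(segment, discount):
    """Exact expected revenue of offering `discount` to a customer in `segment`."""
    return purchase_prob(segment, discount) * PRICE * (1.0 - discount)


def collect_batch(n, rng):
    """Log n interactions under a uniformly random coupon policy (offline data)."""
    batch = []
    for _ in range(n):
        segment = rng.choice(SEGMENTS)
        discount = rng.choice(DISCOUNTS)
        bought = rng.random() < purchase_prob(segment, discount)
        revenue = PRICE * (1.0 - discount) if bought else 0.0
        batch.append((segment, discount, revenue))
    return batch


def fit_greedy_policy(batch):
    """Estimate mean revenue per (segment, coupon) and pick the best coupon per segment."""
    totals = {(s, d): 0.0 for s in SEGMENTS for d in DISCOUNTS}
    counts = {(s, d): 0 for s in SEGMENTS for d in DISCOUNTS}
    for s, d, r in batch:
        totals[(s, d)] += r
        counts[(s, d)] += 1

    def mean(s, d):
        return totals[(s, d)] / counts[(s, d)] if counts[(s, d)] else 0.0

    return {s: max(DISCOUNTS, key=lambda d: mean(s, d)) for s in SEGMENTS}


def policy_value(policy):
    """Exact expected revenue per customer, assuming segments are equally likely."""
    return sum(expected_revenue(s, policy[s]) for s in SEGMENTS) / len(SEGMENTS)


rng = random.Random(0)
batch = collect_batch(60_000, rng)
learned = fit_greedy_policy(batch)

# A static policy offers the same coupon to every customer.
static_values = {d: policy_value({s: d for s in SEGMENTS}) for d in DISCOUNTS}
best_static = max(static_values.values())

print("learned policy:", learned)
print("learned value:", policy_value(learned))
print("best static value:", best_static)
```

Because most logged rewards are zero, the per-cell revenue estimates are noisy, and a small batch can mislead the greedy choice when two coupons have similar expected revenue. In this toy model that is the price-insensitive segment; it is one concrete way sparsity makes the learning problem hard.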
