EasyRL4Rec: An Easy-to-use Library for Reinforcement Learning Based Recommender Systems (2402.15164v3)
Abstract: Reinforcement Learning (RL)-Based Recommender Systems (RSs) have attracted growing attention for their potential to enhance long-term user engagement. However, research in this field faces challenges, including the lack of user-friendly frameworks, inconsistent evaluation metrics, and difficulties in reproducing existing studies. To tackle these issues, we introduce EasyRL4Rec, an easy-to-use code library designed specifically for RL-based RSs. This library provides lightweight and diverse RL environments based on five public datasets and includes core modules with rich options, simplifying model development. It provides unified evaluation standards focusing on long-term outcomes and offers tailored designs for state modeling and action representation for recommendation scenarios. Furthermore, we share our findings from insightful experiments with current methods. EasyRL4Rec seeks to facilitate the model development and experimental process in the domain of RL-based RSs. The library is available for public use.