Reinforcement Learning-based Recommender Systems with Large Language Models for State Reward and Action Modeling (2403.16948v1)
Abstract: Reinforcement Learning (RL)-based recommender systems have demonstrated promising performance in meeting user expectations by learning to make accurate next-item recommendations from historical user-item interactions. However, existing offline RL-based sequential recommendation methods struggle to obtain effective user feedback from the environment: modeling the user state effectively and shaping an appropriate reward for recommendation both remain open challenges. In this paper, we leverage the language understanding capabilities of large language models (LLMs) and adapt them as an environment (LE) to enhance RL-based recommenders. The LE is learned from a subset of user-item interaction data, which reduces the need for large amounts of training data, and it can synthesise user feedback for offline data by: (i) acting as a state model that produces high-quality states that enrich the user representation, and (ii) functioning as a reward model that accurately captures nuanced user preferences on actions. Moreover, the LE can generate positive actions that augment the limited offline training data. We propose an LE Augmentation (LEA) method that further improves recommendation performance by jointly optimising the supervised component and the RL policy, using the augmented actions and historical user signals. We use LEA together with the state and reward models in conjunction with state-of-the-art RL recommenders and report experimental results on two publicly available datasets.
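To make the joint objective described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the supervised component is a softmax cross-entropy over next-item logits, that the RL component is a one-step Q-learning temporal-difference loss, and that the LE supplies a scalar reward for both the logged item and one LE-generated positive item. All function names, the unweighted sum of the two terms, and the discount factor `gamma` are hypothetical choices made for illustration.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against a single target item index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def td_loss(q_values, action, reward, next_q_values, gamma):
    """Squared one-step TD error for one (state, action, reward) transition."""
    target = reward + gamma * max(next_q_values)
    return (target - q_values[action]) ** 2


def joint_loss(logits, q_values, next_q_values,
               logged_item, logged_reward,
               augmented_item, augmented_reward,
               gamma=0.5):
    """Supervised loss on the logged item plus TD losses on the logged
    item and on one augmented (LE-generated) positive item.

    Assumption: the rewards come from an LE reward model; here they are
    simply passed in as numbers.
    """
    supervised = softmax_cross_entropy(logits, logged_item)
    rl = td_loss(q_values, logged_item, logged_reward, next_q_values, gamma)
    rl += td_loss(q_values, augmented_item, augmented_reward, next_q_values, gamma)
    return supervised + rl


# Toy example with a catalogue of four items.
loss = joint_loss(
    logits=[0.0, 0.0, 0.0, 0.0],
    q_values=[0.0, 1.0, 0.0, 0.0],
    next_q_values=[0.0, 0.0, 0.0, 2.0],
    logged_item=1, logged_reward=1.0,
    augmented_item=3, augmented_reward=0.5,
)
print(loss)
```

In this sketch the augmented action only contributes to the RL term; whether it should also enter the supervised term, and how the two terms are weighted, are design choices the abstract does not specify.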
- Jie Wang
- Alexandros Karatzoglou
- Ioannis Arapakis
- Joemon M. Jose