Robust Reinforcement Learning Objectives for Sequential Recommender Systems (2305.18820v2)
Abstract: Attention-based sequential recommendation methods have shown promise in accurately capturing users' evolving interests from their past interactions. Beyond producing better user representations, recent research has also explored integrating reinforcement learning (RL) into these models. Framing sequential recommendation as an RL problem with reward signals makes it possible to build recommender systems that incorporate direct user feedback in the form of rewards, which improves personalization. However, applying RL algorithms presents challenges, including off-policy training, large combinatorial action spaces, and the scarcity of datasets with sufficient reward signals. Contemporary approaches have combined RL with sequential modeling, using contrastive objectives and negative sampling strategies to train the RL component. In this work, we further highlight the effectiveness of contrastive objectives paired with data augmentation on datasets with extended horizons. We also identify instability issues that can arise when negative sampling is applied. These issues primarily stem from the data imbalance prevalent in real-world datasets, a common problem in offline RL settings. We then introduce an enhanced methodology that addresses these challenges more effectively. Experimental results across several real-world datasets show that our method achieves increased robustness and state-of-the-art performance.
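The RL framing above can be made concrete with a minimal NumPy sketch of two generic ingredients the abstract refers to:

- A one-step Q-learning (temporal-difference) loss for an observed user interaction, extended with sampled negative items.
- A sampled-softmax cross-entropy that scores the positive item against those same negatives, as a contrastive-style objective.

The sketch assumes that a sequential encoder (not shown) has already produced per-item Q-values or logits for the current and next states. The function names `td_loss_with_negatives` and `sampled_softmax_ce` are hypothetical, as is the choice to treat sampled negatives as zero-reward terminal transitions. This is a generic illustration, not the method proposed in this paper.

```python
import numpy as np


def td_loss_with_negatives(q_s, q_next, action, reward, neg_actions, gamma, done=False):
    """Squared one-step TD error for one observed transition plus sampled negatives.

    q_s:         Q-values for every item at the current state, shape (num_items,)
    q_next:      Q-values for every item at the next state (e.g. from a target network)
    action:      index of the item the user actually interacted with
    reward:      scalar reward for that interaction
    neg_actions: indices of sampled items the user did not interact with
    gamma:       discount factor in [0, 1)
    """
    q_s = np.asarray(q_s, dtype=float)
    q_next = np.asarray(q_next, dtype=float)
    neg_actions = np.asarray(neg_actions, dtype=int)

    # Observed transition: standard Q-learning target r + gamma * max_a' Q(s', a').
    bootstrap = 0.0 if done else gamma * q_next.max()
    pos_loss = (q_s[action] - (reward + bootstrap)) ** 2

    # Sampled negatives (hypothetical simplification for this sketch): treated as
    # zero-reward, terminal transitions, so their Q-values are pushed toward 0.
    neg_loss = float(np.mean(q_s[neg_actions] ** 2)) if neg_actions.size else 0.0
    return float(pos_loss) + neg_loss


def sampled_softmax_ce(logits, action, neg_actions):
    """Contrastive-style cross-entropy of the positive item vs. sampled negatives."""
    logits = np.asarray(logits, dtype=float)
    candidates = np.concatenate(([action], np.asarray(neg_actions, dtype=int)))
    z = logits[candidates]
    z = z - z.max()  # shift for numerical stability; the softmax is unchanged
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[0])  # the positive item sits at index 0
```

Worked example for the TD term, using these inputs:

- `q_s = [1.0, 0.5, 0.2, 0.0]`
- `q_next = [0.4, 0.8, 0.1, 0.3]`
- `action=1`, `reward=1.0`, `neg_actions=[2, 3]`, `gamma=0.5`

The TD target is 1.0 + 0.5 * 0.8 = 1.4. The loss is therefore (0.5 - 1.4)^2 + (0.2^2 + 0^2) / 2 = 0.81 + 0.02 = 0.83.

Summing the two terms, optionally with a weighting hyperparameter, gives a combined objective in the spirit of the RL-plus-sequential-modeling approaches described above. In this sketch the negative term only ever pushes the Q-values of unobserved items toward zero. As a result, the balance between rewarded interactions and sampled negatives directly shapes the objective.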
Authors:
- Melissa Mozifian
- Tristan Sylvain
- Dave Evans
- Lili Meng