Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning (2403.04875v1)
Abstract: Adaptations of Transformer models, such as BERT4Rec and SASRec, achieve state-of-the-art performance in the sequential recommendation task according to accuracy-based metrics, such as NDCG. These models treat items as tokens and use a score-and-rank approach (the Top-K strategy), where the model first computes a score for every item and then ranks items by that score. While this approach works well for accuracy-based metrics, it is hard to use for optimising more complex beyond-accuracy metrics, such as diversity. Recently, the GPTRec model, which uses a different Next-K strategy, has been proposed as an alternative to Top-K models. In contrast with traditional Top-K recommendation, Next-K generates recommendations item by item and can therefore account for complex item-to-item interdependencies that are important for beyond-accuracy measures. However, the original GPTRec paper focused only on accuracy in its experiments and did not address how to optimise the model for complex beyond-accuracy metrics. Indeed, training GPTRec for beyond-accuracy goals is challenging because the interaction data typically available for training recommender systems is not aligned with beyond-accuracy recommendation goals. To solve this misalignment, we train GPTRec using a two-stage approach: in the first stage, we use a teacher-student approach to train GPTRec to mimic the behaviour of traditional Top-K models; in the second stage, we use Reinforcement Learning to align the model with beyond-accuracy goals. In particular, we experiment with increasing recommendation diversity and reducing popularity bias. Our experiments on two datasets show that in 3 out of 4 cases, GPTRec's Next-K generation approach offers a better tradeoff between accuracy and secondary metrics than classic greedy re-ranking techniques.
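The abstract contrasts the Top-K (score-and-rank) strategy of models such as SASRec and BERT4Rec with GPTRec's Next-K strategy, which generates the recommendation list item by item. The following is a minimal sketch of that contrast, not the paper's implementation: the toy embedding table, the `model_scores` stand-in, and the example history are hypothetical placeholders for a trained sequential model.

```python
# Minimal sketch (not the paper's code) contrasting Top-K score-and-rank
# with Next-K item-by-item generation. All quantities are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
NUM_ITEMS, K = 100, 10
item_embeddings = rng.normal(size=(NUM_ITEMS, 16))   # toy item representations


def model_scores(history, generated):
    """Stand-in for model scores conditioned on the user's interaction history
    plus the part of the recommendation list generated so far."""
    context = list(history) + list(generated)
    profile = item_embeddings[context].mean(axis=0)
    return item_embeddings @ profile


def top_k(history, k=K):
    """Top-K: score every item once, then rank and cut off at k."""
    scores = model_scores(history, [])
    scores[list(history)] = -np.inf                   # do not re-recommend seen items
    return list(np.argsort(-scores)[:k])


def next_k(history, k=K):
    """Next-K: generate the list item by item; every step re-scores the
    catalogue conditioned on the items already placed in the list."""
    generated = []
    for _ in range(k):
        scores = model_scores(history, generated)
        scores[list(history) + generated] = -np.inf   # mask seen and selected items
        generated.append(int(np.argmax(scores)))
    return generated


print("Top-K :", top_k([1, 2, 3]))
print("Next-K:", next_k([1, 2, 3]))
```

Because Next-K re-scores the catalogue after every selection, the score of the item placed at position i can depend on the i-1 items already in the list, which is the kind of item-to-item interdependence that matters for list-level properties such as diversity and that a single score-and-rank pass cannot express.

The abstract's second stage aligns the distilled model with beyond-accuracy goals via Reinforcement Learning, but it does not state which RL algorithm or reward is used. The sketch below therefore assumes a plain REINFORCE update and a hypothetical reward mixing held-out-item recall with intra-list diversity (weighted by `alpha`); the tiny model, the toy interaction data, and all hyperparameters are illustrative only.

```python
# Minimal sketch of the RL alignment stage under the assumptions stated above.
import torch
import torch.nn.functional as F

NUM_ITEMS, EMB_DIM, K = 100, 32, 10


class TinyNextKModel(torch.nn.Module):
    """Toy autoregressive scorer: embeds the context and scores every item."""

    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(NUM_ITEMS, EMB_DIM)
        self.out = torch.nn.Linear(EMB_DIM, NUM_ITEMS)

    def forward(self, context):                       # context: list[int]
        hidden = self.emb(torch.tensor(context)).mean(dim=0)
        return self.out(hidden)                       # logits over the catalogue


def reward(recommended, held_out, emb, alpha=0.5):
    """Hypothetical reward mixing an accuracy term with intra-list diversity."""
    accuracy = float(held_out in recommended)
    vecs = emb[torch.tensor(recommended)]
    sims = F.cosine_similarity(vecs.unsqueeze(0), vecs.unsqueeze(1), dim=-1)
    n = len(recommended)
    diversity = 1.0 - (sims.sum().item() - n) / (n * n - n)   # 1 - mean pairwise similarity
    return (1 - alpha) * accuracy + alpha * diversity


# Stage 1 (not shown): supervised teacher-student training so the model mimics
# a Top-K teacher. Stage 2 below fine-tunes that model with policy gradients.
model = TinyNextKModel()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
history, held_out = [3, 7, 42], 15                    # toy training example

for _ in range(10):                                   # a few REINFORCE steps
    context, recommended, log_probs = list(history), [], []
    for _ in range(K):                                # sample the list item by item
        logits = model(context)
        mask = torch.zeros(NUM_ITEMS, dtype=torch.bool)
        mask[history + recommended] = True            # seen and already-selected items
        dist = torch.distributions.Categorical(logits=logits.masked_fill(mask, -1e9))
        item = dist.sample()
        log_probs.append(dist.log_prob(item))
        recommended.append(int(item))
        context.append(int(item))
    ret = reward(recommended, held_out, model.emb.weight.detach())
    loss = -ret * torch.stack(log_probs).sum()        # REINFORCE objective (no baseline)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```

Changing the hypothetical `alpha`, or swapping the diversity term for a popularity-bias penalty, illustrates how the same RL stage could target different beyond-accuracy goals without retraining the first stage.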
Authors: Aleksandr Petrov, Craig Macdonald