End-to-end Training for Recommendation with Language-based User Profiles (2410.18870v2)
Abstract: There is growing interest in natural language-based user profiles for recommender systems, which aim to enhance transparency and scrutability compared with embedding-based methods. Existing studies primarily generate these profiles using zero-shot inference from LLMs, but their quality remains insufficient, leading to suboptimal recommendation performance. In this paper, we introduce LangPTune, the first end-to-end training framework for optimizing LLM-generated user profiles. Our method significantly outperforms zero-shot approaches by explicitly training the LLM for the recommendation objective. Through extensive evaluations across diverse training configurations and benchmarks, we demonstrate that LangPTune not only surpasses zero-shot baselines but can also match the performance of state-of-the-art embedding-based methods. Finally, we investigate whether the training procedure preserves the interpretability of these profiles relative to zero-shot inference, using both GPT-4 simulations and crowdworker user studies. The implementation of LangPTune is available at https://github.com/ZhaolinGao/LangPTune.
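To make the end-to-end idea concrete, below is a minimal sketch of one possible training loop: an LLM generates a natural-language profile from a user's interaction history, a text encoder scores candidate items against that profile, and the resulting recommendation reward is fed back to the LLM via a REINFORCE-style update. The model names, the Recall@10 reward, and the plain policy-gradient step are illustrative assumptions for exposition, not the exact LangPTune procedure described in the paper or repository.

```python
# Hypothetical sketch of end-to-end optimization of LLM-generated user
# profiles against a recommendation objective. Model checkpoints, the
# reward definition, and the REINFORCE update are illustrative choices.
import torch
from torch.nn.functional import log_softmax, cosine_similarity
from transformers import AutoTokenizer, AutoModelForCausalLM
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder checkpoints; any instruction-tuned LLM and text encoder could be used.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
llm = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct").to(device)
encoder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1", device=device)
optimizer = torch.optim.AdamW(llm.parameters(), lr=1e-6)


def recommendation_reward(profile: str, held_out: list[str], candidates: list[str]) -> float:
    """Score candidates against the profile text; reward = Recall@10 on held-out items."""
    emb = encoder.encode([profile] + candidates, convert_to_tensor=True)
    scores = cosine_similarity(emb[0:1], emb[1:])          # profile vs. each candidate
    order = scores.argsort(descending=True).tolist()
    top10 = {candidates[i] for i in order[:10]}
    return sum(item in top10 for item in held_out) / len(held_out)


def reinforce_step(history_prompt: str, held_out: list[str], candidates: list[str]):
    """Sample a profile, compute its reward, and reweight its token log-probs (REINFORCE)."""
    inputs = tok(history_prompt, return_tensors="pt").to(device)
    prompt_len = inputs["input_ids"].shape[1]
    gen = llm.generate(**inputs, max_new_tokens=128, do_sample=True)
    profile_ids = gen[:, prompt_len:]
    profile = tok.decode(profile_ids[0], skip_special_tokens=True)
    reward = recommendation_reward(profile, held_out, candidates)

    # Log-probability of the sampled profile under the current policy.
    logits = llm(gen).logits[:, prompt_len - 1:-1, :]       # positions predicting profile tokens
    logp = log_softmax(logits, dim=-1).gather(-1, profile_ids.unsqueeze(-1)).sum()

    loss = -reward * logp                                    # policy-gradient surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return profile, reward
```

In practice, a framework of this kind would add the usual RL-for-LLM machinery (reward baselines or relative-reward objectives, KL regularization toward the initial model, and parameter-efficient fine-tuning), but the sketch captures the core loop: profile generation, recommendation scoring, and a gradient signal that flows back to the profile-generating LLM.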