Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation (2404.05970v1)
Abstract: This paper studies retrieval-augmented approaches for personalizing large language models (LLMs), which could have a substantial impact on various applications and domains. We present the first attempt to optimize the retrieval models that deliver a limited number of personal documents to LLMs for personalized generation. We develop two optimization algorithms that use feedback from the downstream personalized generation tasks to optimize retrieval: one based on reinforcement learning, whose reward function can be defined using an arbitrary metric for personalized generation, and another based on knowledge distillation from the downstream LLM to the retrieval model. This paper also introduces a pre- and post-generation retriever selection model that decides which retriever to use for each LLM input. Extensive experiments on diverse tasks from the LLM personalization (LaMP) benchmark reveal statistically significant improvements on six out of seven datasets.
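The two feedback-driven training signals named in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the function names (`reinforce_step`, `distillation_loss`), the precomputed per-document rewards, the mean-reward baseline, and the KL-divergence form of the distillation loss are all assumptions made for illustration. The retriever is reduced to one trainable score per candidate document.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(scores, rewards, lr=0.5, rng=random):
    """One REINFORCE-style update on per-document retrieval scores.

    scores:  current retriever scores, one per candidate document
    rewards: downstream generation metric obtained when the LLM is given
             each document (hypothetical values, precomputed here)
    """
    probs = softmax(scores)
    # Sample one document from the retriever's distribution.
    i = rng.choices(range(len(scores)), weights=probs, k=1)[0]
    # Assumed baseline: expected reward under the current distribution.
    baseline = sum(p * r for p, r in zip(probs, rewards))
    advantage = rewards[i] - baseline
    # d log p(i) / d score_j = 1[j == i] - p_j
    return [
        s + lr * advantage * ((1.0 if j == i else 0.0) - p)
        for j, (s, p) in enumerate(zip(scores, probs))
    ]


def distillation_loss(scores, llm_scores, temperature=1.0):
    """KL(target || retriever), with the target distribution taken as a
    softmax over LLM-derived per-document scores (an assumed formulation)."""
    target = softmax([x / temperature for x in llm_scores])
    pred = softmax(scores)
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)


# Usage: three candidate documents; the second yields the best metric value.
rng = random.Random(0)
scores = [0.0, 0.0, 0.0]
rewards = [0.1, 0.9, 0.3]
for _ in range(200):
    scores = reinforce_step(scores, rewards, rng=rng)
probs = softmax(scores)
```

In this toy setting the reward can be any scalar metric, which mirrors the abstract's point that the reinforcement learning objective does not require a differentiable metric, whereas the distillation loss needs per-document scores from the LLM.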
Authors: Alireza Salemi, Surya Kallumadi, Hamed Zamani