Bridging the Preference Gap between Retrievers and LLMs (2401.06954v2)

Published 13 Jan 2024 in cs.CL

Abstract: LLMs have demonstrated superior results across a wide range of tasks, and Retrieval-augmented Generation (RAG) is an effective way to enhance performance by locating relevant information and placing it into the context window of the LLM. However, the relationship between retrievers and LLMs in a RAG pipeline is still under-investigated. Most existing work treats the retriever and the LLM as independent components, leaving a gap between retrieving human-"friendly" information and assembling an LLM-"friendly" context. In this work, we examine a novel bridge mechanism. We validate the ranking and selection assumptions of retrievers in the context of RAG and propose a framework that chains together supervised and reinforcement learning to train a bridge model that optimizes the connection between the retriever and the LLM. Empirical results demonstrate the effectiveness of our method in both question-answering and personalized generation tasks.

Introduction

Innovations in artificial intelligence have led to formidable tools such as LLMs and retrieval-augmented generation (RAG) techniques. LLMs such as GPT-3 and PaLM 2 represent breakthroughs in language processing, demonstrating remarkable performance across a multitude of tasks. RAG models enhance LLMs by integrating information retrieved from external datasets, providing more contextually grounded responses, particularly for complex tasks that require specific knowledge.

The Preference Gap

A lesser-discussed issue in retrieval-augmented generation is what the authors call the 'preference gap' between retrievers and LLMs: the passage selection and ranking that suits human readers is not necessarily what works best for an LLM. Retrieval systems are traditionally designed to mimic human reading behavior, presenting information in a top-to-bottom ranked list. LLMs, however, do not necessarily share this preference, since their attention mechanisms can weigh tokens non-sequentially. More critically, while humans can effortlessly ignore irrelevant content, LLMs are easily distracted by it, which degrades their performance.

The paper highlights significant performance discrepancies when different content-selection and ordering strategies are applied to the LLM's context. This finding challenges the widely held belief in the importance of ranked retrieval order and instead emphasizes the need for RAG system designs that explicitly bridge this preference gap. A sketch of such a probe follows.
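To make the gap concrete, a simple probe can feed the same retrieved passages to the LLM under several selection and ordering policies and compare downstream accuracy. This is hypothetical code, not the paper's experimental harness; `llm_answer` and `score` are assumed user-supplied callables, and the three policies are illustrative rather than the paper's exact ablations.

```python
import random
from typing import Callable, Dict, List, Tuple


def probe_orderings(
    examples: List[Tuple[str, List[str], str]],    # (query, ranked passages, gold answer)
    llm_answer: Callable[[str, List[str]], str],   # LLM called with query + context passages
    score: Callable[[str, str], float],            # e.g. exact match or F1 against the gold answer
) -> Dict[str, float]:
    """Compare average task score under different passage selection/ordering policies."""
    policies = {
        "ranked": lambda ps: ps,                            # retriever order, as-is
        "shuffled": lambda ps: random.sample(ps, len(ps)),  # same passages, random order
        "top1_only": lambda ps: ps[:1],                     # aggressive selection
    }
    results: Dict[str, float] = {}
    for name, policy in policies.items():
        total = 0.0
        for query, passages, gold in examples:
            prediction = llm_answer(query, policy(passages))
            total += score(prediction, gold)
        results[name] = total / max(len(examples), 1)
    return results
```

Large swings between these policies on the same retrieved passages are exactly the kind of discrepancy the paper reports.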

Bridging the Gap with BGM

To address this preference gap, the paper proposes a framework called BGM (Bridging the Gap between retrievers and LLMs). The essential innovation is a 'bridge model' that sits between the retriever and the LLM and reformats the retrieved information so that the LLM can interpret it successfully. Training chains two stages: supervised learning (SL) to constrain the bridge model, followed by reinforcement learning (RL) to optimize its policy for downstream task performance.
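The overall recipe can be pictured as the skeleton below. This is a structural sketch only, assuming a frozen retriever and a frozen downstream LLM; every name in it (`Example`, `bridge_sl_step`, `bridge_rl_step`, `train_bgm`) is a placeholder for illustration, not code from the paper or any library.

```python
# Structural sketch of BGM-style training: a supervised stage on silver passage
# sequences, chained with an RL stage rewarded by downstream output quality.
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    query: str
    retrieved: List[str]        # top-k passages from the (frozen) retriever
    silver_sequence: List[str]  # SL target: passage sequence known to help the LLM
    reference: str              # gold answer / reference text used for the reward


def bridge_sl_step(ex: Example) -> None:
    """Supervised stage: fit the seq2seq bridge to emit the silver sequence.

    A real implementation would take one gradient step on the cross-entropy loss
    of the bridge generating ex.silver_sequence given (ex.query, ex.retrieved).
    """


def bridge_rl_step(ex: Example) -> float:
    """RL stage: sample a passage sequence, score it via the frozen LLM's output.

    A real implementation would apply a policy-gradient update to the bridge
    using the reward computed here.
    """
    selected = ex.retrieved[:2]                 # stand-in for sampling from the bridge
    prediction = " ".join(selected)             # stand-in for the frozen LLM's output
    reward = float(ex.reference in prediction)  # stand-in for an EM / BLEU-style metric
    return reward


def train_bgm(data: List[Example], sl_epochs: int = 3, rl_epochs: int = 1) -> None:
    for _ in range(sl_epochs):      # stage 1: supervised warm-up
        for ex in data:
            bridge_sl_step(ex)
    for _ in range(rl_epochs):      # stage 2: optimize for downstream reward
        for ex in data:
            bridge_rl_step(ex)
```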

The bridge model is a sequence-to-sequence model trained not only to re-rank but also to select the most appropriate passages for the query. This grants it the flexibility of dynamic selection, a capability absent from traditional re-ranking, without resorting to a simplistic manual threshold on how many passages to keep.
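At inference time the bridge simply sits in the middle of the pipeline: it consumes the query and the retrieved candidates and emits a shorter, possibly re-ordered passage sequence that is concatenated into the LLM prompt. Below is a minimal sketch of that wiring, assuming hypothetical `retrieve`, `bridge`, and `llm` callables and an illustrative prompt template (none of these names come from the paper).

```python
from typing import Callable, List


def rag_with_bridge(
    query: str,
    retrieve: Callable[[str, int], List[str]],      # retriever: query -> top-k passages
    bridge: Callable[[str, List[str]], List[str]],  # bridge: re-rank and select passages
    llm: Callable[[str], str],                      # downstream LLM: prompt -> output
    k: int = 10,
) -> str:
    candidates = retrieve(query, k)       # ranked, human-"friendly" list
    selected = bridge(query, candidates)  # shorter, LLM-"friendly" sequence
    prompt = "\n\n".join(selected) + f"\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)
```

Because the bridge decides both which passages survive and in what order, the prompt handed to the LLM can differ substantially from the retriever's raw ranking.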

Empirical Evidence and Future Work

The experiments validate BGM's efficacy across tasks such as question answering and personalized text generation, on datasets ranging from QA benchmarks to personal emails. The bridge model performed impressively against strong existing retrievers and ranking-based baselines, underscoring BGM's potential as a significant enhancement for RAG applications.

This bridge approach opens pathways for future research on bridge models that can adapt to varying LLM sizes and datasets, or generalize across different tasks without requiring specialized training.

Conclusion

In summary, the BGM framework presents a novel solution to a nuanced problem, effectively advancing the synergy between human-centered information retrieval methods and the operational preferences of LLMs. By identifying and addressing the preference gap, BGM not only fosters a deeper comprehension of RAG systems but also extends the functionality and efficiency of AI in processing and generating human-like language responses.

Authors (6)
  1. Zixuan Ke (26 papers)
  2. Weize Kong (7 papers)
  3. Cheng Li (1094 papers)
  4. Mingyang Zhang (56 papers)
  5. Qiaozhu Mei (68 papers)
  6. Michael Bendersky (63 papers)
Citations (17)