
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval (2410.23214v2)

Published 30 Oct 2024 in cs.LG and cs.AI

Abstract: The hallucinations of LLMs are increasingly mitigated by allowing LLMs to search for information and to ground their answers in real sources. Unfortunately, LLMs often struggle with posing the right search queries, especially when dealing with complex or otherwise indirect topics. Observing that LLMs can learn to search for relevant facts by trying different queries and learning to up-weight queries that successfully produce relevant results, we introduce Learning to Retrieve by Trying (LeReT), a reinforcement learning framework that explores search queries and uses preference-based optimization to improve their quality. LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%. The simplicity and flexibility of LeReT allows it to be applied to arbitrary off-the-shelf retrievers and makes it a promising technique for improving general LLM pipelines. Project website: http://sherylhsu.com/LeReT/.

References (33)
  1. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=hSyW5go0v8.
  2. A general theoretical paradigm to understand learning from human preferences, 2023. URL https://arxiv.org/abs/2310.12036.
  3. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.
  4. Reading Wikipedia to answer open-domain questions. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1870–1879, 2017.
  5. Google. Generative AI is coming to Google Search. https://blog.google/products/search/generative-ai-google-search-may-2024/, 2024.
  6. Retrieval augmented language model pre-training. In International Conference on Machine Learning, pp. 3929–3938. PMLR, 2020.
  7. HoVer: A dataset for many-hop fact extraction and claim verification. In Trevor Cohn, Yulan He, and Yang Liu (eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 3441–3460, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.309. URL https://aclanthology.org/2020.findings-emnlp.309.
  8. Baleen: Robust multi-hop reasoning at scale via condensed retrieval. Advances in Neural Information Processing Systems, 34:27670–27682, 2021.
  9. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP. arXiv preprint arXiv:2212.14024, 2022.
  10. DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint arXiv:2310.03714, 2023.
  11. Internet-augmented dialogue generation. In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 8460–8478, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.acl-long.579. URL https://aclanthology.org/2022.acl-long.579.
  12. Internet-augmented language models through few-shot prompting for open-domain question answering. arXiv preprint arXiv:2203.05115, 2022.
  13. Latent retrieval for weakly supervised open domain question answering. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6086–6096, 2019.
  14. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
  15. OpenAI. SearchGPT is a prototype of new AI search features. https://openai.com/index/searchgpt-prototype/, 2024.
  16. Optimizing instructions and demonstrations for multi-stage language model programs, 2024. URL https://arxiv.org/abs/2406.11695.
  17. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
  18. PerplexityAI. Perplexity AI. https://www.perplexity.ai/, 2024.
  19. Measuring and narrowing the compositionality gap in language models, 2023. URL https://arxiv.org/abs/2210.03350.
  20. Scaling laws for reward model overoptimization in direct alignment algorithms. arXiv preprint arXiv:2406.02900, 2024a.
  21. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36, 2024b.
  22. ColBERTv2: Effective and efficient retrieval via lightweight late interaction, 2022. URL https://arxiv.org/abs/2112.01488.
  23. Learning by distilling context, 2022. URL https://arxiv.org/abs/2209.15189.
  24. Trial and error: Exploration-based trajectory optimization for LLM agents, 2024. URL https://arxiv.org/abs/2403.02502.
  25. Fine-tuning and prompt optimization: Two great steps that work better together, 2024. URL https://arxiv.org/abs/2407.10930.
  26. Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33:3008–3021, 2020.
  27. MuSiQue: Multihop questions via single-hop question composition, 2022. URL https://arxiv.org/abs/2108.00573.
  28. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions, 2023. URL https://arxiv.org/abs/2212.10509.
  29. Answering complex open-domain questions with multi-hop dense retrieval. In International Conference on Learning Representations, 2021.
  30. Some things are more cringe than others: Iterative preference optimization with the pairwise cringe loss, 2024. URL https://arxiv.org/abs/2312.16682.
  31. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2369–2380, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1259. URL https://aclanthology.org/D18-1259.
  32. ReAct: Synergizing reasoning and acting in language models, 2023. URL https://arxiv.org/abs/2210.03629.
  33. SLiC-HF: Sequence likelihood calibration with human feedback. arXiv preprint arXiv:2305.10425, 2023.

Summary

  • The paper introduces LeReT, a reinforcement learning framework that refines query generation to boost retrieval accuracy and minimize hallucinations in LLM outputs.
  • By combining diverse query sampling with preference-based optimization, the method achieves absolute gains of up to 29% in retrieval recall and up to 17% in downstream generation quality.
  • The approach demonstrates that iterative, trial-oriented learning can enhance LLM grounding and transparency in complex multi-document reasoning tasks.

Analysis of "Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval"

The paper "Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval" presents a novel approach to improving how LLMs generate search queries to ground their outputs more reliably on verified sources. This approach addresses one of the significant challenges faced by LLMs: the tendency to hallucinate when tackling complex or nuanced topics not directly addressed by single-document retrieval. Traditional Retrieval-Augmented Generation (RAG) pipelines, while effective for straightforward queries, often falter when multi-hop reasoning is necessary. The authors propose Learning to Retrieve by Trying (LeReT), a reinforcement learning framework that enhances retrieval through iterative query improvements informed by preference-based optimization.

Key Contributions and Methodology

The LeReT framework leverages reinforcement learning (RL) to align query-generation strategies with criteria for successful information retrieval. It combines diverse search query sampling with preference-based optimization to refine and improve query generation, leading to substantial gains in retrieval performance. By utilizing a set of few-shot prompts optimized to induce diverse query generation, LeReT effectively explores the search space, enabling the discovery of higher-quality information.
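
To make the exploration step concrete, here is a minimal sketch of diverse query sampling, assuming a hypothetical `llm.generate` method; the prompt format, function names, and parameters are illustrative, not the authors' implementation:

```python
import random

def sample_diverse_queries(llm, question, fewshot_prompts,
                           n_prompts=4, temperature=1.0):
    """Sample one search query per few-shot prompt. Varying the
    few-shot context (not just the sampling temperature) is what
    drives diversity among the candidate queries."""
    queries = []
    for prompt in random.sample(fewshot_prompts, k=n_prompts):
        query = llm.generate(
            f"{prompt}\n\nQuestion: {question}\nSearch query:",
            temperature=temperature,
        )
        queries.append(query.strip())
    return queries
```

Each sampled query is then executed against the retriever, and the quality of what it retrieves supplies the reward signal used for preference optimization.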

The authors demonstrate the effectiveness of their approach on the widely used multi-hop datasets HotpotQA and HoVer. The framework improves retrieval accuracy by substantial margins (up to 29% absolute recall in some setups) and yields up to 17% gains in downstream generation quality. The improvement shows up both in the relevance of the retrieved sources and in the LLM's ability to synthesize accurate answers from them.
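
For context on the recall numbers above, a small helper showing how recall over gold supporting documents is typically computed (the document IDs in the example are hypothetical):

```python
def retrieval_recall(retrieved_ids, gold_ids):
    """Fraction of gold supporting documents present in the
    retrieved set; the retrieval-accuracy gains reported above
    are measured with a recall of roughly this form."""
    gold = set(gold_ids)
    return len(gold & set(retrieved_ids)) / len(gold) if gold else 0.0

# A multi-hop question with 3 gold documents, 2 of them retrieved:
print(retrieval_recall({"d1", "d4", "d9"}, {"d1", "d4", "d7"}))  # ~0.667
```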

LeReT's preference optimization constructs preference datasets from the results of diverse query sampling and then applies identity preference optimization (IPO) to refine the model's query generation. This optimization steers the model toward queries that yield high-quality retrievals, which is particularly valuable in pipelines where the LLM mediates access to information. Furthermore, LeReT's adaptability allows it to work with arbitrary off-the-shelf retrievers, making it compatible with a variety of LLM pipeline architectures.
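
A sketch of how such a preference dataset and the IPO objective might look, assuming per-query rewards from the sampling step; the best-vs-rest pairing heuristic and the PyTorch loss below follow the IPO formulation of Azar et al. and are illustrative rather than the paper's exact code:

```python
import torch

def build_preference_pairs(scored_queries):
    """Given (query, reward) samples for one question, pair the
    highest-reward query (chosen) against each strictly
    lower-reward query (rejected)."""
    ranked = sorted(scored_queries, key=lambda s: s[1], reverse=True)
    best_query, best_reward = ranked[0]
    return [(best_query, q) for q, r in ranked[1:] if r < best_reward]

def ipo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, tau=0.1):
    """IPO regresses the policy-vs-reference log-ratio margin h
    toward 1/(2*tau), rather than pushing it through a sigmoid
    as DPO does, which bounds the optimization pressure."""
    h = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return ((h - 1.0 / (2.0 * tau)) ** 2).mean()
```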

Implications and Future Directions

The proposed method furthers the integration of RL into LLM task formulations, showing that retrieval accuracy, and with it the grounding of answers, is amenable to iterative, trial-oriented learning. The findings suggest that structured grounding strategies could improve the transparency of LLM outputs by making retrieval processes less opaque and more reliably aligned with factual information. However, the method's reliance on direct retrieval supervision (gold supporting documents) could limit its general applicability, especially in scenarios where optimal document sets are not predefined. Further exploration of indirect supervision, or of hybrid approaches that balance signal reliability against supervision cost, therefore appears necessary.

In conclusion, this paper underscores the potential of RL frameworks to effectively address challenges that arise when engaging LLMs in multi-document reasoning tasks. The adaptability demonstrated by LeReT hints at broader AI applications, where optimized retrieval mechanisms could become pivotal to enhancing the factual alignment and effectiveness of generative AI models. Future work might explore indirect supervision avenues or extend the framework to dynamically train retrievers, thereby enriching RL's role in constructing intelligent retrieval systems that cater to increasingly complex query demands.
