Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval (2410.23214v2)
Abstract: Hallucinations in LLMs are increasingly mitigated by allowing them to search for information and ground their answers in real sources. Unfortunately, LLMs often struggle with posing the right search queries, especially when dealing with complex or otherwise indirect topics. Observing that LLMs can learn to search for relevant facts by $\textit{trying}$ different queries and up-weighting queries that successfully produce relevant results, we introduce $\underline{Le}$arning to $\underline{Re}$trieve by $\underline{T}$rying (LeReT), a reinforcement learning framework that explores search queries and uses preference-based optimization to improve their quality. LeReT improves absolute retrieval accuracy by up to 29% and downstream generator evaluations by 17%. The simplicity and flexibility of LeReT allow it to be applied to arbitrary off-the-shelf retrievers, making it a promising technique for improving general LLM pipelines. Project website: http://sherylhsu.com/LeReT/.
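To make the explore-then-optimize loop concrete, below is a minimal sketch of the data-collection stage implied by the abstract: sample several candidate queries per question, score each by retrieval quality, and keep the best/worst as a preference pair for a subsequent preference-optimization step (e.g., a DPO-style objective). All names here (`sample_queries`, `retrieve`, the recall reward) are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of a LeReT-style "learning to retrieve by trying" data loop.
# Assumptions: the query sampler, retriever, and recall reward below are
# stand-ins; the paper's real pipeline plugs in an LLM and a retriever.

import random


def sample_queries(question: str, n: int = 4) -> list[str]:
    # Stand-in for sampling n diverse search queries from the LLM,
    # e.g. via varied few-shot prompts or high-temperature decoding.
    return [f"{question} (variant {i})" for i in range(n)]


def retrieve(query: str, k: int = 5) -> list[str]:
    # Stand-in for any off-the-shelf retriever (BM25, ColBERTv2, ...).
    corpus = [f"doc-{i}" for i in range(100)]
    return random.sample(corpus, k)


def recall_reward(retrieved: list[str], gold: set[str]) -> float:
    # Reward a query by the fraction of gold documents it surfaces.
    return len(gold & set(retrieved)) / max(len(gold), 1)


def collect_preference_pairs(questions, gold_docs):
    """Try several queries per question, score each by retrieval recall,
    and keep (prompt, chosen, rejected) triples for preference training."""
    pairs = []
    for q in questions:
        scored = sorted(
            (recall_reward(retrieve(query), gold_docs[q]), query)
            for query in sample_queries(q)
        )
        worst, best = scored[0], scored[-1]
        if best[0] > worst[0]:  # keep only informative (non-tied) pairs
            pairs.append({"prompt": q, "chosen": best[1], "rejected": worst[1]})
    return pairs


if __name__ == "__main__":
    questions = ["Who founded the company that makes the Model S?"]
    gold = {questions[0]: {"doc-1", "doc-42"}}
    print(collect_preference_pairs(questions, gold))
```

The resulting triples match the `{prompt, chosen, rejected}` format commonly consumed by preference-optimization trainers, so the up-weighting step reduces to standard preference fine-tuning on queries that retrieved well.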