ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729v3)

Published 27 Mar 2025 in cs.CL and cs.AI

Abstract: Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, limiting factual accuracy. While recent works equip reinforcement learning (RL)-based LRMs with retrieval capabilities, they suffer from overthinking and lack robustness in reasoning, reducing their effectiveness in question answering (QA) tasks. To address this, we propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations. Our solution includes a novel data construction framework with an upper bound on the reasoning chain length. Specifically, we first leverage an LRM to generate deliberate thinking, then select an action from a predefined action space (Search and Finish). For Search action, a query is executed against the RAG engine, where the result is returned as observation to guide reasoning steps later. This process iterates until a Finish action is chosen. Benefiting from ReaRAG's strong reasoning capabilities, our approach outperforms existing baselines on multi-hop QA. Further analysis highlights its strong reflective ability to recognize errors and refine its reasoning trajectory. Our study enhances LRMs' factuality while effectively integrating robust reasoning for Retrieval-Augmented Generation (RAG).

Authors (8)
  1. Zhicheng Lee (3 papers)
  2. Shulin Cao (23 papers)
  3. Jinxin Liu (49 papers)
  4. Jiajie Zhang (30 papers)
  5. Weichuan Liu (4 papers)
  6. Xiaoyin Che (5 papers)
  7. Lei Hou (127 papers)
  8. Juanzi Li (144 papers)

Summary

ReaRAG: A Methodology for Improving Factual Accuracy in Large Reasoning Models

This paper introduces ReaRAG, a methodology aimed at improving the factual accuracy of Large Reasoning Models (LRMs) by integrating knowledge-guided reasoning into the Retrieval-Augmented Generation (RAG) framework. ReaRAG, developed by researchers at Tsinghua University and Siemens AG, addresses key challenges in integrating external knowledge sources with LRMs to enhance reasoning robustness and factuality, especially in multi-hop question-answering tasks.

Large Reasoning Models possess formidable reasoning abilities but rely primarily on parametric knowledge, which limits their factual accuracy: the models struggle to retrieve and incorporate external contextual information. Previous attempts to augment LRMs with retrieval through iterative query strategies have seen limited success; such methods often suffer from error propagation and robustness issues, detracting from their ability to solve tasks that require reasoning over multiple steps or hops.

The proposed ReaRAG model mitigates these challenges by methodically constructing reasoning chains initialized by the LRM and augmented with knowledge retrieved from external sources. The authors introduce a novel data construction framework that drives this process, explicitly capping the reasoning chain length to prevent redundant and inefficient iterations. At the core of ReaRAG is a Thought-Action-Observation paradigm, whereby the model reflects on previously gathered knowledge before deciding on its next move. This framework enables ReaRAG to iteratively execute search actions, terminate searches judiciously, and self-correct when errors are detected.
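
The loop below is a minimal illustrative sketch of this Thought-Action-Observation cycle with a capped chain length. The helper callables (generate_step, rag_search), the action format, and the MAX_STEPS value are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of ReaRAG's Thought-Action-Observation loop.
# `generate_step` (the fine-tuned reasoning model) and `rag_search`
# (the RAG engine) are hypothetical callables assumed here.

MAX_STEPS = 10  # upper bound on reasoning chain length (value is illustrative)

def rearag_answer(question, generate_step, rag_search):
    """Iterate Thought -> Action -> Observation until a Finish action."""
    history = []  # accumulated (thought, action, observation) triples
    for _ in range(MAX_STEPS):
        # The model produces a deliberate thought plus an action chosen
        # from the predefined action space {Search, Finish}.
        thought, action, argument = generate_step(question, history)
        if action == "Finish":
            return argument  # the argument carries the final answer
        # For a Search action, execute the query against the RAG engine and
        # feed the result back as an observation to guide later steps.
        observation = rag_search(argument)
        history.append((thought, action, observation))
    # If the cap is reached, force a final answer from the gathered evidence
    # (assumes generate_step supports a force_finish flag).
    _, _, answer = generate_step(question, history, force_finish=True)
    return answer
```

Capping the number of iterations mirrors the paper's upper bound on reasoning chain length, which is what keeps the model from the excessive querying that plagues RL-based alternatives.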

Evaluative experiments conducted on four QA benchmarks—MuSiQue, HotpotQA, IIRC, and Natural Questions (NQ)—demonstrated ReaRAG's superior performance over existing baselines. The gains were largest on the multi-hop MuSiQue and HotpotQA benchmarks, indicating that ReaRAG effectively combines the LRM's reasoning aptitude with external knowledge retrieval. On the single-hop NQ dataset, however, ReaRAG did not significantly outperform competitors, illustrating the limited benefit of enhanced reasoning in simpler question-answering scenarios.
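
For context, answer quality on these benchmarks is typically scored with exact match and token-level F1 after standard answer normalization. The snippet below is a generic sketch of those metrics and may not match the paper's exact evaluation protocol.

```python
# Generic sketch of the standard answer-level QA metrics (exact match and
# token-level F1) commonly reported on MuSiQue, HotpotQA, IIRC, and NQ.
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```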

The authors identify limitations in current retrieval-augmented models such as Search-o1, which rely heavily on the base model generating retrieval-specific tokens. ReaRAG avoids this dependence by decoupling the reasoning process from retrieval-token generation, achieving greater robustness and accuracy. Furthermore, ReaRAG circumvents the tendency of RL-based approaches to overthink, yielding streamlined reasoning paths for multi-hop tasks.

The results point to advances on both theoretical and practical fronts. By strengthening the link between reasoning models and dynamic knowledge bases, ReaRAG opens avenues for future research on task-specific fine-tuning and adaptive querying. Similar methodologies could be employed in interactive AI systems that require both deep reasoning and real-time knowledge integration, such as collaborative and assistive technologies, intelligent tutoring systems, and dynamic decision-support frameworks.

ReaRAG's systematic method for constructing dedicated datasets and fine-tuning models on them exemplifies a robust approach to persistent challenges in knowledge-intensive AI deployments, and marks fertile ground for future exploration at the intersection of LLM development and retrieval-augmented methodologies.