- The paper presents ImpRAG, a novel framework that integrates retrieval and generation using implicit queries, eliminating the need for explicit search inputs.
- It employs a decoder-only model whose layers are split into distinct groups for retrieval, reading, and generation, which helps control memory usage and overall efficiency.
- Empirical results show up to 11.5-point gains in exact match and significant retrieval recall improvements across eight tasks, highlighting its practical impact.
ImpRAG: Enhancing Retrieval-Augmented Generation through Implicit Queries
This paper presents ImpRAG, a novel framework for Retrieval-Augmented Generation (RAG) that integrates retrieval and generation into a single, cohesive model without relying on explicit textual queries. Unlike existing RAG systems, which typically treat retrieval and generation as separate components requiring explicit query formulation, ImpRAG lets LLMs express their information needs implicitly within one model, which can improve generalization across diverse, knowledge-intensive tasks.
Key Methodological Innovations
ImpRAG redefines the traditional RAG architecture by leveraging pretrained decoder-only LLMs whose layers are divided into distinct groups optimized for retrieval and generation. Specifically, the bottom layers handle retrieval, the middle layers act as readers that cross-attend to the retrieved information, and the top layers disable cross-attention to reduce memory usage. This structured division enables a unified forward pass that bridges retrieval and language modeling.
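A minimal PyTorch sketch of this layer split is shown below. The layer indices, module names, and mean-pooled query representation are illustrative assumptions, and layer norms, causal masks, and positional encodings are omitted for brevity; this is not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """One decoder block with self-attention and optional cross-attention."""
    def __init__(self, d_model, n_heads, use_cross_attention):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.use_cross_attention = use_cross_attention
        if use_cross_attention:
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x, memory=None):
        x = x + self.self_attn(x, x, x, need_weights=False)[0]
        if self.use_cross_attention and memory is not None:
            # Reader layers: attend over the hidden states of retrieved passages.
            x = x + self.cross_attn(x, memory, memory, need_weights=False)[0]
        return x + self.ffn(x)


class LayerGroupedDecoder(nn.Module):
    """Bottom layers build the implicit query, middle layers read retrieved
    passages via cross-attention, top layers use self-attention only."""
    def __init__(self, n_layers=12, d_model=256, n_heads=4,
                 retrieval_end=4, reader_end=9):
        super().__init__()
        self.retrieval_end, self.reader_end = retrieval_end, reader_end
        self.layers = nn.ModuleList(
            [Block(d_model, n_heads,
                   use_cross_attention=retrieval_end <= i < reader_end)
             for i in range(n_layers)]
        )

    def embed_query(self, x):
        # Implicit query: run only the bottom (retrieval) layers, then mean-pool.
        for layer in self.layers[:self.retrieval_end]:
            x = layer(x)
        return x.mean(dim=1)

    def forward(self, x, retrieved_states):
        for i, layer in enumerate(self.layers):
            memory = retrieved_states if self.retrieval_end <= i < self.reader_end else None
            x = layer(x, memory)
        return x
```

The same stack thus serves both roles: `embed_query` reuses the bottom layers as the retriever, while the full `forward` pass consumes the retrieved states during generation.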
Training in ImpRAG adheres to a two-stage process:
- Warmup Stage: Initializes retrieval capabilities using pseudo labels generated by an established retriever.
- Self-Distillation Stage: Refines retrieval by using the model's own generation perplexity as the training signal for the retrieval objective, so the retriever learns which passages actually help the model answer (see the sketch after this list).
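Below is a minimal PyTorch sketch of the two training signals described above. The function names, score shapes, temperature, and the choice of a KL-divergence distillation loss are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def warmup_loss(retrieval_scores, teacher_scores, temperature=1.0):
    """Warmup: align implicit-query scores with pseudo labels (soft targets)
    produced by an off-the-shelf teacher retriever."""
    target = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(F.log_softmax(retrieval_scores, dim=-1), target, reduction="sum")


def self_distillation_loss(retrieval_scores, answer_log_likelihoods, temperature=1.0):
    """Self-distillation: push the retriever's distribution over candidate
    passages toward the distribution implied by how much each passage helps
    the generator (higher answer likelihood, i.e. lower perplexity, gets a
    higher target probability).

    retrieval_scores:       (num_passages,) similarity between the implicit
                            query and each passage embedding.
    answer_log_likelihoods: (num_passages,) log-likelihood of the gold answer
                            conditioned on each passage.
    """
    # Detach the generator signal so this loss only updates the retrieval layers.
    target = F.softmax(answer_log_likelihoods.detach() / temperature, dim=-1)
    log_pred = F.log_softmax(retrieval_scores, dim=-1)
    return F.kl_div(log_pred, target, reduction="sum")


# Toy usage: passage 2 yields the most likely answer, so the retriever is
# nudged to score it highest.
scores = torch.tensor([0.2, 0.1, 0.4], requires_grad=True)
answer_ll = torch.tensor([-5.0, -7.5, -2.0])
loss = self_distillation_loss(scores, answer_ll)
loss.backward()
```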
Results and Observations
Evaluated across eight knowledge-intensive tasks, including question answering, entity linking, relation extraction, and fact checking, ImpRAG shows significant improvements over baselines such as RA-DIT and RA-IT. Tasks whose input formats diverge most from standard question answering, such as T-REx and AIDA, see the largest gains: exact match improves by 3.6 to 11.5 points and retrieval recall by 5.0 to 23.2 points.
Analyses also highlighted the importance of the layer split, that is, how parameters are allocated between retrieval and generation, and the value of particular training datasets for strengthening retrieval. They further confirmed that using generation perplexities in the retrieval training objective is effective, underscoring how the integrated framework supports knowledge transfer between the two tasks.
Implications and Future Directions
The results suggest that ImpRAG’s integrated approach leads to substantial advancements in both retrieval and generation tasks, providing a more seamless experience for unseen and varied task formats. This integration may pave the way for developing AI systems that require less human intervention for query formulation, ultimately minimizing errors that arise from manual query design and enhancing adaptability.
Nevertheless, the current focus on single-pass retrieval remains a limitation, since complex reasoning tasks could benefit from iterative retrieval. Future research could extend ImpRAG to iterative and multi-hop retrieval scenarios, and validate it on a wider variety of model families to assess architectural adaptability. Additionally, the framework’s reliance on pseudo-labeled data during training leaves room to investigate more robust supervision methods, such as human-in-the-loop refinement.
In summary, ImpRAG represents a significant step forward in enhancing the synergy between retrieval and LLMs in RAG systems, advocating for a more unified and self-sufficient mechanism for addressing information-intensive tasks within AI research.