ECoRAG: Evidentiality-Guided Compression for Long Context RAG
The paper presents ECoRAG, a framework designed to enhance Retrieval-Augmented Generation (RAG) systems in handling long contexts by introducing evidentiality-guided compression. The primary objective of ECoRAG is to improve Open-Domain Question Answering (ODQA) by compressing retrieved documents so that the LLM generates answers grounded in necessary and correct evidence. This approach mitigates the latency and computational cost of processing lengthy contexts while also improving answer accuracy.
Evidentiality-Guided Compression
The traditional RAG framework employs external document retrieval to aid LLMs in ODQA tasks. However, the increased context length from retrieved documents often introduces non-evidential information, detracting from answer quality and increasing computational costs. Unlike prior methods, ECoRAG specifically filters out non-evidential content by prioritizing sentences that directly support the generation of the correct answer. The framework defines evidentiality hierarchically, distinguishing between strong and weak evidence and identifying distractors that could mislead the LLM.
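The hierarchical notion of evidentiality described above can be sketched as a labeling rule: a sentence that lets the LLM recover the correct answer (which it could not produce alone) is strong evidence, while a sentence that flips an otherwise-correct answer to an incorrect one is a distractor. The `llm_answers_correctly` callable below is a hypothetical stand-in for querying a real LLM and checking the answer; the exact labeling criteria in the paper may differ.

```python
def label_evidentiality(sentence, question, answer, llm_answers_correctly):
    """Classify a sentence as 'strong', 'weak', or 'distractor' evidence.

    llm_answers_correctly(question, context) is assumed to query an LLM
    with the given context and return True iff the answer is correct.
    """
    with_sent = llm_answers_correctly(question, context=sentence)
    without_sent = llm_answers_correctly(question, context="")
    if with_sent and not without_sent:
        return "strong"      # sentence alone lets the LLM recover the answer
    if not with_sent and without_sent:
        return "distractor"  # sentence misleads an otherwise-correct LLM
    return "weak"            # neither decisively helps nor harms
```

In practice these labels come from batched LLM calls over a training corpus, and they supervise the compressor described next.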
To achieve this, ECoRAG uses a dual-encoder compressor that scores how much each individual sentence contributes to producing the correct answer. The compressor is trained with evidentiality labels obtained from prior LLM-generated evaluations, optimizing it to rank sentences by their evidentiality.
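A minimal sketch of this rank-and-keep compression step is shown below. The toy bag-of-words `embed` function stands in for the trained question and sentence encoders (which would be neural models scoring evidentiality, not lexical overlap); only the overall shape — score every sentence against the question, keep the top-ranked ones — reflects the framework.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words encoder standing in for a trained dual encoder."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def compress(question, sentences, budget):
    """Rank sentences by (toy) evidentiality score; keep the top `budget`."""
    q = embed(question)
    ranked = sorted(sentences, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:budget]
```

In the real system the ranking model is trained so that strong evidence scores above weak evidence, which in turn scores above distractors.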
Adaptive Compression and Efficient Retrieval
ECoRAG introduces a feature called evidentiality reflection, employing an evidentiality evaluator to assess whether the current compression sufficiently supports answer generation. Using a lightweight model for fast evaluation, the framework adaptively adjusts the compression by iteratively adding more evidence until the compressed content meets the evidentiality threshold necessary for accurate answer generation. This process retains the evidence the LLM needs while minimizing latency by avoiding excessive iterations.
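The reflection loop above can be sketched as follows. Here `is_sufficient` is a hypothetical stand-in for the lightweight evidentiality evaluator, and the doubling schedule is an illustrative assumption, not the paper's exact growth policy.

```python
def adaptive_compress(ranked_sentences, is_sufficient, max_rounds=4):
    """Grow the compressed context from the evidentiality-ranked list until
    the (hypothetical) evaluator judges it sufficient to answer the question.

    ranked_sentences: sentences already sorted by evidentiality, best first.
    is_sufficient:    callable taking a list of sentences, returning bool.
    """
    budget = 1
    for _ in range(max_rounds):
        context = ranked_sentences[:budget]
        if is_sufficient(context):
            return context  # stop early: enough evidence, minimal tokens
        budget *= 2         # illustrative schedule: double the budget
    return ranked_sentences[:budget]
```

Stopping at the first sufficient budget is what keeps both token usage and the number of evaluator calls low.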
Experimental Validation and Implications
The framework's efficacy was demonstrated across multiple datasets, including Natural Questions (NQ), TriviaQA (TQA), and WebQuestions (WQ). ECoRAG consistently outperformed other compression methods, such as RECOMP and LLMLingua, in both Exact Match (EM) and F1 scores, while significantly reducing the token count. Its ability to reduce both token usage and latency demonstrates its practical benefits in enhancing the efficiency of RAG systems, especially in scenarios involving large retrieval corpora.
Practical and Theoretical Implications
Practically, ECoRAG offers a robust solution for improving the efficiency and accuracy of LLMs in domains requiring long-context understanding, such as ODQA. Theoretically, the framework advances understanding of how evidentiality can guide the compression of informational context, highlighting the importance of dynamic evidence assessment. It encourages future work to incorporate evidentiality as a core component for optimizing task-specific performance.
In conclusion, ECoRAG represents a significant step in addressing the challenges RAG systems face in managing long contexts, contributing to more efficient and accurate LLM performance. This research opens avenues for applying evidentiality-guided processes to broader AI applications beyond ODQA, suggesting a potentially transformative approach to context handling in LLMs.