EfficientRAG: Efficient Retriever for Multi-Hop Question Answering (2408.04259v2)

Published 8 Aug 2024 in cs.CL and cs.AI

Abstract: Retrieval-augmented generation (RAG) methods encounter difficulties when addressing complex questions like multi-hop queries. While iterative retrieval methods improve performance by gathering additional information, current approaches often rely on multiple calls of LLMs. In this paper, we introduce EfficientRAG, an efficient retriever for multi-hop question answering. EfficientRAG iteratively generates new queries without the need for LLM calls at each iteration and filters out irrelevant information. Experimental results demonstrate that EfficientRAG surpasses existing RAG methods on three open-domain multi-hop question-answering datasets.


The paper "EfficientRAG: Efficient Retriever for Multi-Hop Question Answering" introduces a novel approach to enhance retrieval-augmented generation (RAG) methods for handling complex multi-hop question answering tasks. EfficientRAG aims to address the inefficiencies in existing iterative retrieval methods, which often rely heavily on multiple calls to LLMs, resulting in increased latency and cost.

Background and Challenges

RAG techniques have become pivotal in augmenting pre-trained LLMs, enabling them to retrieve relevant information from external resources to ground generated responses. Despite their success, one-round retrieval methods often fall short when dealing with multi-hop questions that require information beyond the initial query. Iterative retrieval methods improve upon this by performing multiple rounds of retrieval or reasoning. However, these approaches are not without their limitations:

  1. Dependence on multiple LLM calls per iteration, each adding latency and cost (a sketch of such a loop follows this list).
  2. The need for dedicated prompts and few-shot examples that may have to be rewritten for different scenarios.
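
To make the cost concrete, below is a minimal sketch of the kind of LLM-driven loop such methods run. All interfaces here (`retriever.search`, `llm.generate_followup`, `llm.answer`) are hypothetical stand-ins, not the API of any particular system.

```python
# Hypothetical sketch of a conventional iterative-retrieval loop in which
# every hop pays for a fresh LLM call; all names are illustrative only.

def iterative_rag_baseline(question, retriever, llm, max_hops=4):
    """Answer a multi-hop question with one LLM call per retrieval round."""
    context = []
    query = question
    for _ in range(max_hops):
        context.extend(retriever.search(query, top_k=5))
        # Each round incurs a full LLM call (the latency/cost bottleneck),
        # typically driven by a dedicated prompt with few-shot examples.
        query = llm.generate_followup(question, context)
        if query is None:  # the LLM judges the gathered context sufficient
            break
    # One more LLM call synthesizes the final answer.
    return llm.answer(question, context)
```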

Contributions of EfficientRAG

EfficientRAG introduces an innovative framework to mitigate the reliance on LLMs for query generation in multi-hop question answering. The framework consists of two lightweight components: the Labeler and the Filter.

  • Labeler: annotates the useful tokens in each retrieved chunk and tags the chunk as <Terminate> or <Continue>, according to whether the gathered information already suffices to answer the query.
  • Filter: constructs the next-hop query from the annotated tokens, replacing the unknown parts of the current query so that the subsequent retrieval round is better targeted.

The primary advantage of EfficientRAG is that it iteratively generates new queries while maintaining high recall with few retrieved chunks, thereby boosting efficiency and reducing cost.
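
A minimal sketch of how this Labeler/Filter loop might look in code follows; every interface here (`retriever`, `labeler`, `query_filter`) is an assumption made for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of the EfficientRAG iteration: a lightweight Labeler
# annotates useful tokens in each retrieved chunk and tags the chunk, and a
# Filter builds the next-hop query from those tokens. No LLM is called
# inside the loop; all interfaces are assumed for illustration.

TERMINATE, CONTINUE = "<Terminate>", "<Continue>"

def efficient_rag_retrieve(question, retriever, labeler, query_filter,
                           max_hops=4, top_k=5):
    query, evidence = question, []
    for _ in range(max_hops):
        useful_tokens, done = [], True
        for chunk in retriever.search(query, top_k=top_k):
            tag, tokens = labeler.annotate(question, chunk)
            evidence.append(chunk)
            if tag == CONTINUE:  # helpful chunk, but more hops are needed
                useful_tokens.extend(tokens)
                done = False
        if done:  # every chunk tagged <Terminate>: enough information
            break
        # The Filter composes the next-hop query from the annotated tokens.
        query = query_filter.next_query(question, useful_tokens)
    return evidence  # handed to a single final generator call
```

Note that the inner loop never calls an LLM: both components are lightweight models, which is where the latency and cost savings come from.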

Experimental Setup

The paper empirically evaluates EfficientRAG on three open-domain multi-hop question-answering datasets: HotpotQA, MuSiQue, and 2WikiMQA, analyzing both retrieval performance and end-to-end QA performance.

  • LLM-Based Generators: Multiple settings were tested using GPT-3.5, GPT-4, and Llama-3-8B, highlighting the importance of retrieval in improving answer accuracy.
  • Query Decomposition Approaches: The paper assessed how two query decomposition methods, LLM Decompose and EfficientRAG Decompose, impact retrieval performance and efficiency (a toy contrast of the two follows this list).
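
As an invented illustration of the contrast (the question, sub-queries, and tokens below are made up for this summary, not examples from the paper):

```python
# Invented 2-hop example contrasting the two decomposition styles.
question = "Which country is the author of 'Norwegian Wood' from?"

# LLM Decompose: an LLM splits the question into sub-queries up front,
# and may be called again at each hop.
llm_subqueries = [
    "Who is the author of 'Norwegian Wood'?",
    "Which country is Haruki Murakami from?",
]

# EfficientRAG Decompose: the Filter assembles the next-hop query from
# tokens the Labeler marked in the first-hop chunk, with no LLM call.
labeled_tokens = ["Haruki Murakami"]
next_hop_query = "Which country is Haruki Murakami from?"
```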

Key Results

Retrieval Performance

EfficientRAG demonstrated high recall while retrieving significantly fewer chunks than baseline methods (the recall metric is sketched after this list):

  • HotpotQA: 81.84% recall with an average of 6.41 retrieved chunks.
  • MuSiQue: Lower recall, attributed to the dataset's complexity and to fewer retrieved chunks.
  • 2WikiMQA: 84.08% recall with an average of 3.69 retrieved chunks.
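
Recall here is presumably measured against the gold supporting passages for each question; a compact sketch of that conventional computation (a standard definition, not code from the paper):

```python
def retrieval_recall(retrieved_ids, gold_ids):
    """Fraction of gold supporting chunks that appear among retrieved chunks."""
    gold = set(gold_ids)
    return len(gold & set(retrieved_ids)) / len(gold) if gold else 0.0

# Both gold passages found among four retrieved chunks -> recall 1.0
print(retrieval_recall(["c1", "c7", "c2", "c9"], ["c1", "c2"]))
```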

End-to-End QA Performance

EfficientRAG achieved accuracy comparable to, and in some cases exceeding, LLM-based baselines (EM and F1 are defined in the sketch after this list):

  • HotpotQA: Achieved the highest accuracy on both Exact Match (EM) and F1 metrics.
  • MuSiQue: Performed well despite lower recall, indicating robust handling of noisy inputs.
  • 2WikiMQA: Markedly higher accuracy, significantly outperforming LLM-based systems.
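
EM and F1 are the standard token-overlap metrics for open-domain QA; for reference, a compact SQuAD-style implementation of their conventional definitions (not code from the paper):

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```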

Efficiency Evaluation

EfficientRAG displayed substantial improvements in efficiency, reducing latency by up to 80% compared to other iterative methods while maintaining similar GPU utilization.

Implications and Future Work

EfficientRAG presents a significant step forward in making multi-hop question answering more efficient and cost-effective. The ability to generate high-quality queries without relying on multiple LLM calls could pave the way for more scalable and adaptable RAG systems. Future work could explore the application of EfficientRAG in domain-specific settings and further optimization of the framework to handle increasingly complex queries.

The paper underlines the potential of lightweight retrieval models to enhance the efficiency and accuracy of RAG methods, potentially making them more accessible and practical for a wider range of applications in AI and beyond.

Authors (10)
  1. Ziyuan Zhuang (4 papers)
  2. Zhiyang Zhang (9 papers)
  3. Sitao Cheng (10 papers)
  4. Fangkai Yang (45 papers)
  5. Jia Liu (369 papers)
  6. Shujian Huang (106 papers)
  7. Qingwei Lin (81 papers)
  8. Saravan Rajmohan (85 papers)
  9. Dongmei Zhang (193 papers)
  10. Qi Zhang (784 papers)