Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata (2406.13213v2)

Published 19 Jun 2024 in cs.CL, cs.AI, and cs.DB

Abstract: The retrieval-augmented generation (RAG) enables retrieval of relevant information from an external knowledge source and allows LLMs to answer queries over previously unseen document collections. However, it was demonstrated that traditional RAG applications perform poorly in answering multi-hop questions, which require retrieving and reasoning over multiple elements of supporting evidence. We introduce a new method called Multi-Meta-RAG, which uses database filtering with LLM-extracted metadata to improve the RAG selection of the relevant documents from various sources, relevant to the question. While database filtering is specific to a set of questions from a particular domain and format, we found out that Multi-Meta-RAG greatly improves the results on the MultiHop-RAG benchmark. The code is available at https://github.com/mxpoliakov/Multi-Meta-RAG.


Multi-Meta-RAG: Enhancing RAG for Multi-Hop Queries

The paper "Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata" addresses a persistent weakness of retrieval-augmented generation (RAG): answering multi-hop queries. Although RAG is widely used to let LLMs answer questions over previously unseen documents, conventional RAG pipelines often struggle with multi-hop questions, which require retrieving and reasoning over several pieces of supporting evidence.

Core Proposal and Methodology

Multi-Meta-RAG combines database filtering with LLM-extracted metadata. The aim is to improve the selection of relevant documents from the various sources that a question refers to. Metadata extraction is the central step: the authors use it to raise retrieval accuracy, and in turn answer quality, specifically for multi-hop queries.

The method uses an LLM to extract metadata and turn it into filters on article source and publication date. Applying these filters at the database level narrows retrieval to document chunks whose metadata match, so that fewer irrelevant chunks reach response generation.
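The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Chunk` class, the `extract_filter` and `filter_chunks` helpers, the `$in`-style filter dictionary, and the sample data are all hypothetical, and a plain substring match stands in for the LLM extraction step so that the example runs without any external service.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A document chunk with the two metadata fields named in the summary."""
    text: str
    source: str
    published_at: str


def extract_filter(query, known_sources):
    """Stand-in for LLM metadata extraction (assumption: substring matching).

    Returns a filter dictionary listing every known source named in the query,
    or an empty dictionary when no source is mentioned.
    """
    lowered = query.lower()
    sources = [s for s in known_sources if s.lower() in lowered]
    return {"source": {"$in": sources}} if sources else {}


def filter_chunks(chunks, metadata_filter):
    """Keep only chunks whose fields satisfy every `$in` condition."""
    kept = []
    for chunk in chunks:
        ok = all(
            getattr(chunk, field) in condition["$in"]
            for field, condition in metadata_filter.items()
        )
        if ok:
            kept.append(chunk)
    return kept


# Hypothetical data, for illustration only.
chunks = [
    Chunk("Report A", "The Verge", "December 2023"),
    Chunk("Report B", "TechCrunch", "November 2023"),
    Chunk("Report C", "BBC", "December 2023"),
]
query = "Did The Verge and TechCrunch both report on the same product launch?"
metadata_filter = extract_filter(query, ["The Verge", "TechCrunch", "BBC"])
candidates = filter_chunks(chunks, metadata_filter)
print(metadata_filter)
print([c.source for c in candidates])
```

In a real pipeline the filter would be passed to the vector database alongside the similarity query, so that ranking only happens over chunks that already match the extracted metadata. An empty filter leaves the candidate set unchanged, which keeps queries without recognizable metadata on the ordinary retrieval path.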

Experimental Results

The experimental evaluation on the MultiHop-RAG benchmark reports clear gains. Retrieval metrics such as Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), and Hit Rate all improved. Specifically, metadata filtering yielded a 17.2% increase in Hits@4 for the voyage-02 embedding model relative to the baseline.

Moreover, with LLMs such as GPT-4 and Google PaLM as the generator, the paper reports substantial gains in answer accuracy. Google PaLM in particular showed a 25.6% improvement over its baseline accuracy. These results suggest that Multi-Meta-RAG can improve RAG implementations in real-world applications where precise answers are crucial.
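For readers unfamiliar with the retrieval metrics named above, the following sketch shows one common way to compute them for a single query. The function names and the sample ranking are assumptions made for illustration, and the average-precision normalization used here (dividing by the smaller of `k` and the number of relevant items) is one of several conventions; the paper's evaluation code may differ.

```python
def hits_at_k(retrieved, relevant, k):
    """1.0 if any relevant item appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0


def reciprocal_rank(retrieved, relevant, k):
    """Inverse rank of the first relevant item within the top k, else 0.0."""
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(retrieved, relevant, k):
    """Mean of precision values at each relevant hit within the top k.

    Assumes `retrieved` contains no duplicates.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    denominator = min(len(relevant), k)
    return total / denominator if denominator else 0.0


def mean_over_queries(metric, runs, k):
    """Average a per-query metric over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(metric(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical ranking: relevant chunks sit at ranks 2 and 4.
retrieved = ["a", "b", "c", "d"]
relevant = {"b", "d"}
print(hits_at_k(retrieved, relevant, 4))
print(reciprocal_rank(retrieved, relevant, 4))
print(average_precision(retrieved, relevant, 4))
```

Averaging `reciprocal_rank` and `average_precision` over all benchmark queries with `mean_over_queries` gives MRR and MAP respectively, and averaging `hits_at_k` gives the hit rate at a given cutoff such as Hits@4.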

Theoretical and Practical Implications

Theoretically, Multi-Meta-RAG highlights the role of metadata in RAG systems that must handle multi-hop questions. The approach suggests that metadata-aware database filtering can substantially mitigate a limitation of traditional methods, which often fail to retrieve evidence from all of the sources a question depends on.

Practically, the implications matter for applications that require high accuracy in complex information retrieval tasks, such as scientific research, legal documentation, and multi-source intelligence analysis. Multi-Meta-RAG suggests that metadata-driven filtering can benefit industries that rely on precise, evidence-based information retrieval.

Future Directions

Despite its strengths, the paper acknowledges limitations, particularly the dependence on domain-specific query formats and the need for tailored prompt templates for metadata extraction. Future research may explore more adaptable prompt templates that generalize across domains, as well as evaluation on datasets built from more recent knowledge bases.

Conclusion

Multi-Meta-RAG represents a noteworthy advancement in enhancing RAG systems for multi-hop queries through the use of LLM-extracted metadata and database filtering. By methodically integrating these elements, the research offers a promising avenue for robustly answering complex queries, thereby paving the way for more sophisticated and reliable information retrieval systems in AI.


Authors (2)

Mykhailo Poliakov
Nadiya Shvai
