A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning
The paper "A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning" presents an evolved Retrieval-Augmented Generation (RAG) system aimed at addressing the complexities of contemporary question-answering tasks by employing LLMs. This work originates from the Meta CRAG KDD Cup 2024, where the system was rigorously evaluated. The authors have demonstrated significant improvements in RAG capabilities, particularly in scenarios requiring complex reasoning.
System Overview
The developed RAG system incorporates several critical modules, including:
- Web Page Processing: This module refines the text chunks and tables extracted from web pages to improve reference quality. Trafilatura and BeautifulSoup are used for text extraction, followed by Blingfire for sentence segmentation. Tables, which are often noisy and poorly handled by plain-text extraction, are converted to Markdown to exploit the model's familiarity with that format (a short extraction sketch follows this list).
- Attribute Predictor: This module determines the type and temporal nature of each question, which governs how it is handled in later stages. Because questions range from simple factual queries to dynamic and multi-hop ones, the predictor lets the system tailor its response strategy, reducing incorrect answers on fast-changing topics by falling back to a conservative "I don't know" (see the classification sketch after this list).
- Numerical Calculator: To counter LLM hallucination in numerical computation, an external Python interpreter is employed: the LLM generates a Python expression for the required calculation and the interpreter evaluates it, significantly improving accuracy on answers that require precise arithmetic (illustrated in the calculator sketch below).
- LLM Knowledge Extractor: Leveraging the inherent knowledge within LLMs, this module extracts relevant information from the model's own parameters, providing an additional layer of references that mitigate the reliance on possibly outdated or irrelevant retrieved documents.
- Knowledge Graph Module: This module draws on structured information from a knowledge graph, focusing on query generation through a function-calling method; in practice, however, this component fell short of expectations (see the function-calling sketch below).
- Reasoning Module: The core of the system's strength in complex reasoning, this module integrates information from all reference sources and follows carefully designed prompts that direct the LLM to reason step by step. The approach proved particularly effective on aggregation and multi-hop questions (a prompt sketch appears after this list).
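As a concrete illustration of the web page processing step, the sketch below combines the tools named above: Trafilatura for main-text extraction, Blingfire for sentence segmentation, and BeautifulSoup to render HTML tables as Markdown. The helper names and the returned dictionary layout are assumptions for illustration, not the authors' implementation.

```python
from bs4 import BeautifulSoup
from blingfire import text_to_sentences
import trafilatura


def table_to_markdown(table) -> str:
    """Render a BeautifulSoup <table> element as a Markdown table."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    if not rows:
        return ""
    lines = ["| " + " | ".join(rows[0]) + " |",
             "| " + " | ".join("---" for _ in rows[0]) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join(lines)


def html_to_references(html: str) -> dict:
    """Split raw HTML into sentence chunks and Markdown tables."""
    # Main-text extraction; trafilatura returns None on failure.
    text = trafilatura.extract(html) or ""
    # Blingfire returns one sentence per line.
    sentences = [s for s in text_to_sentences(text).split("\n") if s.strip()]
    # Parse tables separately so their structure is preserved as Markdown.
    soup = BeautifulSoup(html, "html.parser")
    tables = [md for md in (table_to_markdown(t) for t in soup.find_all("table")) if md]
    return {"sentences": sentences, "tables": tables}
```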
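The attribute predictor can be sketched as a single classification call. The label sets and prompt wording below are assumptions chosen to match the question categories described in the paper, and `llm_complete` stands for any plain text-in/text-out LLM interface.

```python
# Hypothetical classification prompt; the authors' label sets may differ.
CLASSIFY_PROMPT = """Classify the question along two axes.

Question: {question}

type: one of [simple, comparison, aggregation, multi_hop]
dynamism: one of [static, slow_changing, real_time]

Reply exactly as: type=<...>; dynamism=<...>"""


def predict_attributes(question: str, llm_complete) -> tuple[str, str]:
    """Predict question type and temporal nature with a single LLM call."""
    reply = llm_complete(CLASSIFY_PROMPT.format(question=question))
    fields = {}
    for part in reply.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    # Default to the safest labels if parsing fails.
    return fields.get("type", "simple"), fields.get("dynamism", "real_time")


# Downstream routing: real_time questions without fresh references are answered
# with a conservative "I don't know" instead of risking a hallucination.
```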
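A minimal version of the numerical calculator idea is shown below. Rather than a full Python interpreter, this sketch evaluates LLM-generated arithmetic expressions over a whitelisted AST, a more restrictive stand-in for the external interpreter described in the paper; the prompt convention (the model answers with a bare expression) is an assumption.

```python
import ast
import operator

# Whitelisted operators: enough for the arithmetic in aggregation questions.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}


def safe_eval(expression: str) -> float:
    """Evaluate an LLM-generated arithmetic expression without eval/exec."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression node: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


# The LLM is asked to emit only the expression, e.g. for a growth-rate
# question: "(3.2e9 - 2.9e9) / 2.9e9"; the interpreter produces the number.
print(safe_eval("(3.2e9 - 2.9e9) / 2.9e9"))  # ~0.1034
```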
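The function-calling approach to knowledge graph queries can be sketched as follows. The tool names and schemas here are hypothetical placeholders rather than the competition's mock KG API; the point is that the LLM must emit a well-formed call with the right arguments, which is exactly the step the paper reports as underperforming.

```python
import json

# Hypothetical tool schemas shown to the LLM; the real KG API differs.
KG_TOOLS = [
    {
        "name": "get_movie_info",
        "description": "Look up a movie entity and return its attributes.",
        "parameters": {"movie_name": "string"},
    },
    {
        "name": "get_ticker_price",
        "description": "Return the most recent known price for a stock ticker.",
        "parameters": {"ticker": "string"},
    },
]

FUNCTION_CALL_PROMPT = """You may call one of these functions to answer:
{tools}

Question: {question}

Reply with JSON: {{"name": "<function>", "arguments": {{...}}}} or the word null."""


def generate_kg_query(question: str, llm_complete) -> dict | None:
    """Translate a question into a structured KG call via the LLM."""
    prompt = FUNCTION_CALL_PROMPT.format(
        tools=json.dumps(KG_TOOLS, indent=2), question=question
    )
    reply = llm_complete(prompt).strip()
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return None  # malformed call: fall back to web references only
    return call if isinstance(call, dict) else None
```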
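Finally, the step-by-step reasoning stage can be illustrated with a prompt template in the same spirit as the paper's design, though the exact wording the authors used is not reproduced here; the reference fields simply mirror the modules above.

```python
REASONING_PROMPT = """Answer the question using only the references below.
References may be incomplete or irrelevant; ignore unsupported content.

# Web sentences
{web_sentences}

# Tables (Markdown)
{tables}

# Model knowledge
{llm_knowledge}

Question ({question_type}): {question}

Reason step by step:
1. Identify which references are relevant.
2. Extract the needed facts from each.
3. Combine them; emit a Python expression for any arithmetic.
4. If the evidence is insufficient or the topic changes in real time,
   answer exactly "I don't know".

Final answer:"""


def answer(question: str, question_type: str, refs: dict, llm_complete) -> str:
    """Assemble references from all modules and reason over them."""
    prompt = REASONING_PROMPT.format(
        web_sentences="\n".join(refs.get("sentences", [])[:20]),
        tables="\n\n".join(refs.get("tables", [])[:3]),
        llm_knowledge=refs.get("llm_knowledge", ""),
        question_type=question_type,
        question=question,
    )
    return llm_complete(prompt)
```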
Experimental Evaluation
The system was extensively evaluated in local and online settings. In Task 1, improvements were quantified as follows:
- Correct Answer Ratio: Achieved 29.7%, a notable improvement over the baseline of 16.2%.
- Hallucination Reduction: Reduced hallucination to 13.9% from the baseline's 83.7%, enhancing overall response reliability.
- Final Score: Computed as the correct-answer ratio minus the hallucination ratio, the score improved to 15.8% (29.7% - 13.9%), balancing accuracy against the conservative use of "I don't know" for uncertain responses.
In the competition's final evaluation, the system secured a high rank in Task 1 but underperformed in Tasks 2 and 3, which the authors attribute to less effective use of knowledge graph information. A detailed analysis revealed strengths in domains that demand complex reasoning, such as movies, music, and open topics, whereas finance and sports, which require handling rapidly changing information, posed challenges.
Future Directions
Despite significant advancements, several aspects remain ripe for further optimization:
- Retrieval and Re-ranking: Introducing robust re-ranking models can refine initial retrieval results, particularly in the more complex setting of Task 3 (a cross-encoder sketch follows this list).
- Knowledge Graph Integration: Improving the dynamic function-calling method and optimizing prompts for KG queries should enhance performance in structured data retrieval.
- Table Handling: Introducing retrieval and structured-query methods designed specifically for tables would mitigate noise and improve information coherence.
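As one example of the re-ranking direction, a cross-encoder can re-score first-stage retrieval results before they reach the reasoning module. The checkpoint name below is a common public model chosen for illustration, not one used in the paper.

```python
from sentence_transformers import CrossEncoder

# Example public checkpoint; any query-passage relevance model can be swapped in.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def rerank(query: str, chunks: list[str], top_k: int = 10) -> list[str]:
    """Re-score first-stage retrieval results and keep the top_k chunks."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```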
Conclusion
The developed system exemplifies substantial progress in the field of RAG, reflecting comprehensive enhancements in retrieval quality, numerical computation, and higher-order reasoning. These contributions have yielded a robust framework well-positioned for future explorations and practical deployments in dynamic and complex question-answering environments. The integration of the modules has demonstrated efficacy in significantly reducing error rates and enhancing performance, setting a strong foundation for continued advancements in the domain.