A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning
The paper "A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning" presents an evolved Retrieval-Augmented Generation (RAG) system aimed at addressing the complexities of contemporary question-answering tasks by employing LLMs. This work originates from the Meta CRAG KDD Cup 2024, where the system was rigorously evaluated. The authors have demonstrated significant improvements in RAG capabilities, particularly in scenarios requiring complex reasoning.
System Overview
The developed RAG system incorporates several critical modules, including:
- Web Page Processing: This module refines the text chunks and tables extracted from web pages to improve reference quality. Trafilatura and BeautifulSoup are used for text extraction, followed by Blingfire for sentence segmentation. Tables, which are often noisy and poorly handled by plain-text extraction, are converted to Markdown to exploit the model's familiarity with that format (a short extraction sketch follows this list).
- Attribute Predictor: This module determines the type and temporal nature of each question, which governs how it is handled in later stages. Because questions range from simple factual queries to dynamic and multi-hop ones, the predictor lets the system tailor its response strategy, reducing incorrect answers on fast-changing topics by falling back to a conservative "I don't know" (see the classification sketch after this list).
- Numerical Calculator: To counter LLM hallucination in numerical computation, an external Python interpreter is employed: the LLM generates a Python expression for the required calculation and the interpreter evaluates it, significantly improving accuracy on answers that require precise arithmetic (illustrated in the calculator sketch below).
- LLM Knowledge Extractor: Leveraging the inherent knowledge within LLMs, this module extracts relevant information from the model's own parameters, providing an additional layer of references that mitigate the reliance on possibly outdated or irrelevant retrieved documents.
- Knowledge Graph Module: This module draws on structured information from a knowledge graph, focusing on query generation through a function-calling method; in practice, however, this component fell short of expectations (see the function-calling sketch below).
- Reasoning Module: The core of the system's strength in complex reasoning, this module integrates information from all reference sources and follows carefully designed prompts that direct the LLM to reason step by step. The approach proved particularly effective on aggregation and multi-hop questions (a prompt sketch appears after this list).
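As a concrete illustration of the web page processing step, the sketch below combines the tools named above: Trafilatura for main-text extraction, Blingfire for sentence segmentation, and BeautifulSoup to render HTML tables as Markdown. The helper names and the returned dictionary layout are assumptions for illustration, not the authors' implementation.

```python
from bs4 import BeautifulSoup
from blingfire import text_to_sentences
import trafilatura


def table_to_markdown(table) -> str:
    """Render a BeautifulSoup <table> element as a Markdown table."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    if not rows:
        return ""
    lines = ["| " + " | ".join(rows[0]) + " |",
             "| " + " | ".join("---" for _ in rows[0]) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join(lines)


def html_to_references(html: str) -> dict:
    """Split raw HTML into sentence chunks and Markdown tables."""
    # Main-text extraction; trafilatura returns None on failure.
    text = trafilatura.extract(html) or ""
    # Blingfire returns one sentence per line.
    sentences = [s for s in text_to_sentences(text).split("\n") if s.strip()]
    # Parse tables separately so their structure is preserved as Markdown.
    soup = BeautifulSoup(html, "html.parser")
    tables = [md for md in (table_to_markdown(t) for t in soup.find_all("table")) if md]
    return {"sentences": sentences, "tables": tables}
```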
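The attribute predictor can be sketched as a single classification call. The label sets and prompt wording below are assumptions chosen to match the question categories described in the paper, and `llm_complete` stands for any plain text-in/text-out LLM interface.

```python
# Hypothetical classification prompt; the authors' label sets may differ.
CLASSIFY_PROMPT = """Classify the question along two axes.

Question: {question}

type: one of [simple, comparison, aggregation, multi_hop]
dynamism: one of [static, slow_changing, real_time]

Reply exactly as: type=<...>; dynamism=<...>"""


def predict_attributes(question: str, llm_complete) -> tuple[str, str]:
    """Predict question type and temporal nature with a single LLM call."""
    reply = llm_complete(CLASSIFY_PROMPT.format(question=question))
    fields = {}
    for part in reply.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    # Default to the safest labels if parsing fails.
    return fields.get("type", "simple"), fields.get("dynamism", "real_time")


# Downstream routing: real_time questions without fresh references are answered
# with a conservative "I don't know" instead of risking a hallucination.
```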
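A minimal version of the numerical calculator idea is shown below. Rather than a full Python interpreter, this sketch evaluates LLM-generated arithmetic expressions over a whitelisted AST, a more restrictive stand-in for the external interpreter described in the paper; the prompt convention (the model answers with a bare expression) is an assumption.

```python
import ast
import operator

# Whitelisted operators: enough for the arithmetic in aggregation questions.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}


def safe_eval(expression: str) -> float:
    """Evaluate an LLM-generated arithmetic expression without eval/exec."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression node: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


# The LLM is asked to emit only the expression, e.g. for a growth-rate
# question: "(3.2e9 - 2.9e9) / 2.9e9"; the interpreter produces the number.
print(safe_eval("(3.2e9 - 2.9e9) / 2.9e9"))  # ~0.1034
```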
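The function-calling approach to knowledge graph queries can be sketched as follows. The tool names and schemas here are hypothetical placeholders rather than the competition's mock KG API; the point is that the LLM must emit a well-formed call with the right arguments, which is exactly the step the paper reports as underperforming.

```python
import json

# Hypothetical tool schemas shown to the LLM; the real KG API differs.
KG_TOOLS = [
    {
        "name": "get_movie_info",
        "description": "Look up a movie entity and return its attributes.",
        "parameters": {"movie_name": "string"},
    },
    {
        "name": "get_ticker_price",
        "description": "Return the most recent known price for a stock ticker.",
        "parameters": {"ticker": "string"},
    },
]

FUNCTION_CALL_PROMPT = """You may call one of these functions to answer:
{tools}

Question: {question}

Reply with JSON: {{"name": "<function>", "arguments": {{...}}}} or the word null."""


def generate_kg_query(question: str, llm_complete) -> dict | None:
    """Translate a question into a structured KG call via the LLM."""
    prompt = FUNCTION_CALL_PROMPT.format(
        tools=json.dumps(KG_TOOLS, indent=2), question=question
    )
    reply = llm_complete(prompt).strip()
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return None  # malformed call: fall back to web references only
    return call if isinstance(call, dict) else None
```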
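Finally, the step-by-step reasoning stage can be illustrated with a prompt template in the same spirit as the paper's design, though the exact wording the authors used is not reproduced here; the reference fields simply mirror the modules above.

```python
REASONING_PROMPT = """Answer the question using only the references below.
References may be incomplete or irrelevant; ignore unsupported content.

# Web sentences
{web_sentences}

# Tables (Markdown)
{tables}

# Model knowledge
{llm_knowledge}

Question ({question_type}): {question}

Reason step by step:
1. Identify which references are relevant.
2. Extract the needed facts from each.
3. Combine them; emit a Python expression for any arithmetic.
4. If the evidence is insufficient or the topic changes in real time,
   answer exactly "I don't know".

Final answer:"""


def answer(question: str, question_type: str, refs: dict, llm_complete) -> str:
    """Assemble references from all modules and reason over them."""
    prompt = REASONING_PROMPT.format(
        web_sentences="\n".join(refs.get("sentences", [])[:20]),
        tables="\n\n".join(refs.get("tables", [])[:3]),
        llm_knowledge=refs.get("llm_knowledge", ""),
        question_type=question_type,
        question=question,
    )
    return llm_complete(prompt)
```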
Experimental Evaluation
The system was extensively evaluated in local and online settings. In Task 1, improvements were quantified as follows:
- Correct Answer Ratio: Achieved 29.7%, a notable improvement over the baseline of 16.2%.
- Hallucination Reduction: Reduced hallucination to 13.9% from the baseline's 83.7%, enhancing overall response reliability.
- Final Score: Computed as the correct-answer ratio minus the hallucination ratio, the score improved to 15.8% (29.7% - 13.9%), balancing accuracy against the conservative use of "I don't know" for uncertain responses.
In the competition's final evaluation, the system secured a high rank in Task 1 but underperformed in Tasks 2 and 3, which the authors attribute to less effective use of knowledge graph information. A detailed analysis revealed strengths in domains that demand complex reasoning, such as movies, music, and open topics, whereas finance and sports, which require handling rapidly changing information, posed challenges.
Future Directions
Despite significant advancements, several aspects remain ripe for further optimization:
- Retrieval and Re-ranking: Introducing robust re-ranking models can refine initial retrieval results, particularly in the more complex setting of Task 3 (a cross-encoder sketch follows this list).
- Knowledge Graph Integration: Improving the dynamic function-calling method and optimizing prompts for KG queries should enhance performance in structured data retrieval.
- Table Handling: Introducing retrieval and structured-query methods designed specifically for tables would mitigate noise and improve information coherence.
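As one example of the re-ranking direction, a cross-encoder can re-score first-stage retrieval results before they reach the reasoning module. The checkpoint name below is a common public model chosen for illustration, not one used in the paper.

```python
from sentence_transformers import CrossEncoder

# Example public checkpoint; any query-passage relevance model can be swapped in.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def rerank(query: str, chunks: list[str], top_k: int = 10) -> list[str]:
    """Re-score first-stage retrieval results and keep the top_k chunks."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```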
Conclusion
The developed system exemplifies substantial progress in the field of RAG, reflecting comprehensive enhancements in retrieval quality, numerical computation, and higher-order reasoning. These contributions have yielded a robust framework well-positioned for future explorations and practical deployments in dynamic and complex question-answering environments. The integration of the modules has demonstrated efficacy in significantly reducing error rates and enhancing performance, setting a strong foundation for continued advancements in the domain.