T$^2$-RAGBench: Text-and-Table Benchmark for Evaluating Retrieval-Augmented Generation (2506.12071v1)

Published 4 Jun 2025 in cs.IR

Abstract: Because most financial documents combine textual and tabular information, robust Retrieval-Augmented Generation (RAG) systems are essential for accessing and reasoning over such content to perform complex numerical tasks. This paper introduces T$^2$-RAGBench, a benchmark comprising 32,908 question-context-answer triples, designed to evaluate RAG methods on real-world financial data. Unlike typical QA datasets, which operate in an oracle-context setting where the relevant context is explicitly provided, T$^2$-RAGBench requires models to first retrieve the correct context before performing numerical reasoning. Existing QA datasets involving text and tables typically contain context-dependent questions, which may have multiple correct answers depending on the context provided. To address this, we transform these datasets into a context-independent format, enabling reliable RAG evaluation. We conduct a comprehensive evaluation of popular RAG methods. Our analysis identifies Hybrid BM25, a technique that combines dense and sparse vectors, as the most effective approach for text-and-table data. However, the results show that T$^2$-RAGBench remains challenging even for state-of-the-art (SOTA) LLMs and RAG methods. Further ablation studies examine the impact of embedding models and corpus size on retrieval performance. T$^2$-RAGBench provides a realistic and rigorous benchmark for existing RAG methods on text-and-table data. Code and dataset are available online.
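The abstract describes Hybrid BM25 only as a technique that combines dense and sparse vectors. The sketch below illustrates one common way such a hybrid retriever can work. It assumes Okapi BM25 (with the conventional defaults `k1=1.5`, `b=0.75`) for the sparse side, precomputed embedding vectors for the dense side, and a min-max-normalized weighted sum controlled by a hypothetical `alpha` parameter to fuse the two. The paper's actual fusion rule, tokenizer, and embedding model are not stated here and may differ. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Sequence


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of a tokenized query against each tokenized document (sparse side)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * num / den
        scores.append(s)
    return scores


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (dense side)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_tokens: list[str], query_vec: Sequence[float],
                doc_tokens: list[list[str]], doc_vecs: list[Sequence[float]],
                alpha: float = 0.5, k: int = 3) -> list[int]:
    """Return indices of the top-k documents by fused score.

    Assumption: fused = alpha * dense + (1 - alpha) * sparse, after min-max scaling.
    """
    sparse = min_max(bm25_scores(query_tokens, doc_tokens))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Toy corpus with made-up tokens and 2-d "embeddings", for illustration only.
    docs = [["revenue", "2019", "increased"], ["net", "income", "table"], ["revenue", "table", "2020"]]
    vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(hybrid_rank(["revenue", "2020"], [1.0, 0.0], docs, vecs))
```

With `alpha=1.0` the ranking reduces to dense retrieval alone and with `alpha=0.0` to BM25 alone; normalizing both score lists first keeps one signal from dominating simply because of its scale.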
