
TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance (2105.07624v2)

Published 17 May 2021 in cs.CL and cs.AI

Abstract: Hybrid data combining both tabular and textual content (e.g., financial reports) are quite pervasive in the real world. However, Question Answering (QA) over such hybrid data is largely neglected in existing research. In this work, we extract samples from real financial reports to build a new large-scale QA dataset containing both Tabular And Textual data, named TAT-QA, where numerical reasoning is usually required to infer the answer, such as addition, subtraction, multiplication, division, counting, comparison/sorting, and the compositions. We further propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text. It adopts sequence tagging to extract relevant cells from the table along with relevant spans from the text to infer their semantics, and then applies symbolic reasoning over them with a set of aggregation operators to arrive at the final answer. TAGOP achieves 58.0% in F1, which is an 11.1% absolute increase over the previous best baseline model, according to our experiments on TAT-QA. But this result still lags far behind performance of expert human, i.e. 90.8% in F1. It is demonstrated that our TAT-QA is very challenging and can serve as a benchmark for training and testing powerful QA models that address hybrid form data.

Overview of TAT-QA: A Benchmark for Question Answering on Hybrid Financial Data

The paper presents TAT-QA, a novel question answering (QA) benchmark that addresses the challenge of hybrid data consisting of both tabular and textual content, specifically within the finance domain. This research fills a significant gap: existing QA systems traditionally target either unstructured text or structured/semi-structured data such as tables and knowledge bases, but rarely both together.

Dataset Characteristics

TAT-QA distinguishes itself by simulating real-world scenarios where financial reports frequently interlace tabular data with explanatory or complementary text. The dataset is derived from authentic financial documents, comprising 2,757 hybrid contexts extracted from 182 financial reports, totaling 16,552 question-answer pairs. The contexts are constructed meticulously to include both complex table structures and multiple descriptive paragraphs, offering a platform for evaluating QA models that require both understanding of numerical data and the ability to perform nuanced reasoning tasks such as arithmetic and comparison.
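
To make the data layout concrete, the sketch below models a single TAT-QA-style example in Python. The field names follow the dataset description above (a hybrid context of one table plus several paragraphs, and answers that may carry a derivation and a scale), but they are an illustrative approximation rather than the dataset's exact JSON schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HybridContext:
    """A TAT-QA-style context: one financial table plus its associated paragraphs."""
    table: List[List[str]]   # table as rows of cell strings, header row first
    paragraphs: List[str]    # descriptive text surrounding the table

@dataclass
class QAExample:
    """A single question grounded in a hybrid context."""
    context: HybridContext
    question: str
    answer: str                       # final answer (span, number, or list rendered as text)
    answer_type: str                  # e.g. "span", "multi-span", "count", "arithmetic"
    derivation: Optional[str] = None  # e.g. "(1,223 - 1,048) / 1,048" for arithmetic answers
    scale: Optional[str] = None       # e.g. "thousand", "million", "percent"
```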

Methodological Contributions

The paper introduces TagOp, a novel QA model specifically designed to tackle the challenges presented by TAT-QA. TagOp uses sequence tagging for evidence extraction and symbolic reasoning with a set of aggregation operators to combine table cells and text spans into a final answer. This approach augments traditional QA models with the numerical reasoning that is crucial for comprehending and manipulating financial data. TagOp demonstrates a significant improvement over existing baselines, achieving 58.0% in F1 compared to the previous best baseline at 46.9%, highlighting its enhanced capability in handling complex hybrid data. A minimal sketch of this two-stage inference follows.
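
The sketch below illustrates the two-stage TagOp-style inference in Python: a tagging step keeps evidence tokens from the flattened table and text, and a symbolic step applies a predicted aggregation operator to the numbers found in that evidence. The operator names, the 0.5 threshold, and the number parsing are simplified assumptions, not the paper's exact implementation (TagOp's full operator set covers ten operators).

```python
from typing import List, Optional

def to_number(token: str) -> Optional[float]:
    """Best-effort numeric parsing for evidence tokens (e.g. '1,223' or '4.5%')."""
    try:
        return float(token.replace(",", "").rstrip("%"))
    except ValueError:
        return None

def apply_operator(op: str, values: List[float]) -> float:
    """Symbolic reasoning step: combine extracted numbers with one operator (reduced set)."""
    if op == "SUM":
        return sum(values)
    if op == "COUNT":
        return float(len(values))
    if op == "AVERAGE":
        return sum(values) / len(values)
    if op == "DIFF":            # assumes two operands in a predicted order
        return values[0] - values[1]
    if op == "DIVISION":
        return values[0] / values[1]
    if op == "CHANGE_RATIO":
        return (values[0] - values[1]) / values[1]
    raise ValueError(f"unsupported operator: {op}")

def tagop_style_answer(tokens: List[str], evidence_probs: List[float], op: str) -> float:
    """Evidence extraction (tagging) followed by symbolic aggregation."""
    evidence = [t for t, p in zip(tokens, evidence_probs) if p > 0.5]
    numbers = [n for n in (to_number(t) for t in evidence) if n is not None]
    return apply_operator(op, numbers)
```

For instance, `tagop_style_answer(["Revenue", "1,223", "1,048"], [0.1, 0.9, 0.8], "CHANGE_RATIO")` returns roughly 0.167, which a scale-prediction step would then report as 16.7%.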

Experimental Analysis

The dataset and model provide robust benchmarks for evaluating QA systems' performance on hybrid data. TagOp's performance, while a substantial improvement, still lags behind expert human performance, which reaches 90.8% in F1, signifying the complexity and challenge embodied by TAT-QA. The detailed error analysis presented in the paper identifies wrong evidence extraction and inadequate evidence as primary sources of errors, suggesting areas for future improvement.
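
For reference, a standard token-overlap F1 between a predicted and a gold answer string can be computed as in the sketch below. Note that TAT-QA's official evaluation adapts exact match and F1 to handle numerical values and scales, so this is only an approximation of the reported metric.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)   # per-token minimum counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```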

Implications and Future Work

The introduction of TAT-QA sets the stage for further developments in hybrid QA systems. The dataset and TagOp model serve not only as a benchmarking foundation but also as a stimulus for innovative research in the integration of numerical reasoning and cross-modal understanding in QA tasks. Future research could extend beyond financial data, applying the insights gained to broader applications, and incorporating domain-specific knowledge into QA systems to further enhance accuracy and comprehension in specialized fields.

In conclusion, TAT-QA represents a significant advancement in the study of QA systems, encouraging the development of algorithms capable of interacting with complex, real-world data structures. The benchmark stands as a critical resource for researchers aiming to push the boundaries of what current AI models can achieve in understanding and reasoning over hybrid data contexts.

Authors (8)
  1. Fengbin Zhu (19 papers)
  2. Wenqiang Lei (66 papers)
  3. Youcheng Huang (9 papers)
  4. Chao Wang (555 papers)
  5. Shuo Zhang (256 papers)
  6. Jiancheng Lv (99 papers)
  7. Fuli Feng (143 papers)
  8. Tat-Seng Chua (359 papers)
Citations (219)