
FinQA: A Dataset of Numerical Reasoning over Financial Data (2109.00122v3)

Published 1 Sep 2021 in cs.CL

Abstract: The sheer volume of financial statements makes it difficult for humans to access and analyze a business's financials. Robust numerical reasoning likewise faces unique challenges in this domain. In this work, we focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. In contrast to existing tasks on general domain, the finance domain includes complex numerical reasoning and understanding of heterogeneous representations. To facilitate analytical progress, we propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. We also annotate the gold reasoning programs to ensure full explainability. We further introduce baselines and conduct comprehensive experiments in our dataset. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge and in complex multi-step numerical reasoning on that knowledge. Our dataset -- the first of its kind -- should therefore enable significant, new community research into complex application domains. The dataset and code are publicly available at https://github.com/czyssrs/FinQA.

FinQA: A Dataset of Numerical Reasoning over Financial Data

The paper introduces a dataset named FinQA, aimed at advancing the field of numerical reasoning within the financial domain. This development addresses the complex nature of financial data analysis by providing a substantial dataset of question-answer (Q&A) pairs constructed by financial experts. The principal contribution of FinQA is its focus on reasoning over both structured tables and unstructured text found within financial reports. Unlike prior datasets built around general-domain queries, FinQA brings domain-specific complexity to the forefront by requiring numerical reasoning over rich, heterogeneous financial data representations.

Overview of FinQA Dataset

FinQA is composed of 8,281 examples, sourced from earnings reports of S&P 500 companies spanning two decades. Each example includes a question, a corresponding numerical reasoning program, and annotated supporting facts. Notably, the dataset encompasses a diverse range of financial documents, with the questions requiring varying complexities of multi-step reasoning. This divergence from simpler, single-step operations seen in datasets like DROP provides an environment ripe for testing the capabilities of numerical reasoning models in handling real-world financial analyses.
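To make the structure of an example concrete, the sketch below shows a hypothetical record in the shape the paper describes: a question, a gold reasoning program, annotated supporting facts, and an executable answer. The field names and values here are illustrative, not the dataset's actual JSON schema.

```python
# Illustrative shape of one FinQA example (hypothetical field names).
example = {
    "question": "What was the percentage change in revenue from 2018 to 2019?",
    # Gold reasoning program: each step is an operation; "#0" refers
    # back to the result of step 0.
    "program": "subtract(120.5, 100.0), divide(#0, 100.0)",
    "supporting_facts": [
        "table row: revenue | 2018: 100.0 | 2019: 120.5",
    ],
    "answer": 0.205,
}

# The gold program makes the reasoning fully explainable: the answer
# follows step by step from the supporting facts.
percentage_change = (120.5 - 100.0) / 100.0
```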

Methodology and Baseline Framework

The authors employ a two-step model framework, FinQANet, which comprises a retriever to identify pertinent data points within a financial report and a generator to produce executable reasoning programs. For retrieval, the model converts table rows into sentence-like structures, allowing for efficient extraction of relevant facts, and employs pre-trained language models such as BERT for relevance classification. The generator is tasked with creating programs in a Domain-Specific Language (DSL), incorporating both mathematical and table operations, to derive answers from the retrieved data.
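A generated program of this kind can be executed by a small interpreter. The sketch below assumes a simplified subset of the DSL (the four arithmetic operations and `#n` references to earlier steps, omitting the table operations); it is an illustration of the program format, not the authors' implementation.

```python
def run_program(program: str) -> float:
    """Execute a linear FinQA-style program such as
    'subtract(120.5, 100), divide(#0, 100)'."""
    ops = {
        "add": lambda a, b: a + b,
        "subtract": lambda a, b: a - b,
        "multiply": lambda a, b: a * b,
        "divide": lambda a, b: a / b,
    }
    results = []
    for step in program.split("), "):      # split into individual operations
        name, argstr = step.rstrip(")").split("(")
        args = []
        for tok in argstr.split(","):
            tok = tok.strip()
            if tok.startswith("#"):        # reference to an earlier step's result
                args.append(results[int(tok[1:])])
            else:
                args.append(float(tok))
        results.append(ops[name](*args))
    return results[-1]

# Percentage change: (120.5 - 100) / 100
answer = run_program("subtract(120.5, 100), divide(#0, 100)")  # → 0.205
```

Because each step names an operation over explicit arguments, the executed program doubles as a human-readable explanation of the answer.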

Additionally, several baseline methods are assessed, ranging from simple TF-IDF retrieval to pre-trained Longformer models. The comparisons reveal that FinQANet, particularly with large pre-trained encoders such as RoBERTa, substantially outperforms these simpler baselines, although it still lags behind the performance of human experts.
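To illustrate the weakest baseline family, a bag-of-words TF-IDF retriever can be sketched in a few lines. This is a generic reconstruction of the technique, not the paper's baseline code; the naive whitespace tokenization is one reason such retrieval struggles with financial tables.

```python
import math
from collections import Counter

def tfidf_rank(query: str, facts: list[str]) -> list[tuple[float, str]]:
    """Rank candidate facts by TF-IDF cosine similarity to the question."""
    docs = [f.lower().split() for f in facts]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))           # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cos(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(query.lower().split())
    return sorted(((cos(q, vec(d)), f) for d, f in zip(docs, facts)), reverse=True)

facts = [
    "revenue in 2019 was 120.5 million",
    "operating expenses rose modestly",
    "revenue in 2018 was 100.0 million",
]
best_fact = tfidf_rank("what was revenue in 2019", facts)[0][1]
```

Lexical overlap alone cannot distinguish, say, "2019" appearing in a header from the same token in a cell, which is why the learned retriever that linearizes table rows performs much better.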

Key Findings

The experiments underscore a substantial performance gap between current NLP models and human experts in complex financial Q&A tasks. FinQANet achieves an execution accuracy of 61.24%, which is higher than non-expert crowdsourcing workers but far from the 91.16% reached by financial domain experts. This discrepancy signals the intricate requirements of numerical reasoning within financial contexts and suggests that existing models need further improvements in incorporating financial knowledge and handling complex multi-step reasoning.

Implications and Future Directions

By establishing a challenging benchmark for numerical reasoning in finance, FinQA serves as a catalyst for future research in neural-symbolic reasoning and domain-specific model refinement. The paper suggests exploring new pre-training tasks focused on specialized domains to narrow the performance gap. Moreover, the intricacies revealed by the dataset—such as understanding financial table structures and unit conversions—call for innovations in the comprehension and reasoning capabilities of NLP models.

In conclusion, FinQA addresses a critical gap in the field of AI by providing a robust dataset aimed at enhancing numerical reasoning over complex financial data. The results and insights presented in the paper offer a pathway for further research, encouraging computational advancements in financial data analysis. With continued exploration and model developments, the FinQA dataset could significantly impact AI's role in automating and improving financial document analysis.

Authors (11)
  1. Zhiyu Chen (60 papers)
  2. Wenhu Chen (134 papers)
  3. Charese Smiley (10 papers)
  4. Sameena Shah (33 papers)
  5. Iana Borova (1 paper)
  6. Dylan Langdon (1 paper)
  7. Reema Moussa (1 paper)
  8. Matt Beane (1 paper)
  9. Ting-Hao Huang (4 papers)
  10. Bryan Routledge (3 papers)
  11. William Yang Wang (254 papers)
Citations (226)