Fin-R1-Data: Financial Reasoning Dataset
- Fin-R1-Data is a specialized financial reasoning dataset that uses explicit chain-of-thought annotations and XML tagging to delimit both reasoning and answer segments.
- It is constructed by distilling heterogeneous financial sources and enforcing strict quality controls through automated and manual checks.
- The dataset supports a two-stage training regimen—supervised fine-tuning and reinforcement learning—achieving state-of-the-art results on benchmarks like FinQA and ConvFinQA.
Fin-R1-Data refers to the specialized chain-of-thought (CoT) financial reasoning dataset constructed for the training and empirical validation of the Fin-R1 LLM, as detailed in "Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning" (Liu et al., 20 Mar 2025). This dataset was designed to enhance the capability of LLMs to perform interpretable reasoning and robust decision-making across diverse financial tasks by distilling, structuring, and rigorously validating representative samples from heterogeneous financial sources. Below, the principal aspects and structural features of Fin-R1-Data are discussed in detail.
1. Construction and Structure of Fin-R1-Data
Fin-R1-Data is generated through a two-stage workflow. The initial stage involves the distillation and meticulous filtering of data from multiple raw financial sources, employing DeepSeek-R1 as the core extraction engine and an LLM-as-Judge for exhaustive data quality checks. Sources include financial question answering benchmarks and policy, regulatory, and quantitative datasets. The dataset systematically represents both reasoning and non-reasoning financial scenarios, ensuring diversity and coverage.
Every sample in Fin-R1-Data is structured as a triplet (question, reasoning, answer):
- question: The financial question, potentially representing real-world exam, policy, or quantitative investment prompts.
- reasoning: The chain-of-thought (CoT) reasoning, encapsulated within custom XML tags <think>…</think>, documenting the stepwise deduction process.
- answer: The definitive answer, enclosed in <answer>…</answer> tags, for clear extraction and comparison during training and evaluation.
This representation supports not only the learning of the reasoning process but also strict output validation and interpretability.
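For concreteness, a single record might resemble the sketch below. Only the triplet structure and the <think>/<answer> tagging are specified in the paper; the field names and the worked example are illustrative assumptions.

```python
# Illustrative Fin-R1-Data-style record; field names and content are hypothetical,
# only the triplet structure and the <think>/<answer> tags follow the paper.
sample = {
    "question": "A company reports revenue of 120M USD in 2022 and 150M USD in 2023. "
                "What is the year-over-year revenue growth rate?",
    "reasoning": "<think>Growth = (150 - 120) / 120 = 30 / 120 = 0.25, i.e. 25%.</think>",
    "answer": "<answer>25%</answer>",
}
```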
2. Data Sources and Quality Control
Raw inputs for Fin-R1-Data are derived from publicly available and proprietary financial datasets, with a specific focus on benchmarks such as FinQA and ConvFinQA, as well as datasets like FinPEE containing examination-style problems. Extensive filtering is employed using LLM-as-Judge, which assesses and ensures the presence of explicit, logically sound intermediate reasoning steps and answer correctness.
The quality filtering pipeline involves:
- Automated reasoning step extraction via DeepSeek-R1 distillation.
- Output format validation (tags for reasoning and final answers must be present).
- Manual and automated checks for content accuracy, logical flow, and non-redundant, representative reasoning traces.
Samples failing any stage are excluded, preventing logic gaps and formatting anomalies that could impair model training.
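A minimal sketch of such a gatekeeping filter is shown below, assuming a simple per-sample interface; `judge_reasoning` is a hypothetical stand-in for the LLM-as-Judge call, and the tag check mirrors the required output template rather than reproducing the released pipeline.

```python
import re

def has_required_tags(response: str) -> bool:
    # Format check: exactly one <think>...</think> and one <answer>...</answer> segment.
    return (len(re.findall(r"<think>.*?</think>", response, re.DOTALL)) == 1
            and len(re.findall(r"<answer>.*?</answer>", response, re.DOTALL)) == 1)

def judge_reasoning(question: str, response: str) -> bool:
    # Hypothetical stand-in for the LLM-as-Judge call that scores logical soundness
    # and answer correctness; the real pipeline prompts a judge model here.
    return True

def keep_sample(question: str, response: str) -> bool:
    # A sample survives only if both the format check and the judge check pass.
    return has_required_tags(response) and judge_reasoning(question, response)
```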
3. Formatting and Explicit Tagging
A salient feature of Fin-R1-Data is its explicit tagging scheme. All reasoning steps must be enclosed in <think>…</think>, and answers in <answer>…</answer>. The strict enforcement of this format serves dual purposes:
- Facilitates the extraction of reasoning and answer segments during training, enabling format reward integration in reinforcement learning.
- Ensures output interpretability at inference, allowing practitioners and auditors to directly inspect the model’s reasoning and answer generation.
This requirement is systematically enforced during both the supervised fine-tuning (SFT) and the reinforcement learning (RL) phases.
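As a sketch of how the tagged segments can be separated for training, reward computation, or inspection at inference time (the regular expressions below are an assumption; the paper only fixes the tag names):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def split_output(text: str):
    """Return (reasoning, answer) segments, or None for any segment that is missing."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    return (think.group(1).strip() if think else None,
            answer.group(1).strip() if answer else None)

reasoning, answer = split_output("<think>30 / 120 = 0.25, i.e. 25%.</think><answer>25%</answer>")
print(reasoning, "|", answer)  # 30 / 120 = 0.25, i.e. 25%. | 25%
```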
4. Training Regimen and Integration with Model Development
Fin-R1-Data is employed in a two-stage training methodology:
- Supervised Fine-Tuning (SFT): Each triplet trains the base model to produce logical and interpretable reasoning steps via explicit format learning.
- Reinforcement Learning (RL): Samples are used in format-and-accuracy reward design. The “accuracy reward” evaluates whether <answer>…</answer> matches reference answers; the “format reward” checks for compliance with the rigorous output template.
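A hedged sketch of how these two reward signals could be combined per completion is given below; the binary reward values, the exact-match answer comparison, and the unweighted sum are illustrative assumptions rather than the paper's precise reward definitions.

```python
import re

TEMPLATE_RE = re.compile(r"^\s*<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def format_reward(completion: str) -> float:
    # 1.0 if the completion follows the <think>...</think><answer>...</answer> template.
    return 1.0 if TEMPLATE_RE.match(completion) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    # 1.0 if the extracted answer matches the reference (exact match as a simple stand-in).
    m = ANSWER_RE.search(completion)
    return 1.0 if m and m.group(1).strip() == reference.strip() else 0.0

def total_reward(completion: str, reference: str) -> float:
    # Unweighted sum of the two signals; the actual weighting is an assumption.
    return format_reward(completion) + accuracy_reward(completion, reference)
```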
The reinforcement learning policy utilizes Group Relative Policy Optimization (GRPO), for which the advantage of the $i$-th sample within a group of $G$ rollouts is computed as:

$$A_i = \frac{r_i - \mathrm{mean}(\{r_1, r_2, \dots, r_G\})}{\mathrm{std}(\{r_1, r_2, \dots, r_G\})}$$

where $r_i$ is the sample reward and $\mathrm{mean}(\cdot)$, $\mathrm{std}(\cdot)$ denote the empirical mean and standard deviation over the group.
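The group-relative normalization itself is easy to reproduce; the following minimal sketch computes advantages for one group of sampled completions, with a small epsilon guarding against zero variance (an implementation detail assumed here, not taken from the paper).

```python
import numpy as np

def group_relative_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    # A_i = (r_i - mean(r)) / std(r) over one group of G rollouts.
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Higher-reward rollouts receive positive advantages, lower-reward ones negative.
print(group_relative_advantages([2.0, 1.0, 0.0, 2.0]))
```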
5. Coverage and Diversity Across Financial Tasks
Fin-R1-Data encompasses a broad range of financial reasoning scenarios:
- Arithmetic calculations, multi-step accounting operations, and quantitative problem solving.
- Regulatory compliance and policy-based reasoning tasks.
- Sentiment detection, classification of financial statements, and scenario-based analysis.
- Quantitative strategy code generation.
Coverage incorporates both simple and complex reasoning, with the explicit chain-of-thought traces allowing the model to generalize across tasks not explicitly seen in training.
6. Benchmark Representation and SOTA Evaluation Integration
Samples from FinQA and ConvFinQA appear within Fin-R1-Data, enabling direct benchmarking against mainstream financial reasoning tasks. The comprehensive structuring and explicit CoT make Fin-R1-Data uniquely suited for producing interpretable and explainable outputs, instrumental to the model achieving state-of-the-art results (FinQA: 76.0; ConvFinQA: 85.0) with only 7B parameters—surpassing larger models both in reasoning fidelity and overall accuracy.
7. Access and Implications for Research
Fin-R1-Data is made available as part of the open-source Fin-R1 release (https://github.com/SUFE-AIFLM-Lab/Fin-R1), providing researchers and developers access for further investigation, adaptation, and extension. Its structured design supports future work in automated financial reasoning, interpretable regulatory analysis, and domain-specific decision process auditing—establishing a standard for high-quality, chain-of-thought reasoning datasets in finance.
Fin-R1-Data represents a rigorously constructed, explicitly formatted financial reasoning dataset. Its chain-of-thought traces and strict answer validation underpin state-of-the-art financial LLM training, enabling transparent, interpretable performance across complex financial tasks.