Fin-R1-Data Corpus for Financial Reasoning

Updated 2 March 2026

Fin-R1-Data Corpus is a large-scale, multi-domain dataset that synthesizes financial Q&A, step-by-step reasoning, and code-driven quantitative tasks.
It aggregates examples from nine public and one proprietary financial data source, covering business knowledge, regulatory compliance, and quantitative trading.
The corpus undergoes distillation, filtering, and (for its exam-derived portion) manual validation to ensure high-quality multi-step reasoning; Fin-R1, the model trained on it, achieves strong benchmark performance.

The Fin-R1-Data Corpus is a large-scale, distilled chain-of-thought (CoT) dataset designed for training and evaluating the financial reasoning of LLMs. Developed as the central data source for the Fin-R1 LLM, it combines diverse financial Q&A formats, stepwise reasoning traces, and coverage of both quantitative and qualitative decision-making, and it supports performance on financial reasoning and compliance benchmarks (Liu et al., 20 Mar 2025).

1. Data Sources and Structural Composition

Fin-R1-Data aggregates samples from nine publicly available financial corpora and one proprietary examination set. Its construction draws from datasets that span financial business knowledge, professional terminology, numeric and causal reasoning, regulatory compliance, code generation for quantitative trading strategies, sentiment analysis, and multi-step numerical reasoning over tabular data. Major components are:

Ant_Finance: Financial business knowledge, compliance scenarios.
FinanceIQ: Q&A pairs targeting financial terms and domain-specific knowledge.
Quant-Trading-Instruct (FinanceQT): Quantitative strategy code generation.
ConvFinQA & FinQA: Multi-step annual-report table reasoning tasks.
TFNS: Financial-news sentiment classification.
Finance-Instruct-500K: Financial concept explanations, definitions, and Q&A.
FinCorpus: General financial text generation and business knowledge base entries.
FinCUGE: Causal relationship extraction from financial news.
FinPEE: 350 manually extracted and validated calculation problems from Chinese postgraduate finance exams.

Because these sources span Chinese and English material, the corpus contains samples in both languages and covers a broad set of financial reasoning scenarios.

2. Corpus Size, Category Breakdown, and Coverage

Following distillation and filtering, the final corpus contains exactly 60,091 examples. These are categorized as follows:

Category	Proportion ($p_i$)	Approx. Count ($N_i$)	Data Sources
Financial Code	0.002	~120	FinanceQT
Financial Expertise	0.219	~13,181	Finance-Instruct-500K, FinanceIQ, FinPEE
Non-Reasoning Business Knowledge	0.504	~30,285	Ant_Finance, FinCorpus
Reasoning Business Knowledge	0.275	~16,505	FinQA, ConvFinQA, TFNS, FinCUGE
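
As a quick arithmetic check, the proportions in the table above can be recomputed from the listed counts. The snippet below is only a sketch: it treats the approximate counts in the table as exact integers, and the dictionary layout is an illustrative choice rather than anything shipped with the corpus.

```python
# Counts as listed in the table above (the table marks them as approximate).
counts = {
    "Financial Code": 120,
    "Financial Expertise": 13_181,
    "Non-Reasoning Business Knowledge": 30_285,
    "Reasoning Business Knowledge": 16_505,
}

# The category counts should add up to the reported corpus size.
total = sum(counts.values())

# p_i = N_i / N, rounded to three decimals as in the table.
proportions = {name: round(n / total, 3) for name, n in counts.items()}

print(f"total examples: {total}")
for name, p in proportions.items():
    print(f"{name}: {p:.3f}")
```

Under that assumption the four counts sum to the reported 60,091 examples, and each rounded ratio agrees with the proportion column.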

The breakdown ensures a wide spectrum of problem types, including direct Q&A, code synthesis, multi-hop reasoning, compliance checks, business concept understanding, and quant trading tasks. Every sample includes a question, a detailed CoT reasoning trace derived or filtered from LLM generation, and a gold-standard answer.

3. Distillation, Filtering, and Quality Assurance Workflow

The data curation process follows a two-stage pipeline:

Distillation: Raw prompts from the source datasets are processed by DeepSeek-R1, a reasoning-intensive LLM, with controlled decoding (temperature 0.6), standardized answer formatting (with \boxed{} markers for math results), and each CoT beginning with an explicit newline.
Filtering Pipeline:
- Answer Verification: For questions with objective answers, a string-exact match is required against the ground truth. Subjective/long-form tasks are passed to Qwen2.5-72B-Instruct—a large LLM judge, empirically selected (99.6% accuracy in domain prompt tuning)—which returns a correctness label.
- CoT Quality Selection: Each reasoning trace is scored across seven axes: internal consistency, term overlap rate, number of steps, logical coherence, content diversity, domain relevance, and instruction alignment. Only examples meeting or exceeding a threshold total score (as determined by the LLM-as-judge) are retained.
- Taken together, these steps remove incoherent traces, wrong answers, and low-quality reasoning, yielding a dataset rich in valid, multi-step financial thinking.
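
The answer-verification step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `extract_boxed`, `verify_answer`, and the `judge` callable are hypothetical names, the judge stands in for a call to an LLM such as Qwen2.5-72B-Instruct, and falling back to the whole response when no \boxed{} marker is present is an assumption.

```python
from typing import Callable, Optional


def extract_boxed(text: str) -> Optional[str]:
    # Return the content of the last \boxed{...} in text, or None.
    # Braces are counted so nested groups like \frac{1}{2} survive.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces


def verify_answer(
    response: str,
    gold: str,
    objective: bool,
    judge: Callable[[str, str], bool],
) -> bool:
    # Objective questions: string-exact match against the ground truth.
    if objective:
        predicted = extract_boxed(response)
        if predicted is None:
            # Assumption: unboxed objective answers are compared as a whole.
            predicted = response.strip()
        return predicted == gold.strip()
    # Subjective or long-form tasks: defer to the (hypothetical) LLM judge.
    return judge(response, gold)
```

In use, `judge` would wrap a prompt to the judging model and map its correctness label to a boolean; any function with the same two-argument shape works for testing.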

For the proprietary FinPEE component, PDF-to-Markdown conversion is followed by manual Q&A extraction and validation, ensuring each exam-type entry is fully structured and answer-verified.

4. Data Schema, Splits, and Coverage of Evaluation Benchmarks

All 60,091 curated samples constitute a supervised fine-tuning (SFT) pool. The corpus is not supplied with an official train/dev/test split; instead, downstream evaluation uses the original splits of the externally benchmarked tasks (FinQA, ConvFinQA). For tasks without such a benchmark split (such as Ant_Finance, TFNS, Finance-Instruct-500K), a random sample of up to 1,000 examples is drawn for testing, or the full set is used if it is smaller.
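
The "up to 1,000 examples, or the full set if smaller" rule is simple to express in code. The helper below is a sketch; the function name, the fixed seed, and the choice to return small sets in their original order are assumptions made for reproducibility, not details given in the source.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def draw_eval_subset(examples: Sequence[T], cap: int = 1000, seed: int = 0) -> List[T]:
    # Small task: evaluate on everything, in the original order.
    if len(examples) <= cap:
        return list(examples)
    # Large task: sample `cap` examples without replacement.
    # A dedicated Random instance keeps the draw reproducible.
    return random.Random(seed).sample(examples, cap)
```

For example, `draw_eval_subset(list(range(5000)))` returns 1,000 distinct items, while a 300-item task is returned whole.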

Every example adheres to a strict format, typically encompassing:

Input: Natural language question, context, or prompt.
Chain-of-Thought Reasoning: Multi-step rationale, often beginning with a “\n”.
Answer: Canonical, string-exact or judge-verified solution (boxed for math).
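
A record with these three parts could be represented as shown below. The class name, field names, and the sample record are all hypothetical; the source does not specify the on-disk schema, and the example question is invented purely for illustration and is not drawn from the corpus.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FinR1Record:
    question: str   # natural-language question, context, or prompt
    reasoning: str  # chain-of-thought trace, often starting with "\n"
    answer: str     # final answer (boxed for math results)


# Invented illustration, not an actual corpus entry.
record = FinR1Record(
    question="A bond with a face value of 1,000 pays a 5% annual coupon. "
             "What is the annual coupon payment?",
    reasoning="\nThe coupon is the coupon rate times the face value: 0.05 * 1000 = 50.",
    answer="\\boxed{50}",
)

# Serialize as one JSON line; ensure_ascii=False keeps Chinese text readable.
line = json.dumps(asdict(record), ensure_ascii=False)
print(line)
```

One JSON object per line is a common layout for SFT data, which is why it is used here; any format that preserves the three fields would serve equally well.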

No mean token-length statistics are reported for the corpus, but it covers both long-form and short-form reasoning. The dataset is bilingual (Chinese/English), a direct result of drawing on both Chinese and English financial sources.

5. Selection Criteria and Annotation Protocols

Two orthogonal criteria govern inclusion of each record:

Answer Correctness: For numeric or other objective questions, the generated answer must match the ground truth string-exactly; for subjective or long-form tasks, it must be labeled ‘correct’ by the LLM-as-judge.
Reasoning Quality: Each CoT is filtered for sufficient scores on the seven listed dimensions; diversity, consistency, and logical progression are emphasized.
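
The two criteria combine into a single keep-or-drop decision per record, which might look like the sketch below. The axis identifiers paraphrase the seven dimensions listed earlier; the per-axis score scale, the summation into a total, and the threshold value are assumptions, since the source states only that a threshold on the total score is applied.

```python
from typing import Dict

# The seven scoring axes named in the filtering description.
AXES = (
    "internal_consistency",
    "term_overlap_rate",
    "number_of_steps",
    "logical_coherence",
    "content_diversity",
    "domain_relevance",
    "instruction_alignment",
)


def passes_quality(scores: Dict[str, float], threshold: float) -> bool:
    # Every axis must be scored before a total is meaningful.
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axis scores: {missing}")
    # Assumption: the total is a plain sum of per-axis scores.
    return sum(scores[axis] for axis in AXES) >= threshold


def keep_record(answer_correct: bool, scores: Dict[str, float], threshold: float) -> bool:
    # A record is retained only if both criteria hold.
    return answer_correct and passes_quality(scores, threshold)
```

With hypothetical 0/1 scores per axis and a threshold of 6, a trace that fails two axes is dropped even when its answer is correct, and a perfect trace is dropped when its answer is wrong.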

Manual validation is reserved for FinPEE exam-style entries; no large-scale human annotation is applied for the rest, beyond what is automated or LLM-judged.

6. Functional Significance: Enhancing Reasoning and Financial Model Competence

The corpus’s design ensures robust grounding in:

Professional financial terminology and regulatory argumentation patterns (via the financial-expertise and compliance-oriented sources).
Tabular, numeric, and textual reasoning, via multi-step, end-to-end problem chains.
Structured reasoning output, enforcing RL-tuned format expectations such as <think>…</think><answer>…</answer> for consistency.
Code-centric quantitative tasks (strategy generation/evaluation), covering a growing subdomain in prompt-based quant trading.
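
A structured output format makes the final answer machine-extractable. The sketch below pulls the answer out of the `<answer>…</answer>` tags mentioned above; the function name and the rule that exactly one answer block must be present are assumptions, not a documented part of the Fin-R1 pipeline.

```python
import re
from typing import Optional

# Non-greedy match so that each <answer> block is captured separately;
# DOTALL lets an answer span several lines.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(output: str) -> Optional[str]:
    matches = ANSWER_RE.findall(output)
    # Assumption: a well-formed output carries exactly one answer block.
    if len(matches) != 1:
        return None
    return matches[0].strip()
```

Returning `None` for missing or repeated tags lets a caller treat malformed outputs uniformly, for instance by discarding them before any answer comparison.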

Fin-R1, when trained on this data, demonstrates strong results relative to parameter-matched and much larger models on established benchmarks. For example, despite being a 7B-parameter model, Fin-R1 achieves 85.0% on ConvFinQA and 76.0% on FinQA, suggesting both the breadth and depth of the Fin-R1-Data corpus (Liu et al., 20 Mar 2025).

7. Use Cases and Referential Impact

Fin-R1-Data enables:

Supervised fine-tuning and RL for LLMs on real financial tasks.
Evaluation and benchmarking of step-by-step financial reasoning, regulatory compliance, code synthesis, and concept explanation.
Cross-domain, cross-lingual reasoning analysis, leveraging its pooled Chinese and English content.
Advanced methodology development, e.g., for chain-of-thought prompting, judge-based rejection sampling, and multi-granularity financial data modeling.

Its large scale, domain specificity, and strict quality controls make it a valuable resource for researchers building or benchmarking transparent, auditable, and accurate financial reasoning systems (Liu et al., 20 Mar 2025).

References

Liu et al. (2025). Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning.
