Fin-R1: Financial Reasoning LLM

Updated 15 October 2025
  • Fin-R1 is a financial domain-specific LLM that integrates supervised fine-tuning with reinforcement learning for precise multi-step financial reasoning.
  • It leverages a high-quality dataset with 60K chain-of-thought annotations to enforce structured logical progression and numerical analysis.
  • The model achieves state-of-the-art performance on specialized financial benchmarks, outperforming larger models on complex decision tasks.

Fin-R1 is an LLM for financial reasoning, designed to deliver robust multi-step reasoning and decision-making on complex financial tasks. The model is constructed with a two-stage training pipeline that combines supervised fine-tuning (SFT) and reinforcement learning (RL), tailored to the financial domain through a distilled financial reasoning dataset. Fin-R1 is parameter-efficient, built on a 7B-parameter backbone, and demonstrates state-of-the-art (SOTA) performance on specialized benchmarks (FinQA, ConvFinQA) while remaining competitive with, or outperforming, much larger models across a broad suite of financial reasoning and decision tasks (Liu et al., 20 Mar 2025).

1. Two-Stage Model and Data Architecture

Fin-R1 is built around a two-stage approach. First, dataset generation and distillation yield the Fin-R1-Data corpus: a high-quality financial reasoning dataset containing 60,091 chain-of-thought (CoT) and non-reasoning financial questions. The CoT annotations employ standardized <think> and <answer> tags, enforcing both interpretable step-by-step reasoning and an explicit answer segment, so that the model learns logical progression alongside numerical and contextual reasoning.

The model is initialized from the Qwen2.5-7B-Instruct backbone, striking a balance between parameter economy and learning capacity.

The training pipeline consists of:

  • Supervised Fine-Tuning (SFT): The model, initialized from Qwen2.5-7B-Instruct, is fine-tuned on triplets v = (x, c, y^*), where x is the financial query, c is the explicitly tagged reasoning trace, and y^* is the standardized answer.
  • Reinforcement Learning (RL): RL is performed using the Group Relative Policy Optimization (GRPO) algorithm, leveraging dual reward signals (formatting and answer accuracy), further aligning the model's outputs to domain requirements and precision.

2. Training Methodology and Objective Function

The SFT step encodes both reasoning and answer generation directly:

  • Each training instance exposes the model to explicit CoT tags, enforcing structured multi-step reasoning before the final answer is produced.
  • Format tags ensure outputs can be reliably parsed and evaluated downstream.
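
To make the tagging scheme concrete, here is a minimal sketch of how one SFT instance could be assembled; the function name, field names, and template are illustrative assumptions rather than the released preprocessing code:

```python
# Illustrative sketch (not the released pipeline): assemble one SFT instance
# from a (query, reasoning trace, answer) triplet using the <think>/<answer>
# tagging scheme described above.

def build_sft_example(query: str, reasoning: str, answer: str) -> dict:
    """Return a prompt/target pair for supervised fine-tuning."""
    target = f"<think>\n{reasoning}\n</think>\n<answer>\n{answer}\n</answer>"
    return {"prompt": query, "target": target}

example = build_sft_example(
    query="A bond's price falls from $102.50 to $98.40. What is the percentage change?",
    reasoning="Change = 98.40 - 102.50 = -4.10. Percentage change = -4.10 / 102.50 = -0.04, i.e. -4.0%.",
    answer="-4.0%",
)
print(example["target"])
```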

In the RL phase, policy optimization follows a comparative group-based approach:

  • For each sample, the model generates a group of candidate outputs \{o_1, \dots, o_G\} under the previous policy \pi_{\text{old}}.
  • Each output is rewarded for both correct format (presence of exactly one <think> and one <answer> segment) and answer correctness, validated by an external judge model (Qwen2.5-Max), with rewards allocated as 1 (semantic match) or 0 (otherwise).
  • The group-relative advantage for each candidate output is

A_i = \frac{r_i - \mu}{\sigma}

where r_i is the reward for candidate i and \mu, \sigma are the mean and standard deviation of rewards within the group.

  • The GRPO objective is given by

\mathcal{J}_{\text{GRPO}}(\theta) = \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G} \min\left( r_i^{\text{ratio}} \cdot A_i,\ \text{clip}\left(r_i^{\text{ratio}},\, 1 - \varepsilon,\, 1 + \varepsilon\right) \cdot A_i \right) - \beta\, D_{\text{KL}}\left(\pi_\theta \,\|\, \pi_{\text{ref}}\right) \right]

where r_i^{\text{ratio}} = \frac{\pi_\theta(o_i \mid v)}{\pi_{\text{old}}(o_i \mid v)} and the D_{\text{KL}} term regularizes the updated policy against drifting from the reference policy \pi_{\text{ref}}.
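
A minimal NumPy sketch of these two formulas follows. The log-probabilities, reward values, and the clipping and KL coefficients are illustrative placeholders, and the KL term is taken as a precomputed scalar rather than estimated from a model:

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """A_i = (r_i - mu) / sigma, normalized within one group of G candidates."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_objective(
    logp_new: np.ndarray,   # log pi_theta(o_i | v) for each candidate
    logp_old: np.ndarray,   # log pi_old(o_i | v), fixed during the update
    rewards: np.ndarray,    # combined format + accuracy reward per candidate
    kl_to_ref: float,       # estimate of D_KL(pi_theta || pi_ref)
    clip_eps: float = 0.2,  # epsilon in the clip term (placeholder value)
    beta: float = 0.04,     # KL coefficient (placeholder value)
) -> float:
    """Clipped, group-relative surrogate objective (to be maximized)."""
    adv = group_relative_advantages(rewards)
    ratio = np.exp(logp_new - logp_old)                  # r_i^ratio
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * adv, clipped * adv)   # per-candidate min term
    return float(surrogate.mean() - beta * kl_to_ref)

# Example: a group of G = 4 candidates; each reward combines the format signal
# and the accuracy signal (each 0 or 1), so values range over {0, 1, 2}.
rewards = np.array([2.0, 0.0, 1.0, 2.0])
logp_old = np.array([-12.1, -15.3, -13.0, -11.8])
logp_new = np.array([-11.9, -15.6, -13.1, -11.5])
print(grpo_objective(logp_new, logp_old, rewards, kl_to_ref=0.01))
```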

This dual-phase workflow yields a model highly attuned to both the formal structure and the substantive logic of financial Q&A.

3. Performance Metrics and Evaluation

Fin-R1 is evaluated across a battery of standard financial benchmarks:

  • On a portfolio of five standard datasets (FinQA, ConvFinQA, Ant-Finance, TFNS, Finance-Instruct-500K), it achieves an overall average score of 75.2.
  • SOTA results are demonstrated with 85.0 on ConvFinQA and 76.0 on FinQA: both tasks require chain-of-thought calculation, numerical precision, and multi-step symbolic manipulation.
  • Fin-R1 outperforms DeepSeek-R1-Distill-Llama-70B (69.2 overall) and even some larger 32B-parameter models, underscoring the effectiveness of its training pipeline and architecture at a smaller scale.

This performance attests not just to surface-level answer accuracy, but to consistent multi-step reasoningβ€”a crucial capability for high-stakes financial operations.

4. Financial Reasoning and Decision-Making Capabilities

Fin-R1 is optimized to handle reasoning patterns native to the financial domain:

  • Supports numerical computation, ratio assessment, decimal/percentage translation, and strict output formatting (for auditability).
  • Mastery of chain-of-thought reasoning enables not only single-step problem-solving but also multi-hop compliance reasoning, automated regulatory checks, and transparent unfolding of investment logic.
  • Output tags (<think>, <answer>) are strictly enforced by the reward mechanism, providing both human interpretability and traceability in automated decision environments where justification is mandatory (a minimal format check along these lines is sketched after this list).
  • Examples in the model outputs include broken-down calculation steps, internal uncertainty management, and precise outcome reporting.
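
As a concrete illustration of the format constraint, here is a minimal check of the kind the format reward could rely on, assuming the output must consist of exactly one <think>...</think> block followed by exactly one <answer>...</answer> block; the exact regular expression is an assumption, not the published reward code:

```python
import re

# Assumed structure: one <think>...</think> block followed by one
# <answer>...</answer> block, with nothing but whitespace around them.
_FORMAT_RE = re.compile(
    r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*", re.DOTALL
)

def format_reward(output: str) -> float:
    """Return 1.0 if the output matches the required tag structure, else 0.0."""
    ok = (
        output.count("<think>") == 1
        and output.count("</think>") == 1
        and output.count("<answer>") == 1
        and output.count("</answer>") == 1
        and _FORMAT_RE.fullmatch(output) is not None
    )
    return 1.0 if ok else 0.0

print(format_reward("<think>Step 1: compute the ratio ...</think><answer>-4.0%</answer>"))  # 1.0
print(format_reward("The answer is -4.0%."))                                                # 0.0
```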

5. Comparative Analysis and State-of-the-Art Achievements

Fin-R1 achieves top-tier results among financial domain models of moderate size, with characteristics including:

  • Parameter efficiency: 7B scale yet competitive against, or superior to, 32B and 70B models.
  • Strong generalization to fragmented data, numerically precise tasks, and logical QA.
  • Benchmarks (FinQA, ConvFinQA) act as proxies for real-world scenarios such as document-based financial analysis, risk adjudication, and compliance checking.

The model's competitive advantage arises from the combination of a domain-tailored data pipeline, chain-of-thought enforcement, and targeted RL-facilitated alignment of output structure and content.

6. Open Source Availability

The Fin-R1 codebase is released at https://github.com/SUFE-AIFLM-Lab/Fin-R1, supporting:

  • Full reproducibility of the model experiments and evaluation statistics.
  • Community extension, adaptation to new financial data schemas, and further innovation on domain-specific LLM reasoning frameworks.

This facilitates adoption within both fintech engineering and academic research, enabling rapid iteration on domain grounding, explainability, and regulatory-oriented reasoning.

7. Broader Implications

Fin-R1 exemplifies a modern pipeline for financial AI:

  • Demonstrates that smaller, well-designed architectures can achieve SOTA on reasoning-intensive financial tasks when supplied with explicit chain-of-thought data and reinforced alignment.
  • Sets a new baseline for the deployment of reasoning LLMs in financial decision-making, compliance monitoring, robo-advisory, and automated reporting systems.

This approach is distinguished by explicit structural reasoning control, reinforcement learning alignment for regulated output forms, and codebase accessibility, positioning Fin-R1 as an influential model for continued integration of LLMs into high-stakes financial domains.
