
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning

Published 20 Mar 2025 in cs.CL | (2503.16252v2)

Abstract: Reasoning LLMs are rapidly evolving across various domains. However, their capabilities in handling complex financial tasks still require in-depth exploration. In this paper, we introduce Fin-R1, a reasoning LLM specifically designed for the financial sector. Fin-R1 is built using a two-stage architecture, leveraging a financial reasoning dataset distilled and processed based on DeepSeek-R1. Through supervised fine-tuning (SFT) and reinforcement learning (RL) training, it demonstrates performance close to DeepSeek-R1 with a parameter size of 7 billion across a range of financial reasoning tasks. It achieves state-of-the-art (SOTA) results on the FinQA and ConvFinQA tasks among the LLMs in our evaluation, surpassing larger models in other tasks as well. Fin-R1 showcases strong reasoning and decision-making capabilities, providing solutions to various problems encountered in the financial domain. Our code is available at https://github.com/SUFE-AIFLM-Lab/Fin-R1.

Summary

  • The paper introduces Fin-R1, a 7B-parameter model trained via data distillation and reinforcement learning to enhance financial reasoning.
  • It employs a two-stage pipeline combining Supervised Fine-Tuning and GRPO-based RL to refine decision-making and reasoning precision.
  • Fin-R1 achieves state-of-the-art scores on benchmarks such as ConvFinQA (85.0) and FinQA (76.0), enabling cost-effective deployment.

Fin-R1: A Comprehensive Approach to Financial Reasoning Using Reinforcement Learning

Introduction

The application of LLMs in the domain of financial reasoning presents unique challenges tied to fragmented financial data, uncontrollable reasoning logic, and insufficient business generalization ability. "Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning" introduces Fin-R1, a financial domain-specific LLM. Fin-R1's 7-billion-parameter model is optimized using a two-stage training framework, significantly reducing deployment costs while enhancing its capability to tackle complex financial reasoning tasks.

Methodological Framework

Two-Stage Construction Framework

Fin-R1 is constructed through a robust two-stage framework:

  1. Data Generation: This involves the creation of Fin-R1-Data, a high-quality dataset built through data distillation and filtering techniques. The dataset comprises 60,091 complete chains of thought (CoT) spanning reasoning and non-reasoning scenarios.
  2. Model Training: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), particularly employing GRPO (Group Relative Policy Optimization), are used to refine the model's decision-making processes and enhance its reasoning precision.

    Figure 1: The pipeline for constructing Fin-R1. The diagram depicts the two-stage construction framework of Fin-R1.

Data Construction and Processing

The Fin-R1-Data integrates both open-source and proprietary datasets, providing a diverse range of financial contexts and tasks. A meticulous data construction pipeline is employed to ensure high data quality:

  • Data Distillation: Aligning with DeepSeek-R1 configurations to ensure consistent reasoning patterns.
  • Data Filtering: Utilizing models like Qwen2.5-72B-Instruct as judges for reasoning selection, optimizing the coherence and validity of reasoning paths.

    Figure 2: Stage 1, the data construction pipeline: Data Distillation, Answer Check, and Reasoning Selection.
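The Answer Check and Reasoning Selection steps can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the prompt wording, the `judge` interface, and the `toy_judge` stand-in for Qwen2.5-72B-Instruct are all assumptions.

```python
def answer_check(distilled_answer: str, reference: str, judge) -> bool:
    """Keep a distilled sample only if its final answer matches the reference.
    Try an exact string match first; otherwise defer to an LLM judge."""
    if distilled_answer.strip() == reference.strip():
        return True
    verdict = judge(
        f"Are these two answers equivalent?\nA: {distilled_answer}\nB: {reference}\n"
        "Reply yes or no."
    )
    return verdict.strip().lower().startswith("yes")

def reasoning_selection(chain_of_thought: str, judge, threshold: float = 0.7) -> bool:
    """Ask the judge to score the coherence of the reasoning path; keep high scorers.
    The 0.7 threshold is an assumed hyperparameter."""
    score = float(judge(f"Score from 0 to 1 the coherence of this reasoning:\n{chain_of_thought}"))
    return score >= threshold

# Toy judge standing in for Qwen2.5-72B-Instruct (for illustration only).
def toy_judge(prompt: str) -> str:
    return "yes" if "equivalent" in prompt else "0.9"

kept = answer_check("42", "42", toy_judge) and reasoning_selection(
    "Revenue grew 10% to 110; costs held at 80; profit = 110 - 80 = 30.", toy_judge
)
```

A sample passes into Fin-R1-Data only when both filters accept it, which is how the pipeline discards chains whose answer is right but whose reasoning is incoherent.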

Training Methodology

Fin-R1 employs a combination of Supervised Fine-Tuning and Reinforcement Learning. The key elements include:

  • SFT Training: The training utilizes structured reasoning-augmented data which helps the model refine its parameters to yield precise reasoning and answers.
  • GRPO Algorithm: This optimization introduces dual reward signals—one for format correctness and another for content accuracy, ensuring high-quality outputs.

    Figure 3: Stage 2, the training pipeline, detailing the SFT and RL phases.
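As a minimal sketch of the dual-reward GRPO setup (not the authors' code), the format reward can be a template check, the accuracy reward an answer match, and the advantage a group-relative normalization over sampled completions. The `<think>/<answer>` template, the regex, and the equal reward weights are assumptions here.

```python
import re
import statistics

# Assumed output template: reasoning in <think>, final result in <answer>.
THINK_ANSWER = re.compile(r"^<think>.+</think>\s*<answer>.+</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the reasoning/answer template, else 0.0."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted answer matches the reference, else 0.0."""
    m = re.search(r"<answer>(.+?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

def group_relative_advantages(rewards):
    """GRPO's key step: normalize each reward against its own sampled group,
    replacing a learned value baseline with the group mean and std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# One group of sampled completions for the same prompt.
completions = [
    "<think>25 * 4 = 100</think><answer>100</answer>",
    "the answer is 100",  # correct value, but ignores the format
]
rewards = [format_reward(c) + accuracy_reward(c, "100") for c in completions]
advantages = group_relative_advantages(rewards)
```

In actual GRPO training these advantages weight the policy-gradient update over each token of the corresponding completion; the well-formatted, correct sample is pushed up and its malformed sibling pushed down.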

Experimental Evaluation

Evaluation Setup

The model's performance was validated using various financial benchmarks such as FinQA, ConvFinQA, Ant-Finance, TFNS, and Finance-Instruct-500k. Qwen2.5-72B-Instruct served as the judge model and was noted for the consistency of its evaluations.

Results

Fin-R1 excelled across multiple tasks, achieving state-of-the-art scores of 85.0 on ConvFinQA and 76.0 on FinQA. Its compact structure allowed it to deliver superior performance using fewer computational resources than massive models like DeepSeek-R1-Distill-Llama-70B.

Figure 4: Heatmap comparison of reasoning scores between LLMs and human annotators.

Conclusion

The development of Fin-R1 demonstrates an effective approach to integrating financial reasoning capabilities into LLMs, addressing challenges related to fragmented data and reasoning logic. Future directions include expanding dataset coverage and exploring multimodal architectures to further enhance Fin-R1's performance and applicability in dynamic financial environments.

The implications of Fin-R1 are significant not only for enhancing automated reasoning capabilities but also for enforcing regulatory compliance and improving decision-making processes within the financial industry. Future work will focus on refining the model for broader financial applications, emphasizing integration with complex financial datasets and scenarios.
