Uni-FinLLM: Unified Financial LLM Architecture

Updated 4 July 2026

Uni-FinLLM is a unified financial intelligence framework that integrates text, time series, and network data across micro, meso, and macro financial levels.
The architecture employs a shared Transformer backbone with modular task heads designed for stock forecasting, credit-risk evaluation, and systemic-risk detection.
It leverages parameter-efficient tuning, contrastive alignment, and risk-aware reinforcement learning to enhance performance and ensure robust safety controls.

Uni-FinLLM denotes a unified financial LLM intended to consolidate multiple financial tasks, data modalities, and operational scales within one system. In the literature, the term appears both as a broad design objective—one shared financial backbone with modular specialization—and as the name of a specific multimodal architecture for stock prediction, credit-risk assessment, and systemic-risk early warning. In its named form, Uni-FinLLM is a shared Transformer backbone with modular task heads that jointly processes financial text, numerical time series, fundamentals, and visual or network-structured data; in its broader sense, it is a research program spanning parameter-efficient adaptation, multilingual finance specialization, reasoning-oriented post-training, and agentic safety controls (Zhang et al., 6 Jan 2026).

1. Conceptual scope

Uni-FinLLM is best understood as a unification strategy for financial intelligence rather than a single immutable model family. The central claim across related work is that financial workflows are fragmented across micro-level asset prediction, firm-level risk assessment, macro-level surveillance, classification, summarization, extraction, reasoning, and decision support, whereas a unified model should preserve shared financial semantics while exposing specialized capabilities when needed. The explicit Uni-FinLLM model frames this as cross-scale prediction—micro-level stock forecasting, meso-level credit-risk assessment, and macro-level systemic-risk detection—under one multimodal backbone (Zhang et al., 6 Jan 2026).

Adjacent work uses the term in a broader architectural sense. MoDULA treats Uni-FinLLM as a single base LLM with a shared universal LoRA expert and domain-specific experts that can be plugged in progressively without retraining all experts from scratch (Ma et al., 2024). The LLM Pro Finance Suite presents a multilingual financial specialization strategy built on instruction-tuned Llama 3.1, Qwen 3, and Gemma 3 models, using a 2,026,812-sample corpus with 54.4% finance-related data and preserving general-domain capabilities; this is not called Uni-FinLLM in the title, but it functions as a concrete template for a unified financial model family (Caillaut et al., 7 Nov 2025). FinAnchor proposes a different notion of unification: not a monolithic LLM, but a shared anchor embedding space into which multiple frozen encoders are aligned by ridge regression and then aggregated for downstream financial prediction (He et al., 24 Feb 2026).

A recurrent misconception is that “unified” necessarily means “single dense model with no modularity.” The literature does not support that reading. Some formulations are monolithic and multimodal, some are modular PEFT systems, and some are representation-layer hubs. This suggests that Uni-FinLLM is a systems concept defined by coherent cross-task financial behavior, not by one exclusive implementation pattern.

2. Canonical multimodal architecture

In the named model, Uni-FinLLM is organized around multimodal inputs

$X = \{X^{(p)}, X^{(t)}, X^{(m)}, X^{(g)}\},$

where $X^{(p)}$ denotes market price time series, $X^{(t)}$ textual financial news and disclosures, $X^{(m)}$ macroeconomic and banking indicators, and $X^{(g)}$ financial networks. These are mapped by modal encoders into a shared representation

$Z = f_e\big(X^{(p)}, X^{(t)}, X^{(m)}, X^{(g)}\big),$

with $f_e$ implemented as a multimodal Transformer encoder-decoder (Zhang et al., 6 Jan 2026).

Before task-specific prediction, Uni-FinLLM uses specialized modal encoders matched to each signal type. Market price series are processed by a temporal attention encoder intended to capture long-term dependencies and volatility clustering. Financial text is encoded by a domain-adapted LLM such as FinBERT or BloombergGPT. Macroeconomic and banking indicators are handled by an MLP with economic-factor attention. Financial networks are encoded by a Graph Attention Network to model contagion channels and centrality structure. After modal encoding, a contrastive alignment loss encourages semantically corresponding modalities to be close in the latent space, and cross-modal attention fuses them into a shared representation (Zhang et al., 6 Jan 2026).
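The source names a contrastive alignment loss but does not give its form. The following is a minimal sketch, assuming a one-directional InfoNCE objective over cosine similarities in which row `i` of each embedding list is a matched pair (for example, a news embedding and the price-window embedding for the same asset and date). The function names, the temperature value, and the toy vectors are illustrative assumptions, not details from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(text_emb, price_emb, temperature=0.1):
    """Mean InfoNCE loss; pair (i, i) is the positive, all (i, j != i) are negatives."""
    n = len(text_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(text_emb[i], price_emb[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / n

# Toy embeddings (hypothetical): matched pairs point in similar directions.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(text, [[0.9, 0.1], [0.1, 0.9]])
misaligned = info_nce(text, [[0.1, 0.9], [0.9, 0.1]])
```

The loss is small when matched modalities are already close in the latent space and large when they are swapped, which is the behavior an alignment term is meant to reward.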

Level	Primary objective	Main inputs
Micro-level	Stock and short-horizon price prediction	Price time series, news
Meso-level	Firm-level credit-risk assessment	Fundamentals, structured credit indicators, disclosures
Macro-level	Systemic risk and crisis early warning	Macro indicators, market stress indices, financial networks

On top of the shared backbone, Uni-FinLLM exposes modular task heads. The micro-level stock prediction head is an autoregressive decoder with mixture-density outputs. The systemic-risk head consumes $Z$ together with the graph adjacency $A$ and predicts risk scores and early-warning labels through attention over macro and network structure. The architecture also includes a policy or regulation assistant head that uses the generative interface of the LLM to answer supervisory or explanatory queries. Credit-risk prediction is evaluated as a meso-level head built in the same style (Zhang et al., 6 Jan 2026).
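A mixture-density output means the head predicts the parameters of a mixture distribution rather than a single point forecast. As a hedged illustration, assuming a one-dimensional Gaussian mixture over next-period returns (the source does not specify the component family), the training signal for such a head would typically be the negative log-likelihood of the realized value. The function name and the example parameters below are hypothetical.

```python
import math

def mixture_nll(y, weights, means, stds):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture.

    weights must be positive and sum to 1; stds must be positive.
    """
    log_terms = []
    for w, mu, sd in zip(weights, means, stds):
        log_pdf = (-0.5 * ((y - mu) / sd) ** 2
                   - math.log(sd)
                   - 0.5 * math.log(2 * math.pi))
        log_terms.append(math.log(w) + log_pdf)
    m = max(log_terms)  # stable log-sum-exp over components
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))

# Hypothetical two-regime forecast: a calm component and a stressed one.
weights, means, stds = [0.7, 0.3], [0.01, -0.03], [0.01, 0.02]
nll_likely = mixture_nll(0.01, weights, means, stds)
nll_unlikely = mixture_nll(0.10, weights, means, stds)
```

A realized return near a component mean yields a lower loss than one far in the tail, so the head is pushed to place probability mass where outcomes actually occur.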

3. Training objectives and optimization regimes

The named Uni-FinLLM is trained with a multi-objective loss,

$\mathcal{L} = \lambda_1 \mathcal{L}_{\text{forecast}} + \lambda_2 \mathcal{L}_{\text{risk}} + \lambda_3 \mathcal{L}_{\text{align}} + \lambda_4 \mathcal{L}_{\text{RL}},$

where $\mathcal{L}_{\text{forecast}}$ combines mean-squared error and quantile loss for micro-level forecasting, $\mathcal{L}_{\text{risk}}$ covers classification or regression losses for credit and systemic risk, $\mathcal{L}_{\text{align}}$ is the cross-modal contrastive alignment objective, and $\mathcal{L}_{\text{RL}}$ incorporates risk-aware reinforcement learning. The RL component optimizes a policy against a reward that offsets profit with a systemic-risk term, so that profitable actions are penalized when they raise systemic risk (Zhang et al., 6 Jan 2026).
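The weighted objective and the risk-aware reward can be sketched as follows. This is an illustration under stated assumptions: the forecast term is taken to be a plain sum of mean-squared error and pinball (quantile) loss, the reward is taken to be profit minus a linear penalty on any increase in systemic risk, and all weights and the penalty coefficient are placeholder values. None of these specifics are given in the source.

```python
def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for a single observation at quantile q."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)

def forecast_loss(y_true, y_pred, q=0.5):
    """Assumed combination: MSE plus mean pinball loss."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    pin = sum(pinball_loss(t, p, q) for t, p in zip(y_true, y_pred)) / n
    return mse + pin

def total_loss(l_forecast, l_risk, l_align, l_rl, lambdas=(1.0, 1.0, 0.1, 0.1)):
    """Weighted sum of the four objectives; the lambda values are placeholders."""
    terms = (l_forecast, l_risk, l_align, l_rl)
    return sum(lam * term for lam, term in zip(lambdas, terms))

def risk_aware_reward(profit, systemic_risk_increase, penalty=2.0):
    """Assumed reward shape: profit minus a penalty on added systemic risk."""
    return profit - penalty * max(systemic_risk_increase, 0.0)

# A profitable action that raises systemic risk can end up with zero reward.
reward_risky = risk_aware_reward(profit=1.0, systemic_risk_increase=0.5)
reward_safe = risk_aware_reward(profit=1.0, systemic_risk_increase=0.0)
```

Under this shape, two actions with equal profit are ranked by their systemic-risk footprint, which matches the stated intent of penalizing risk-raising trades.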

The training schedule is staged. First, each modality is pretrained separately. Second, multimodal alignment is optimized. Third, stock-forecasting and risk-estimation objectives are jointly tuned. Fourth, RL fine-tuning incorporates simulated or historical policy-impact feedback. The reported setup uses an NVIDIA A100 GPU cluster, AdamW with warmup-cosine scheduling, batch size 32 for micro-level tasks and 16 for macro-risk tasks, and 80 epochs of joint training (Zhang et al., 6 Jan 2026).
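Warmup-cosine scheduling is usually implemented as a linear ramp followed by a half-cosine decay. A minimal sketch, assuming per-step scheduling and a linear warmup; the base learning rate, warmup length, and step counts below are placeholders because the source does not report them.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Hypothetical run: 100 steps with 10 warmup steps and a peak rate of 1e-3.
schedule = [warmup_cosine_lr(s, 100, 10, 1e-3) for s in range(101)]
```

The rate rises to its peak at the end of warmup and then decays smoothly, which tends to stabilize early updates when several losses are optimized jointly.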

Other finance-oriented work expands this training picture. QianfanHuijin proposes a multi-stage industrial paradigm: Continual Pre-training on financial corpora, Financial SFT, Finance Reasoning RL, Finance Agentic RL, and General RL. It also introduces a dual-mode interface with explicit <think> ... </think> reasoning in “Thinking” mode and an empty <think> block in “Non-thinking” mode, together with rule-based and LLM-based verifiers for financial correctness and business alignment (Li et al., 30 Dec 2025). Fino1 shows a narrower but complementary result: domain-specific financial chain-of-thought distilled from FinQA, followed by supervised fine-tuning and PPO-based RL with a verifier reward model, improves an 8B backbone from an average of 50.12 to 61.03 across FinQA, DM-Simplong, and XBRL-Math (Qian et al., 12 Feb 2025).

A plausible implication is that Uni-FinLLM training is converging toward layered specialization rather than one-pass supervised tuning. In current evidence, domain knowledge infusion, financial reasoning optimization, and tool-using alignment are being treated as separable capabilities.

4. Evaluation frameworks and empirical performance

The named Uni-FinLLM reports improvements over domain baselines on three levels of financial prediction. It raises stock directional accuracy to 67.4% from 61.7%, credit-risk accuracy to 84.1% from 79.6%, and macro early-warning accuracy to 82.3%; in the detailed tables, it also reports stock MAPE of 10.9 and hit ratio of 64.3%, credit-risk ROC-AUC of 0.892 and PR-AUC of 0.871, and systemic-risk ROC-AUC of 0.873 (Zhang et al., 6 Jan 2026).
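For readers unfamiliar with the stock-level metrics, the sketch below uses common textbook definitions: directional accuracy as the share of periods in which the predicted and realized returns have the same sign, and MAPE as the mean absolute percentage error. The source does not state the exact definitions used in the paper, so these are assumptions, and the sample numbers are invented for illustration only.

```python
def directional_accuracy(actual_returns, predicted_returns):
    """Fraction of periods where predicted and realized returns share a sign."""
    hits = sum(
        1 for a, p in zip(actual_returns, predicted_returns) if (a > 0) == (p > 0)
    )
    return hits / len(actual_returns)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Invented toy data, not results from any paper.
da = directional_accuracy([0.01, -0.02, 0.03, -0.01], [0.02, -0.01, -0.01, -0.03])
err = mape([100.0, 200.0], [110.0, 180.0])
```

Directional accuracy ignores the size of the move, while MAPE ignores its sign, which is why the two are usually reported together.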

The evaluation ecosystem around Uni-FinLLM is increasingly benchmark-centric. The Open FinLLM Leaderboard defines seven task categories (Information Extraction, Textual Analysis, Question Answering, Text Generation, Risk Management, Forecasting, and Decision-Making) covering 42 datasets, with model outputs normalized to a common scale through min–max scaling. It is explicitly intended as an open platform for assessing FinLLMs and FinAgents under a broader notion of “financial AI readiness” (Lin et al., 19 Jan 2025). FinMaster pushes this further by benchmarking full-pipeline workflows through FinSim, FinSuite, and FinEval across 183 tasks in financial literacy, accounting, auditing, and consulting. Its experiments show accuracy dropping from over 90% on basic tasks to about 40% on complex scenarios, and from 58% on single-metric calculations to 37% in multimetric scenarios, exposing computational error propagation in finance-oriented LLM reasoning (Jiang et al., 18 May 2025).
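Min–max scaling maps the lowest raw score on a dataset to the bottom of a target range and the highest to the top. A minimal sketch, with the target range left as a parameter; the default of 0 to 1 and the tie-handling rule are assumptions for illustration, not the leaderboard's documented behavior.

```python
def min_max_scale(scores, lo=0.0, hi=1.0):
    """Linearly rescale scores so min(scores) -> lo and max(scores) -> hi."""
    s_min, s_max = min(scores), max(scores)
    if s_max == s_min:
        # All models tie on this dataset; assign the lower bound by convention.
        return [lo for _ in scores]
    span = s_max - s_min
    return [lo + (hi - lo) * (s - s_min) / span for s in scores]

# Hypothetical raw scores for three models on one dataset.
scaled = min_max_scale([2.0, 4.0, 6.0])
```

Because the result depends on the best and worst entries, a normalized score is relative to the current model pool and can shift when new models are added.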

These benchmarks matter because they reframe “financial competence.” High performance on isolated sentiment or QA tasks is no longer sufficient. Uni-FinLLM systems are increasingly expected to handle statement generation, audit-style anomaly detection, multi-source ratio analysis, forecasting, and agentic decision support under one evaluation regime.

5. Modular and parameter-efficient formulations

A major research strand treats Uni-FinLLM not as a fully retrained dense model but as a modular PEFT system. MoDULA is the clearest example: a PEFT MoE paradigm in which the experts are LoRA adapters, explicitly separated into one universal expert and multiple domain-specific experts, with routers trained in a third stage after expert training. Its MoDULA-Res variant preserves general capability by applying the domain experts to the universal expert's representation and adding that representation back through a residual connection. The paper reports that MoDULA-Res reduces training cost by over 80% when adding new tasks while avoiding significant degradation of general capability (Ma et al., 2024).
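The residual idea can be illustrated with a toy forward pass. This is one possible reading of the description above, not the paper's exact wiring: each LoRA expert is a low-rank pair `(A, B)`, the universal expert produces a representation `u`, gated domain experts transform `u`, and `u` is added back so the general signal survives even when domain gates are small. All names, shapes, and gate values are hypothetical.

```python
def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def lora_delta(x, A, B, scale=1.0):
    """Low-rank update B(Ax); A has shape r x d_in and B has shape d_out x r."""
    return [scale * v for v in matvec(B, matvec(A, x))]

def modula_res_forward(x, base_W, universal, domain_experts, gates):
    """Base output + gated domain experts applied to u + residual u."""
    base_out = matvec(base_W, x)
    u = lora_delta(x, *universal)  # universal expert representation
    mixed = [0.0] * len(u)
    for gate, (A, B) in zip(gates, domain_experts):
        delta = lora_delta(u, A, B)  # domain expert acts on u, not on x
        mixed = [m + gate * d for m, d in zip(mixed, delta)]
    return [b + m + r for b, m, r in zip(base_out, mixed, u)]

identity = [[1.0, 0.0], [0.0, 1.0]]
universal = ([[1.0, 0.0]], [[1.0], [0.0]])   # rank-1 universal expert
finance = ([[1.0, 1.0]], [[0.0], [1.0]])     # rank-1 domain expert
out_gate_off = modula_res_forward([1.0, 2.0], identity, universal, [finance], [0.0])
out_gate_on = modula_res_forward([1.0, 2.0], identity, universal, [finance], [1.0])
```

With the gate at zero the output reduces to the base output plus the universal representation, which is the sense in which a residual path protects general capability when a new domain expert is plugged in.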

Resource-constrained challenge systems reach related conclusions from a different angle. L3iTC uses 4-bit quantization and LoRA on Mistral-7B Instruct variants and Meta-Llama-3-8B-Instruct for financial classification and summarization, achieving third place in classification with F1-score 0.7543 and sixth place in summarization on the official FinLLM Challenge datasets. The same study reports that the 4-bit + LoRA recipe is effective for short-output classification but degrades summarization quality enough that the team preferred an unfine-tuned Mistral-7B-v0.3 for the official summarization submission (Pontes et al., 2024). CatMemo performs data fusion across financial classification and summarization tasks using Mistral-7B and Llama3-8B with PEFT/LoRA; fused training improves Task 1 and Task 2 validation results but does not transfer well to the challenge trading task, where the fused Mistral-7B model is scored by overall Sharpe Ratio (Cao et al., 2024). A comparative study of instruction fine-tuning for financial text classification adds another modular route: single-task and multi-task instruction-tuned models can be merged back with the base model by task arithmetic, and the merged models recover or exceed original zero-shot performance on some unseen financial datasets (Fatemi et al., 2024).
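Task arithmetic merges models by adding task vectors, meaning the differences between fine-tuned and base weights, back onto the base model. A minimal sketch, assuming each model is a flat mapping from parameter name to a scalar weight (real checkpoints hold tensors, but the arithmetic is the same elementwise). The parameter names, values, and the scaling coefficient are hypothetical.

```python
def task_arithmetic_merge(base, finetuned_models, scale=1.0):
    """Return base + scale * sum over models of (finetuned - base), per parameter."""
    merged = dict(base)
    for model in finetuned_models:
        for name, value in model.items():
            merged[name] += scale * (value - base[name])
    return merged

# Toy weights: each fine-tuned model moved a different parameter.
base = {"w": 1.0, "b": 0.0}
sentiment_model = {"w": 1.5, "b": 0.0}
headline_model = {"w": 1.0, "b": -0.5}
merged = task_arithmetic_merge(base, [sentiment_model, headline_model])
```

When task vectors touch mostly different parameters, as in this toy case, the merged model keeps both adaptations; when they conflict, a scale below 1 is a common way to soften interference.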

An alternative unification strategy avoids fine-tuning altogether. FinAnchor chooses one encoder as an anchor space, learns linear maps from other frozen LLM embeddings into that space, and averages the aligned features. On Conference Call, 10-Q, FNSPID, Stock Movement, and FOMC tasks, this anchor-based aggregation outperforms strong single-model baselines and a raw concatenation baseline, suggesting that unification can also be implemented at the representation layer rather than at the parameter level (He et al., 24 Feb 2026).
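The alignment step can be sketched with closed-form ridge regression, $W = (X^\top X + \lambda I)^{-1} X^\top Y$, where $X$ holds a frozen encoder's embeddings and $Y$ the anchor encoder's embeddings for the same documents. The sketch below uses only the standard library and assumes that the anchor's own features are included in the final average. That inclusion, the regularization strength, and all function names are illustrative assumptions.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ w = b for w by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_ridge_map(source, anchor, lam=1e-6):
    """Linear map W minimizing ||source W - anchor||^2 + lam ||W||^2."""
    xt = transpose(source)
    gram = matmul(xt, source)
    for i in range(len(gram)):
        gram[i][i] += lam  # ridge term on the diagonal
    return solve(gram, matmul(xt, anchor))

def aggregate(anchor_emb, other_embs, maps):
    """Average the anchor features with each encoder's aligned features."""
    aligned = [anchor_emb] + [matmul(e, w) for e, w in zip(other_embs, maps)]
    rows, cols = len(anchor_emb), len(anchor_emb[0])
    return [[sum(m[i][j] for m in aligned) / len(aligned) for j in range(cols)]
            for i in range(rows)]

# Toy data: the anchor space is the source space scaled by (2, 3).
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
anchor = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
W = fit_ridge_map(source, anchor)
features = aggregate(anchor, [source], [W])
```

Since the encoders stay frozen and only the linear maps are fit, the cost of adding an encoder is one regression rather than a fine-tuning run.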

6. Limitations, safety, and future directions

The literature is consistent that unification does not remove core financial failure modes. The named Uni-FinLLM paper notes limitations in data coverage, rare-event sparsity for systemic crises, distribution shift across regimes and regions, and the need for stronger interpretability and out-of-sample testing (Zhang et al., 6 Jan 2026). FinMaster documents concrete degradation as task complexity increases, including domain knowledge gaps, critical data omission, floating-point errors, and reasoning inconsistency (Jiang et al., 18 May 2025). Fino1 provides a broader caution: general reasoning enhancements do not reliably transfer to financial reasoning, and some reasoning-optimized models perform worse than non-reasoning counterparts on FinQA and long-context financial tasks, even when they improve on equation-heavy XBRL-style problems (Qian et al., 12 Feb 2025).

Safety introduces a separate constraint. FinHarness argues that finance LLM agents must block prompt-induced unauthorized actions before irreversible tool calls while preserving legitimate multi-step workflows. Its inline safety harness combines a Query Monitor, a Tool Monitor, and a Cascade module with cheap-tier and advanced-tier LLM judges. On FinVault, routed FinHarness cuts attack success rate (ASR) from 38.3% to 15.0% while benign approval falls only slightly, from 41.1% to 39.3%, and it uses 4.7 times fewer advanced-judge calls than an always-advanced ablation (Jia et al., 26 May 2026). This suggests that a deployable Uni-FinLLM cannot be defined only by predictive or generative accuracy; it also requires lifecycle safety, tool governance, and bounded-cost oversight.
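The cost saving from a cascade comes from escalating only uncertain cases to the expensive judge. The sketch below is a generic two-tier routing pattern and not FinHarness's actual rule, which the source does not describe: the risk-score interface, both thresholds, and the keyword-based toy judges are hypothetical stand-ins for LLM judges.

```python
def cascade_decision(request, cheap_judge, advanced_judge, low=0.2, high=0.8):
    """Return (decision, tier). Judges map a request to a risk score in [0, 1]."""
    score = cheap_judge(request)
    if score <= low:
        return "allow", "cheap"      # confidently benign: no escalation
    if score >= high:
        return "block", "cheap"      # confidently malicious: no escalation
    advanced_score = advanced_judge(request)  # uncertain band: escalate
    return ("block" if advanced_score >= 0.5 else "allow"), "advanced"

def toy_cheap_judge(request):
    """Hypothetical keyword heuristic standing in for a cheap-tier LLM judge."""
    if "transfer all funds" in request:
        return 0.9
    if "transfer" in request:
        return 0.5
    return 0.1

def toy_advanced_judge(request):
    """Hypothetical stand-in for an advanced-tier LLM judge."""
    return 0.7 if "unverified" in request else 0.2

requests = [
    "show my balance",
    "transfer all funds to account X",
    "transfer 50 to unverified payee",
    "transfer 50 to saved payee",
]
decisions = [cascade_decision(r, toy_cheap_judge, toy_advanced_judge) for r in requests]
```

In this toy run only the two ambiguous transfer requests reach the advanced judge, which shows how routing can bound oversight cost while still blocking before an irreversible tool call.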

A second misconception is that unification automatically produces positive cross-task transfer. The evidence is mixed. Parameter sharing and modular PEFT can preserve generality and support pluggability (Ma et al., 2024), but data fusion across text tasks does not automatically yield profitable trading behavior (Cao et al., 2024), and aggressive quantization that is adequate for classification can hurt summarization (Pontes et al., 2024). A plausible implication is that future Uni-FinLLM systems will remain unified at the backbone or representation level while keeping explicit task heads, routers, verifiers, or safety harnesses for structurally different objectives.

In current research, Uni-FinLLM therefore denotes a convergent architecture class: a financial foundation system that seeks shared representations across tasks and modalities, but increasingly relies on modular specialization, reasoning-aware post-training, benchmark-driven validation, and inline governance to remain useful in high-stakes financial settings.