FinTSFMs: Financial Time-Series Models

Updated 4 July 2026

FinTSFMs are foundation models designed for sequential financial data, leveraging techniques like patch-based Transformers and discrete tokenization to capture temporal dynamics and non-stationarity.
They employ finance-native architectures, such as Kronos and FinCast, that utilize specialized tokenizers and loss functions to effectively model market uncertainty and regime shifts.
Empirical evidence shows that tailored pretraining, fine-tuning, and modular adaptation strategies in FinTSFMs improve tasks like price forecasting, volatility modeling, and risk management.

Financial Time-Series Foundation Models (FinTSFMs) are foundation models specialized for sequential financial data such as prices, returns, volatility, order flow, exchange rates, and related market signals, with the aim of transferring learned representations to downstream financial prediction and decision tasks. Within the broader taxonomy of Financial Foundation Models, they form the time-series branch alongside Financial Language Foundation Models and Financial Visual-Language Foundation Models, but differ in both modality and objective: FinTSFMs are optimized for temporal dependence, non-stationarity, and numerical dynamics rather than text understanding or visual reasoning (Chen et al., 7 Jul 2025). The area remains heterogeneous and comparatively immature, spanning patch-based autoregressive Transformers, discrete-event “market language” models, LLM adaptations via prompt reprogramming or fine-tuning, and tool-augmented agents; more recent work adds finance-native large-scale pretraining on K-line and multi-market corpora, indicating an ongoing transition from exploratory prototypes toward dedicated financial backbones (Chen et al., 7 Jul 2025, Shi et al., 2 Aug 2025, Zhu et al., 27 Aug 2025).

1. Taxonomic position and defining characteristics

The survey literature places FinTSFMs inside the broader category of Financial Foundation Models and defines them by their direct engagement with sequential market data rather than by reuse of generic language backbones alone. In the survey’s taxonomy, FinTSFMs differ from general-purpose foundation models such as GPT-4 or Gemini because they are explicitly optimized for the temporal structure of financial markets, and they differ from FinLFMs and FinVLFMs because their core problem is modeling temporal dependence, non-stationarity, and numerical dynamics in market sequences (Chen et al., 7 Jul 2025).

At the time of the survey, the field was still notably small: only seven representative FinTSFMs were identified, and there was no standardized recipe comparable to the more mature FinLFM “pre-training + supervised finetuning + alignment” workflow (Chen et al., 7 Jul 2025). This early-stage condition is significant. It implies that FinTSFMs should not be understood as a single stabilized architectural family, but rather as a research space in which multiple incompatible design choices are still being tested: continuous versus discrete representations, native time-series pretraining versus LLM adaptation, and parametric forecasting versus agentic reasoning-and-tool pipelines.

Subsequent papers broaden that landscape. Kronos is presented as a finance-only foundation model for candlestick sequences with OHLCVA features and a tokenizer tailored to market data (Shi et al., 2 Aug 2025). FinCast is presented as a 1B-parameter decoder-only financial forecasting model trained on large-scale financial datasets spanning multiple domains and temporal resolutions (Zhu et al., 27 Aug 2025). These systems suggest a movement from adapting generic TSFMs to constructing explicitly finance-native time-series foundations.

2. Architectural paradigms and representation strategies

The dominant architectural pattern in early FinTSFMs is Transformer-based autoregressive sequence modeling, usually with patching to make long continuous and multivariate financial sequences tractable (Chen et al., 7 Jul 2025). Patching converts long sequences into fixed-length segments before encoding them, thereby extending effective temporal context while controlling sequence length. This design is inherited by TimesFM-derived financial adaptations such as Fin-TimesFM and FinDA-TimesFM (Chen et al., 7 Jul 2025).
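
The patching step described above can be illustrated with a minimal sketch. The function name `make_patches`, the non-overlapping layout, and the choice to drop leading observations are assumptions made for illustration; individual models may use overlapping patches or padding instead.

```python
import numpy as np

def make_patches(series, patch_len):
    """Split a 1-D series into non-overlapping fixed-length patches.

    Leading observations that do not fill a whole patch are dropped,
    so the most recent data is always kept (an illustrative choice).
    """
    series = np.asarray(series, dtype=float)
    n_patches = len(series) // patch_len
    if n_patches == 0:
        raise ValueError("series is shorter than one patch")
    trimmed = series[len(series) - n_patches * patch_len:]
    return trimmed.reshape(n_patches, patch_len)

prices = np.arange(10.0)                  # toy series of 10 observations
patches = make_patches(prices, patch_len=4)
# patches has shape (2, 4): the encoder sees 2 patch tokens instead of 10 time steps
```

Each row then becomes one input token, which is how patching shortens the sequence the Transformer has to attend over.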

Paradigm	Core representation	Representative systems
Patch-based autoregressive forecasting	Fixed-length numeric patches	Fin-TimesFM, FinDA-TimesFM, FinCast
Event-token market modeling	Discrete microstructure or K-line tokens	MarketGPT, Kronos
LLM adaptation	Prompt reprogramming or direct LLM fine-tuning	Time-LLM, UniTime
Agentic non-training inference	External tools, knowledge bases, structured series	SocioDojo
Generative refinement	Post-hoc correction of TSFM outputs	RefineBridge

MarketGPT is the clearest departure from patch-based continuous modeling. Rather than segmenting numerical series, it tokenizes raw order-message streams into discrete market microstructure events and learns autoregressively over those events, which the survey characterizes as a distinctive “market language” design (Chen et al., 7 Jul 2025). This choice is structurally important because it moves the representation from continuous trajectory modeling toward discrete event sequencing, which is especially natural for order-flow simulation and microstructure analysis.

Kronos extends the discrete-token paradigm to K-line data. Each OHLCVA observation is treated as a 6D vector, and a Transformer-based autoencoder discretizes the sequence into tokens using Binary Spherical Quantization. The token is then factorized into coarse and fine subtokens, so the model learns a hierarchical coarse-to-fine representation rather than a flat vocabulary. A decoder-only Transformer performs autoregressive modeling over those discrete market tokens (Shi et al., 2 Aug 2025). The paper argues that this construction suppresses noise, bounds distortion, compresses state space, and maps common K-bar patterns and rare extreme events into a structured token vocabulary.
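
The quantization and subtoken split can be sketched as follows. This is not the Kronos implementation: the learned encoder is replaced by a hand-written latent vector, the helper names are hypothetical, and an even split of the bits into coarse and fine halves is assumed. It only shows the general idea of binary spherical quantization, where a latent vector is normalized onto the unit sphere and each coordinate is snapped to one of two values.

```python
import numpy as np

def bsq_quantize(z):
    """Binary spherical quantization of latent vectors z with shape (..., k).

    Each vector is projected onto the unit sphere, then every coordinate is
    snapped to +/- 1/sqrt(k), so the quantized code also has unit norm.
    Assumes no latent vector is exactly zero.
    """
    z = np.asarray(z, dtype=float)
    k = z.shape[-1]
    u = z / np.linalg.norm(z, axis=-1, keepdims=True)
    bits = (u > 0).astype(int)               # one bit per latent dimension
    code = (2 * bits - 1) / np.sqrt(k)       # quantized unit-norm vector
    return bits, code

def split_subtokens(bits):
    """Turn k bits (k even, assumed) into coarse and fine integer ids of k/2 bits each."""
    half = bits.shape[-1] // 2
    place_values = 2 ** np.arange(half - 1, -1, -1)   # most significant bit first
    coarse = bits[..., :half] @ place_values
    fine = bits[..., half:] @ place_values
    return coarse, fine

z = np.array([[0.9, -0.2, 0.1, -0.7]])       # stand-in for one encoded K-line latent
bits, code = bsq_quantize(z)
coarse, fine = split_subtokens(bits)
```

With k bits per token, a flat vocabulary would have 2^k entries, while the coarse and fine subtokens each range over only 2^(k/2) values, which is the state-space compression the paragraph refers to.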

FinCast represents a different finance-native architecture. It is a decoder-only Transformer with sparse Mixture-of-Experts layers, patch-based tokenization, instance normalization, and learnable frequency embeddings designed to handle resolution heterogeneity from seconds to months (Zhu et al., 27 Aug 2025). In this architecture, temporal resolution is modeled as a structural conditioning variable rather than a nuisance covariate, and sparse token-level expert routing is intended to let different experts specialize in different regimes or domains.
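
Sparse token-level expert routing can be sketched generically as below. The gating matrix `gate_w`, the value `k=2`, and the softmax over only the selected experts are illustrative assumptions; the text above does not specify FinCast's routing details.

```python
import numpy as np

def top_k_route(tokens, gate_w, k=2):
    """Pick k experts per token and return their ids and mixing weights."""
    logits = tokens @ gate_w                               # (n_tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -k:]          # ids of the k largest logits
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    shifted = top_logits - top_logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    weights = weights / weights.sum(axis=-1, keepdims=True)        # softmax over chosen experts
    return top_idx, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))     # 5 patch tokens with 8 features each (toy sizes)
gate_w = rng.normal(size=(8, 4))     # gating weights for 4 hypothetical experts
expert_ids, mix = top_k_route(tokens, gate_w, k=2)
```

Only the selected experts run for each token, so capacity can grow with the number of experts while per-token compute stays roughly constant.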

The LLM-adaptation branch is architecturally distinct from both Kronos-like token models and TimesFM-style patch models. Time-LLM reprograms time series into prompt-like inputs and feeds them into a frozen LLM, while UniTime fine-tunes GPT-2 directly for multivariate forecasting with domain instructions (Chen et al., 7 Jul 2025). These models do not start from a finance-native temporal tokenizer; instead, they import language-model inductive biases into the time-series setting. SocioDojo goes further outside standard parametric forecasting by using GPT-3.5/4 with external tools, knowledge bases, and structured time series in a non-training, agentic workflow (Chen et al., 7 Jul 2025).

3. Objectives, adaptation regimes, and transfer mechanisms

The survey groups FinTSFM methodologies into three broad classes: time-series pretraining, training-based LLM adaptation, and non-training methods (Chen et al., 7 Jul 2025). In the time-series-native setting, the standard objective is autoregressive learning over future time points, analogous to language modeling. In patch-based models, the sequence is broken into patches and learned as a forecasting foundation model; in MarketGPT, tokenization of order-message data enables discrete event autoregression (Chen et al., 7 Jul 2025).

Kronos exemplifies a more specialized finance-native objective. After K-line tokenization, it models token sequences via standard autoregressive factorization,

$p(\mathbf{b}) = \prod_{t=1}^{T} p(b_t \mid \mathbf{b}_{<t}),$

with an additional hierarchical decomposition into coarse and fine subtokens. The paper further notes that the fine subtoken is trained using the model’s own sampled coarse prediction rather than teacher forcing, to reduce exposure bias and better match inference-time behavior (Shi et al., 2 Aug 2025). This is a concrete example of how finance-specific representation design changes the training problem itself rather than merely changing the dataset.
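
One way to write the hierarchical decomposition, assuming each token $b_t$ splits into a coarse subtoken $b_t^{c}$ and a fine subtoken $b_t^{f}$ (notation introduced here for illustration, not taken from the paper), is

$p(b_t \mid \mathbf{b}_{<t}) = p(b_t^{c} \mid \mathbf{b}_{<t}) \, p(b_t^{f} \mid \mathbf{b}_{<t}, b_t^{c}),$

so the coarse subtoken is predicted first and the fine subtoken is predicted conditional on it. Substituting this into the product over $t$ above gives the full sequence likelihood.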

Finance-specific loss design also appears in FinCast. Its Point-Quantile Loss combines a Huber point loss, trend consistency loss, quantile loss, and MoE regularization, with the stated goal of resisting forecast collapse and improving robustness under non-stationarity, heavy tails, and regime changes (Zhu et al., 27 Aug 2025). This differs materially from pure MSE forecasting objectives and reflects the fact that financial forecasting often requires distributional and directional fidelity rather than mean-regression alone.
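
A composite loss of this kind can be sketched as below. The weights, the specific trend term (a Huber penalty on mismatched first differences), and all function names are assumptions made for illustration. The MoE regularization term is omitted, and the exact FinCast formulation may differ.

```python
import numpy as np

def huber(err, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(err)
    return np.where(a <= delta, 0.5 * err**2, delta * (a - 0.5 * delta))

def pinball(y, q_pred, q):
    """Quantile (pinball) loss for quantile level q."""
    diff = y - q_pred
    return np.maximum(q * diff, (q - 1) * diff)

def point_quantile_loss(y, point, quantile_preds, quantiles,
                        w_trend=0.1, w_quantile=1.0):
    """Huber point loss + trend-consistency term + averaged quantile loss."""
    point_term = huber(y - point).mean()
    # Trend consistency (assumed form): predicted step changes should match realized ones.
    trend_term = huber(np.diff(y) - np.diff(point)).mean()
    quantile_term = np.mean([pinball(y, qp, q).mean()
                             for qp, q in zip(quantile_preds, quantiles)])
    return point_term + w_trend * trend_term + w_quantile * quantile_term

y = np.array([1.0, 2.0, 3.0, 4.0])
flat = np.full(4, 2.5)                       # a collapsed, flat forecast
loss_perfect = point_quantile_loss(y, y, [y, y], [0.1, 0.9])
loss_flat = point_quantile_loss(y, flat, [flat - 1, flat + 1], [0.1, 0.9])
```

The flat forecast is penalized by the point term and again by the trend term, which illustrates why such a loss discourages forecast collapse more strongly than a point loss alone.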

A recurring empirical theme across the literature is that zero-shot transfer from generic pretraining is often insufficient in finance. Financial continual pre-training of TimesFM on more than 100K financial time series and 90M time points materially improved price prediction relative to the original TimesFM checkpoint, which the paper found unsatisfactory on irregular financial prices (Fu et al., 2024). In VaR forecasting, zero-shot use was explicitly described as not optimal, whereas fine-tuned TimesFM became one of the strongest models in the comparison (Goel et al., 2024). In realized volatility forecasting, TimesFM in pretrained form provided a reasonable baseline, but incremental fine-tuning was reported as essential for learning volatility patterns effectively and statistically outperforming traditional models under Diebold-Mariano and Giacomini-White tests (Goel et al., 16 May 2025).

The adaptation regime itself is also diversifying. Some work emphasizes prompt-style or few-shot reuse of pretrained backbones; in a financial-aid benchmark with yearly data from 2004–2020, GPT-2-based TimeLLM, CALF, and GPT4TS were competitive in few-shot settings but substantially weaker in true zero-shot mode, indicating that sparse-domain adaptation remains important for idiosyncratic financial series (Islam et al., 2024). Other work explores synthetic pretraining rather than domain fine-tuning: a decoder-type Transformer pretrained on Lorenz-system-based synthetic chaotic series with 10 billion training samples per predictive horizon was transferred zero-shot to BTCUSDT trade data, where the paper reports a scaling-law-like relation between horizon difficulty and required data volume (Takemoto, 5 Sep 2025). This suggests that, within FinTSFM research, “pretraining” now spans financial corpora, synthetic dynamical systems, and LLM-derived prompt interfaces rather than a single canonical procedure.

4. Data ecosystems, corpora, and benchmark construction

The data ecosystem for FinTSFMs ranges from classical financial benchmarks to very large market corpora. The survey highlights widely used baseline datasets such as Google Stock Prices, S&P 500 historical prices, the Exchange Rate dataset, and Bitcoin prices, which provide univariate or multivariate sequences for standard forecasting evaluation (Chen et al., 7 Jul 2025). These datasets are important historically, but the survey also emphasizes their limitations: many are short, narrow, or single-market, and long-context or richly multimodal time-series resources remain scarce.

Two datasets are especially notable in the survey’s account. FNSPID combines 29.7 million price records with 15.7 million news headlines across more than 4,000 listed companies, enabling market analysis and price forecasting with explicit news-price coupling (Chen et al., 7 Jul 2025). FinTSB, introduced as a benchmark contribution, comprises 20 datasets, each containing 300 stocks over 250 days and organized into four regime patterns—uptrend, downtrend, volatility, and black swan. It is designed to provide unified metrics, realistic constraints such as transaction fees, and explicit sequence characteristics including non-stationarity and forecastability (Chen et al., 7 Jul 2025). FinTSB is therefore not merely a dataset collection; it encodes the regime-shift problem directly into benchmark design.

More recent finance-native models scale far beyond these classical resources. Kronos is pretrained on 12.11 billion K-line observations drawn from 45 global exchanges or markets and spanning seven temporal granularities across equities, crypto, futures, forex, and indices (Shi et al., 2 Aug 2025). FinCast uses a corpus of 20+ billion time points across 2.4 million time series, including crypto, forex, futures, stocks, economic indicators, and auxiliary non-financial data, with frequencies ranging from seconds to months (Zhu et al., 27 Aug 2025). These corpora reflect a broader shift toward pretraining on financial heterogeneity itself rather than treating finance as a minor slice of a generic TSFM mixture.

Large-scale return panels are also entering evaluation practice. A comprehensive empirical study of TSFMs in finance uses daily excess returns from 94 countries during 1990–2023, amounting to around 2 billion observations in the largest combined setting, and evaluates zero-shot inference, fine-tuning, and pretraining from scratch under expanding-window out-of-sample design (Rahimikia et al., 23 Nov 2025). At the opposite extreme, the financial-aid study shows that FinTSFM-relevant problems can also arise in very small, low-frequency, policy-sensitive datasets with only 17 yearly points and 56 channels (Islam et al., 2024). Taken together, these cases show that “financial time series” in the FinTSFM literature encompasses ultra-large multi-market corpora, low-frequency institutional allocations, multivariate macro-financial panels, and event-level microstructure streams.
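
An expanding-window out-of-sample design can be sketched as below. The function name and parameters are hypothetical, and the study's actual window lengths are not given here. Each split trains on all observations up to a cutoff and tests on the block that follows, so no future data leaks into training.

```python
def expanding_window_splits(n_obs, initial_train, test_size):
    """Return (train_indices, test_indices) pairs with a growing training window."""
    splits = []
    train_end = initial_train
    while train_end + test_size <= n_obs:
        train_idx = range(0, train_end)                    # everything seen so far
        test_idx = range(train_end, train_end + test_size)  # the next unseen block
        splits.append((train_idx, test_idx))
        train_end += test_size
    return splits

splits = expanding_window_splits(n_obs=10, initial_train=4, test_size=2)
# Training windows end at 4, 6, and 8; each test block is the 2 observations after that.
```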

5. Applications and empirical evidence

The most explicit application cluster for FinTSFMs is forecasting-oriented: stock price prediction, volatility modeling, risk recognition, Value-at-Risk prediction, realized volatility forecasting, and order-generation or simulation (Chen et al., 7 Jul 2025). MarketGPT is positioned as an order-generation engine for discrete-event simulation of order flow, while TimesFM-derived financial models are evaluated for left-tail VaR and realized volatility forecasting (Chen et al., 7 Jul 2025). The same survey also frames FinTSFMs more broadly as foundational components for trading, risk control, and decision workflows.

Empirically, finance-native pretraining has produced some of the strongest reported results. Kronos reports zero-shot gains across five representative tasks—price series forecasting, return forecasting, realized volatility forecasting, synthetic K-line generation, and investment simulation or backtesting. On its benchmarks, the paper reports a 93% RankIC improvement over the leading TSFM and an 87% RankIC improvement over the best non-pretrained baseline in price forecasting, a 9% lower MAE in volatility forecasting, and a 22% improvement in generative fidelity for synthetic K-line sequences (Shi et al., 2 Aug 2025). The significance of these results is not only numerical; they demonstrate that a finance-only discrete-token model can target forecasting, risk estimation, and generative market simulation within one backbone.
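
For readers unfamiliar with the metric, RankIC is commonly computed as the Spearman rank correlation between predicted and realized values across assets. The sketch below assumes no ties and a single cross-section; the paper's exact evaluation protocol is not described here.

```python
import numpy as np

def rank_ic(pred, realized):
    """Spearman rank correlation between predictions and outcomes (ties not handled)."""
    pred_rank = np.argsort(np.argsort(pred))
    real_rank = np.argsort(np.argsort(realized))
    return np.corrcoef(pred_rank, real_rank)[0, 1]

pred = np.array([0.1, 0.3, 0.2, 0.5])        # toy model scores for four assets
realized = np.array([1.0, 2.5, 2.0, 4.0])    # toy realized returns, same ordering
ic_same = rank_ic(pred, realized)
ic_reversed = rank_ic(pred, -realized)
```

Because only the ordering matters, RankIC rewards getting the relative ranking of assets right even when the predicted magnitudes are off.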

FinCast likewise reports strong zero-shot transfer. On 3,632 held-out series with more than 4.38M scalar points across crypto, forex, stocks, and futures at minute-to-week resolutions, it reports overall zero-shot performance of MSE $= 0.1644$ and MAE $= 0.2397$, with about 20% lower MSE on average versus prior zero-shot state of the art (Zhu et al., 27 Aug 2025). On supervised PCIE benchmark datasets, its fine-tuned variant is reported to achieve about 26% MSE reduction and 19% MAE reduction relative to baselines (Zhu et al., 27 Aug 2025). The paper’s interpretation is that a finance-specific mixture-of-experts architecture with uncertainty-aware objectives yields more realistic, non-flat forecasts under distribution shift.

Evidence from adapted generic TSFMs is more mixed but still important. In multivariate financial forecasting, pretrained Tiny Time Mixers were reported to achieve 25–50% better performance when fine-tuned on limited data and 15–30% improvements even with longer datasets, relative to the same architecture trained from scratch; the pretrained model also required 3–10 fewer years of data to reach comparable performance levels (Marconi, 9 Jul 2025). However, the same study found that traditional specialized models matched or exceeded TSFM performance in two of three tasks, underscoring that sample efficiency and absolute task optimality are not the same thing (Marconi, 9 Jul 2025).

Risk forecasting provides a second major application axis. In VaR forecasting on the S&P 100 index and constituents, fine-tuned TimesFM consistently outperformed traditional methods in actual-over-expected ratios and performed comparably to GAS on quantile score, ranking as the best or among the top performers across the 0.01, 0.025, 0.05, and 0.1 quantiles (Goel et al., 2024). In realized volatility forecasting across 21 global equity indices, pretrained TimesFM was competitive, but incrementally fine-tuned variants were stronger and statistically superior to classical econometric benchmarks under formal forecast-comparison tests (Goel et al., 16 May 2025). These studies indicate that domain adaptation can make generic TSFMs viable for financial risk management, even when pure zero-shot use remains weak.
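
The two VaR evaluation measures mentioned above can be sketched with their standard definitions. The function names and toy data are illustrative, and VaR forecasts are assumed to be expressed as return quantiles (negative numbers in the left tail).

```python
import numpy as np

def actual_over_expected(returns, var_forecasts, q):
    """Ratio of observed VaR breaches to the number expected at quantile level q."""
    returns = np.asarray(returns, dtype=float)
    var_forecasts = np.asarray(var_forecasts, dtype=float)
    breaches = np.sum(returns < var_forecasts)
    return breaches / (q * len(returns))

def quantile_score(returns, var_forecasts, q):
    """Average pinball loss of the VaR forecasts at quantile level q (lower is better)."""
    diff = np.asarray(returns, dtype=float) - np.asarray(var_forecasts, dtype=float)
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

returns = np.arange(100) - 50.0              # toy returns: -50, -49, ..., 49
var_5pct = np.full(100, -45.5)               # constant toy VaR forecast
ratio = actual_over_expected(returns, var_5pct, q=0.05)
score = quantile_score(returns, var_5pct, q=0.05)
```

A ratio near 1 means breaches occur about as often as the quantile level implies; values above 1 indicate underestimated risk and values below 1 indicate overly conservative forecasts.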

Operational studies reinforce the same pattern. On the Exchange Rates benchmark, TimesFM 2.5 achieved MASE $= 2.167$, sMAPE $= 1.82\%$, and RMSE $= 0.011$, outperforming Chronos, PatchTST, and DLinear on MASE, while XGBoost still retained the best RMSE $= 0.007$ and sMAPE $= 0.81\%$ (Soni et al., 23 May 2026). The paper therefore characterizes stochastic financial markets as a regime in which the FM–specialist boundary is becoming contested rather than decisively won by either side (Soni et al., 23 May 2026).
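
For reference, MASE and sMAPE can be computed as below using common textbook definitions. The benchmark's exact variants (for example, the seasonal lag `m` or the sMAPE denominator) are not stated here, so these should be read as assumptions.

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=1):
    """Forecast MAE scaled by the in-sample MAE of a lag-m naive forecast."""
    y_true, y_pred, y_train = (np.asarray(a, dtype=float)
                               for a in (y_true, y_pred, y_train))
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / naive_mae

def smape(y_true, y_pred):
    """Symmetric MAPE in percent; undefined when a true and predicted value are both zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(y_true - y_pred)
                           / (np.abs(y_true) + np.abs(y_pred)))

train = [1.0, 2.0, 3.0, 4.0, 5.0]            # toy history; naive one-step MAE is 1
mase_value = mase([6.0, 7.0], [5.0, 5.0], train)
smape_value = smape([6.0, 7.0], [6.0, 7.0])
```

A MASE above 1 means the forecast is, on average, worse than the naive lag-m forecast was in sample. Because the three metrics weight errors differently, one model can lead on MASE while another leads on RMSE and sMAPE.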

6. Methodological limits, operational constraints, and disputed claims

The central difficulty for FinTSFMs is non-stationarity. Financial time series shift across uptrend, downtrend, volatility, and black-swan regimes; the survey treats this as a first-order modeling problem and explicitly encodes it in FinTSB (Chen et al., 7 Jul 2025). Closely related issues include limited labeled data, narrow benchmark diversity, temporal leakage, robustness failures when text or external knowledge are introduced, interpretability requirements in high-stakes settings, large computational cost, privacy and confidentiality constraints, and regulatory compliance burdens (Chen et al., 7 Jul 2025). In other words, the main obstacles are not only predictive but also curatorial, infrastructural, and institutional.

A recurring controversy concerns the transferability of generic TSFMs. Several papers report that off-the-shelf pretrained models perform poorly on finance. Direct application of TimesFM to price data was described as unsatisfactory because financial prices are irregular, non-stationary, and prone to extreme moves unlike the regular series in its original training mix (Fu et al., 2024). A large-scale study of daily excess returns across global markets found that off-the-shelf TSFMs performed poorly in zero-shot and fine-tuning settings, whereas models pretrained from scratch on financial data achieved substantial forecasting and economic improvements; increasing dataset size, adding synthetic augmentation, and tuning hyperparameters further improved results (Rahimikia et al., 23 Nov 2025). This line of evidence argues against the assumption that generic TSFM pretraining automatically transfers to finance.

At the same time, newer results show that the gap with specialists is narrowing under some conditions. The Exchange Rates study finds that TimesFM 2.5 dramatically outperforms TimesFM 2.0 and beats several supervised specialists on MASE, even though XGBoost remains stronger on RMSE and sMAPE (Soni et al., 23 May 2026). A plausible implication is that architectural refinements, longer context, and better normalization can make foundation models increasingly viable in stochastic financial markets without making them universally superior.

Operational constraints remain severe. On Exchange Rates, reported P95 latency and throughput were 283 ms and 3.5/s for TimesFM 2.5, versus 0.18 ms and 9,300/s for XGBoost, which the study describes as an inference tax for foundation models (Soni et al., 23 May 2026). The same paper proposes a Complexity Router that routes each series to either a foundation model or specialist using empirical features; at the identified Pareto knee, $\alpha = 0.30$, the routing mix achieved MASE 0.970 at cost 301×, compared with pure FM deployment at MASE 0.989 and cost 1000× (Soni et al., 23 May 2026). This argues that heterogeneous routing may be preferable to universal FM deployment in production finance.
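
The reported cost figure is consistent with a simple blended-cost calculation, under two assumptions that the text above does not confirm: that $\alpha$ is the share of series routed to the foundation model, and that costs are expressed relative to a specialist cost of 1×.

```python
def blended_cost(fm_share, fm_cost=1000.0, specialist_cost=1.0):
    """Average relative cost when a share of series goes to the FM and the rest to a specialist."""
    return fm_share * fm_cost + (1.0 - fm_share) * specialist_cost

cost_at_knee = blended_cost(0.30)    # 0.30 * 1000 + 0.70 * 1 = 300.7, about 301x
cost_pure_fm = blended_cost(1.0)     # 1000x, the pure-FM deployment
```

Under this reading, routing 30% of series to the foundation model cuts cost by roughly 70% while the reported MASE slightly improves over pure FM deployment.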

Economic evaluation complicates the picture further. In the global daily excess-return study, transaction costs heavily eroded profitability for all models: average Sharpe for the benchmark fell from about 6.44 at 0 bps to about $-3.62$ at 20 bps and deteriorated further at 40 bps, although larger TSFMs pretrained on JKP-augmented or synthetic-augmented data were comparatively more resilient (Rahimikia et al., 23 Nov 2025). FinTSFM claims based solely on frictionless forecast metrics therefore require caution when interpreted for trading deployment.
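
The mechanism by which costs erode Sharpe ratios can be sketched as below. The proportional cost model (cost = turnover × basis points), the constant turnover, and the simulated returns are all illustrative assumptions, not the study's setup.

```python
import numpy as np

def net_sharpe(gross_returns, turnover, cost_bps, periods_per_year=252):
    """Annualized Sharpe ratio after subtracting proportional transaction costs."""
    cost = np.asarray(turnover, dtype=float) * cost_bps / 1e4
    net = np.asarray(gross_returns, dtype=float) - cost
    return np.sqrt(periods_per_year) * net.mean() / net.std(ddof=1)

rng = np.random.default_rng(0)
gross = rng.normal(0.001, 0.01, size=500)    # simulated daily strategy returns
turnover = np.full(500, 0.5)                 # assume half the book trades each day
sharpe_by_cost = {bps: net_sharpe(gross, turnover, bps) for bps in (0, 20, 40)}
```

With constant turnover, costs lower the mean return without changing its volatility, so the Sharpe ratio falls steadily as the cost level rises. High-turnover strategies driven by daily forecasts are especially exposed to this effect.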

7. Research directions and likely trajectories

The survey’s forward-looking agenda begins with standardization. It calls for larger, more diverse, and more realistic financial time-series benchmarks, especially benchmarks that support long-context modeling, multimodal reasoning, and regime-aware evaluation, as well as a standardized FinTSFM pipeline analogous to the increasingly mature PT–SFT–alignment workflow used in FinLFMs (Chen et al., 7 Jul 2025). It also emphasizes temporally aware curation and evaluation, stronger defenses against leakage and hallucination, synthetic data generation to alleviate scarcity provided the synthetic series are high quality, and federated learning as a privacy-preserving direction under regulatory constraints (Chen et al., 7 Jul 2025).

Two complementary design trajectories are already visible. One is stronger finance-native pretraining. Kronos and FinCast both embody the view that financial forecasting requires specialized tokenization, finance-only or finance-dominant corpora, and objectives that explicitly reflect financial uncertainty, volatility, and cross-market heterogeneity (Shi et al., 2 Aug 2025, Zhu et al., 27 Aug 2025). The other is modular enhancement of a foundation backbone rather than monolithic retraining. RefineBridge, for example, treats TSFM predictions as a generative prior and learns a Schrödinger Bridge refinement module; on three daily financial assets it is reported to achieve the best performance in 81 out of 90 experimental configurations, with MSE reductions ranging from 11% to 71% (Bolton et al., 25 Dec 2025). This suggests that post-hoc stochastic correction may be an alternative to low-rank or direct fine-tuning in heavy-tailed, noisy financial settings.

A related direction concerns deployment-time adaptation under drift. AdapTS does not update foundation-model weights; instead, it combines a lightweight online forecaster with an online weighter and, in the reported experiments, improves every foundation model on every dataset and horizon tested. In a comparison against online fine-tuning of TTM, TTM + AdapTS achieved MASE 1.673 with 0.38 seconds per update, whereas TTM-Finetune obtained MASE 1.746 with 911.52 seconds per update (Lee et al., 18 Feb 2025). Although those experiments are not finance-specific, they suggest a plausible architecture for drift-sensitive financial deployment: frozen foundation backbone, lightweight online correction, and adaptive routing between generalist and specialist components.
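
The frozen-backbone-plus-online-correction pattern can be sketched with a generic exponentially weighted combiner. This is a Hedge-style stand-in chosen for illustration, not the AdapTS weighter; the class name, learning rate `eta`, and toy forecasts are all assumptions.

```python
import numpy as np

class OnlineWeighter:
    """Combine several forecasts with weights that decay exponentially in squared error."""

    def __init__(self, n_models=2, eta=1.0):
        self.log_w = np.zeros(n_models)      # start from equal weights
        self.eta = eta

    def weights(self):
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def combine(self, forecasts):
        return float(self.weights() @ np.asarray(forecasts, dtype=float))

    def update(self, forecasts, actual):
        losses = (np.asarray(forecasts, dtype=float) - actual) ** 2
        self.log_w -= self.eta * losses      # penalize each model by its latest loss

weighter = OnlineWeighter(n_models=2, eta=5.0)
for actual in [1.0, 1.1, 0.9, 1.0]:
    # Toy forecasts: [frozen foundation model (biased), lightweight online model (close)]
    forecasts = [actual + 0.5, actual + 0.05]
    blended = weighter.combine(forecasts)    # forecast issued before the outcome is seen
    weighter.update(forecasts, actual)       # weights adapt once the outcome arrives
final_weights = weighter.weights()
```

No gradient step touches the foundation model, so each update costs only a few arithmetic operations, which matches the large per-update time gap reported above between online correction and online fine-tuning.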

Finally, prompt-based adaptation remains a relevant avenue. In-context fine-tuning for TimesFM trains a model to use multiple related time-series examples in the context window at inference time and is reported to rival explicit fine-tuning on general benchmarks (Das et al., 2024). In finance, this suggests the possibility of context-driven adaptation using related assets, sectors, maturities, or venues rather than gradient updates. More broadly, current evidence indicates that the future of FinTSFMs is unlikely to be a single universal architecture. It is more plausibly a layered ecosystem of finance-native pretraining, benchmark and curation reform, selective adaptation, modular refinement, and deployment-aware routing built around the distinct statistical and institutional constraints of financial markets.