
SeqFusion: Zero-Shot Time-Series Forecasting

Updated 7 March 2026
  • SeqFusion is a zero-shot time-series forecasting framework that fuses predictions from a curated set of specialized pre-trained models without needing task-specific training.
  • The architecture uses a shared embedding space to match target series with PTMs, performing sequential prediction with similarity-weighted aggregation.
  • Empirical results demonstrate competitive MSE scores across benchmarks while ensuring privacy, resource efficiency, and modular adaptability.

SeqFusion is a framework for zero-shot time-series forecasting that bypasses the need for task-specific training data by sequentially fusing predictions from a curated zoo of specialized pre-trained models (PTMs). Unlike conventional methods, which aggregate vast and diverse datasets for generalized pre-training—raising privacy and logistical concerns—SeqFusion shifts the focus to acquiring a collection of compact, specialized PTMs and adaptively combining them based on the temporal characteristics of each target time series (Huang et al., 4 Mar 2025).

1. Zero-Shot Time-Series Forecasting: Problem Formulation and Motivation

Zero-shot time-series forecasting seeks to predict future values for a target time series without using any additional in-task training data. Formally, given a multivariate input $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_C] \in \mathbb{R}^{T \times C}$ with $T$ historical observations of $C$ variates, the objective is to forecast $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_C] \in \mathbb{R}^{H \times C}$ over horizon $H$ using only pre-existing models or representations.

Traditional forecasting methods—statistical or deep learning—require substantial in-task data for effective generalization. Recent zero-shot approaches mitigate this via generalized PTMs, but performance is contingent on the diversity of pre-training data and often incurs privacy and storage trade-offs. SeqFusion introduces a paradigm shift by collecting a zoo of lightweight PTMs, each usually trained on a distinct dataset or domain, and assembling predictions by strategically fusing PTMs most relevant to the target temporal dynamics.

2. Architectural Components and Inference Workflow

SeqFusion's pipeline consists of four distinct stages:

  1. Model Zoo Construction: Accumulate $M$ one-variate PTMs, $\mathcal{M} = \{\phi_m: \mathbb{R}^{T \times 1} \to \mathbb{R}^{h \times 1}\}_{m=1}^M$, each pre-trained on a different dataset $\mathbf{X}_{(m)}$.
  2. Representation Extraction and Matching: Both the target time series variates and PTMs are mapped into a shared embedding space using a general extractor $\psi$. The similarity between embeddings (usually cosine similarity) determines the affinity between a target variate and each PTM.
  3. Sequential Prediction: For every target variate $c$, select the top-$k$ PTMs with embeddings closest to the variate's embedding. Each selected PTM performs recursive multi-block prediction, chunking the prediction horizon $H$ into $\lceil H/h \rceil$ steps.
  4. Fusion and Post-Processing: The $k$ outputs from the selected PTMs are fused using a similarity-weighted average for each forecasted time point. Finally, normalization procedures ensure data is returned in the original scale.

| Stage | Description | Key Technical Detail |
|---|---|---|
| Zoo | Collect diverse, one-variate PTMs | Each PTM trained on a unique dataset |
| Matching | Embed series and PTMs in a shared space | Use general extractor $\psi$ with transferability loss |
| Prediction | Select top-$k$ PTMs per variate, predict sequentially | Recursive horizon division per PTM |
| Fusion | Weighted ensemble of top-$k$ PTM outputs | Similarity/temperature-based weighting, normalization |
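
The post-processing step returns forecasts in the original scale. The source does not specify which normalization is used, so the following is only a minimal sketch assuming per-window z-score normalization; the function names `normalize` and `denormalize` are hypothetical.

```python
import math

def normalize(series, eps=1e-8):
    """Z-score a look-back window; return the scaled values and the statistics.

    Assumption: per-window mean/std scaling. The actual SeqFusion
    normalization may differ.
    """
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    std = math.sqrt(var) + eps  # eps avoids division by zero on flat series
    return [(v - mean) / std for v in series], mean, std

def denormalize(forecast, mean, std):
    """Map values from the normalized space back to the original scale."""
    return [v * std + mean for v in forecast]

window = [10.0, 12.0, 14.0, 16.0]
scaled, mu, sigma = normalize(window)
restored = denormalize(scaled, mu, sigma)
```

In this sketch a PTM would see `scaled` as input, and its forecast would be passed through `denormalize` with the statistics of the look-back window.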

3. Embedding Learning and PTM Transferability

The general extractor $\psi$ is an encoder-decoder architecture trained on a dataset distinct from the PTM pre-training sources. Its objective combines:

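  • Reconstruction loss: Penalizes the decoder's error in reconstructing each input series.
  • Series-wise similarity loss: Pulls masked and unmasked augmentations of each series together in embedding space.
  • Transferability loss: Built on a pairwise score measuring how well one PTM $\phi_m$ generalizes to the data of another PTM $\phi_{m'}$, which ties the embedding space to how well models transfer.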

Embeddings are computed as follows:

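  • For each target variate: $\mathbf{e}_c = \psi(\mathbf{x}_c)$
  • For each PTM $\phi_m$: $\mathbf{e}_{\phi_m} = \psi(\tilde{\mathbf{X}}_{(m)})$, with $\tilde{\mathbf{X}}_{(m)}$ a small batch from the PTM's original dataset.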

Similarity is quantified via cosine similarity, though other distance metrics are also considered, enabling the selection of PTMs whose representations align most closely with those of the target variates.
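
The matching step can be illustrated with a small sketch that scores one target embedding against every PTM embedding by cosine similarity. The embedding values and the names `ptm_a`, `ptm_b`, `ptm_c` are made up for illustration; in practice the vectors would come from the extractor.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2-dimensional embeddings, chosen only to make the result easy to read.
target_embedding = [1.0, 0.0]
ptm_embeddings = {
    "ptm_a": [2.0, 0.0],  # same direction as the target
    "ptm_b": [0.0, 3.0],  # orthogonal to the target
    "ptm_c": [1.0, 1.0],  # 45 degrees from the target
}

similarities = {
    name: cosine_similarity(target_embedding, embedding)
    for name, embedding in ptm_embeddings.items()
}
```

Because cosine similarity ignores vector length, `ptm_a` scores 1.0 even though its embedding has twice the norm of the target's.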

4. Sequential Prediction and Multi-PTM Aggregation

For each variate $c$, the top-$k$ PTMs are selected by maximizing similarity:

$$\mathcal{S}_c = \underset{m \in \{1, \dots, M\}}{\operatorname{top-}k} \; \cos\big(\mathbf{e}_c, \mathbf{e}_{\phi_m}\big)$$

where $\mathbf{e}_c$ and $\mathbf{e}_{\phi_m}$ denote the embeddings of the target variate and of PTM $\phi_m$, respectively.
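
Selecting the most similar PTMs amounts to sorting the similarity scores and keeping the leading entries. A minimal sketch, with hypothetical PTM names and scores, and with `k` standing for the number of PTMs to keep:

```python
def select_top_k(similarities, k):
    """Return the k (name, score) pairs with the highest similarity, best first."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hypothetical similarity scores between one target variate and four PTMs.
sims = {"ptm_a": 0.91, "ptm_b": 0.12, "ptm_c": 0.68, "ptm_d": 0.75}
selected = select_top_k(sims, k=2)
```

Keeping the scores alongside the names is convenient because the same scores are reused as fusion weights later in the pipeline.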

Recursive forecasting proceeds as follows: at each step, each selected PTM produces a block of predictions; the most recent $T$ points (observed history plus earlier predictions) are re-assembled into a new input window; and the next block is forecasted until the full horizon is covered.
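
The recursive loop can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `toy_ptm` is a hypothetical stand-in for a pre-trained model that emits a fixed-size block, and `lookback` plays the role of the PTM input length.

```python
def recursive_forecast(ptm, history, horizon, lookback):
    """Roll a block-wise forecaster forward until `horizon` points are produced.

    `ptm` maps a list of `lookback` values to a non-empty block of predictions.
    """
    window = list(history[-lookback:])
    forecast = []
    while len(forecast) < horizon:
        block = list(ptm(window))
        forecast.extend(block)
        # Slide the window: keep only the most recent `lookback` points,
        # mixing observed values with earlier predictions.
        window = (window + block)[-lookback:]
    return forecast[:horizon]  # the last block may overshoot the horizon

def toy_ptm(window):
    """Hypothetical model: extend the last observed step for two points."""
    step = window[-1] - window[-2]
    return [window[-1] + step, window[-1] + 2 * step]

prediction = recursive_forecast(toy_ptm, [1.0, 2.0, 3.0, 4.0], horizon=5, lookback=3)
```

With a block size of 2 and a horizon of 5, the model is called three times and the sixth predicted point is discarded.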

Fusion of the $k$ forecasted series is accomplished via a similarity-weighted average:

$$\hat{\mathbf{y}}_c = \sum_{j=1}^{k} w_{c,j} \, \hat{\mathbf{y}}_c^{(j)}$$

where

$$w_{c,j} = \frac{\exp(s_{c,j} / \tau)}{\sum_{j'=1}^{k} \exp(s_{c,j'} / \tau)}$$

Here $\hat{\mathbf{y}}_c^{(j)}$ is the forecast of the $j$-th selected PTM for variate $c$, $s_{c,j}$ is the similarity between that PTM's embedding and the variate's embedding, and $\tau$ is a temperature hyperparameter controlling weight sharpness.
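
A similarity-weighted average with a temperature can be sketched as a softmax over the similarity scores. The exact weighting form is an assumption here (a standard temperature softmax), and the function names and numbers are hypothetical.

```python
import math

def fusion_weights(similarities, tau):
    """Softmax over similarity scores; a smaller tau gives sharper weights."""
    peak = max(similarities)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp((s - peak) / tau) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(forecasts, similarities, tau=1.0):
    """Weighted average of equally long forecasts, one per selected PTM."""
    weights = fusion_weights(similarities, tau)
    horizon = len(forecasts[0])
    return [
        sum(w * forecast[t] for w, forecast in zip(weights, forecasts))
        for t in range(horizon)
    ]

forecasts = [[1.0, 2.0], [3.0, 4.0]]
equal = fuse(forecasts, [0.5, 0.5])            # identical scores: plain mean
sharp = fuse(forecasts, [0.9, 0.1], tau=0.01)  # low temperature: best PTM dominates
```

Equal similarities reduce the fusion to a simple mean, while a very small temperature makes the output nearly identical to the single best-matching PTM's forecast.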

5. Empirical Evaluation and Performance Analysis

SeqFusion is evaluated on a suite of benchmark datasets in both multivariate and univariate settings. For multivariate scenarios, datasets include ETTh1/2, Exchange-Rate, Electricity (ECL), Traffic, Weather, and ILI, with a typical look-back window of 36 and horizons ranging from 6 to 48.

  • Baselines: Naïve (Last, Mean, SeasonalNaive), ARIMA, Prophet, deep networks (e.g., Transformer, PatchTST, iTransformer) trained on as few as 50 in-task points, and zero-shot methods such as Meta-N-BEATS, ForecastPFN, and GPT4TS.
  • PTM zoo: Composed of 10 PatchTST models trained on diverse one-variate subsets (M3, M4, Tourism), using an extractor trained on roughly 300k subseries with transferability loss and a fixed hyperparameter setting for aggregation.

Key results (MSE):

| Dataset | SeqFusion MSE | State-of-the-art comparison |
|---|---|---|
| ECL | 0.603 | Best or 2nd best |
| ETTh1 | 0.600 | Matches/inferior by small margin |
| ETTh2 | 0.245 | Matches/inferior by small margin |
| Exchange | 0.0217 | Matches/inferior by small margin |
| ILI | 3.496 | Matches/inferior by small margin |
| Traffic | 1.489 | Matches/inferior by small margin |
| Weather | 1.449 | Matches/inferior by small margin |

In univariate zero-shot benchmarks (M3, M4, Tourism), SeqFusion with 15 DLinear PTMs achieves an SMAPE of roughly 11–13%, ranking consistently second behind GPT4TS, while requiring only about 0.05 MB per PTM.

A large-scale experiment employing PTMs from Chronos, Moirai, and TimesFM demonstrates competitive accuracy (ECL 0.5263, Weather 1.3323 MSE) with total zoo storage under 1.4 GB, compared to hundreds of GB for monolithic generalist models.
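
For reference, the two error metrics reported in this section can be computed as in the sketch below. The source does not say which SMAPE variant is used; this assumes the common symmetric form with a factor of 2 in the numerator, expressed as a percentage, and the sample values are made up.

```python
def mse(actual, predicted):
    """Mean squared error over paired observations."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed variant, range 0 to 200).

    Undefined when an actual value and its prediction are both zero.
    """
    terms = [
        2.0 * abs(a - p) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
    ]
    return 100.0 * sum(terms) / len(terms)

actual = [100.0, 200.0]
predicted = [110.0, 180.0]
error_mse = mse(actual, predicted)
error_smape = smape(actual, predicted)
```

MSE depends on the scale of the data, which is why values differ so much across datasets in the table, whereas SMAPE is scale-free and suits the heterogeneous M3, M4, and Tourism series.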

6. Ablations, Architectural Insights, and Practical Significance

  • Generalist vs. Specialist PTMs: A single "general" PatchTST model trained on all zoo data achieves an MSE of about 0.83 on ECL, whereas domain-specialized PTMs with selective fusion reach 0.603, indicating the value of model specialization and strategic selection.
  • Zoo Composition: Adding more PTMs (10→20, including Hospital-trained models) marginally benefits domains like Illness. However, mixing architectural types (PatchTST+DLinear) can degrade performance in some domains, particularly Weather. This suggests that domain diversity is more critical than architectural diversity for optimal aggregation.
  • Transferability Loss: Omitting the transferability loss from the representation extractor reduces accuracy on multiple benchmarks, confirming its utility for meaningful similarity assessment.
  • Embedding Schemes: The use of SimMTM-based embeddings with transferability loss marginally outperforms TS2Vec-based alternatives.
  • Aggregation Hyperparameter ($k$): Increasing $k$, the number of fused PTMs, from 1 to 5 provides a monotonic reduction in average MSE, corroborating the ensemble effect.

7. Privacy, Resource Efficiency, and Methodological Impact

SeqFusion is inherently privacy-preserving, as it requires only sample embeddings or small batches of non-sensitive data for PTM representation calculation; full datasets never need to be exchanged. Empirically, strong zero-shot forecasting can be realized with approximately 23 MB of total model storage, a fraction of the storage footprint for large, generalist zero-shot models. This suggests practical advantages in settings with data governance constraints or resource limitations.

By consolidating select predictions from multiple PTMs judiciously matched to the temporal characteristics of each target, SeqFusion demonstrates that distributed, specialized model aggregation in embedding space is an effective alternative to monolithic model pre-training for zero-shot time-series forecasting across domains (Huang et al., 4 Mar 2025).
