SeqFusion: Zero-Shot Time-Series Forecasting
- SeqFusion is a zero-shot time-series forecasting framework that fuses predictions from a curated set of specialized pre-trained models without needing task-specific training.
- The architecture uses a shared embedding space to match target series with PTMs, performing sequential prediction with similarity-weighted aggregation.
- Empirical results demonstrate competitive MSE scores across benchmarks while ensuring privacy, resource efficiency, and modular adaptability.
SeqFusion is a framework for zero-shot time-series forecasting that bypasses the need for task-specific training data by sequentially fusing predictions from a curated zoo of specialized pre-trained models (PTMs). Unlike conventional methods, which aggregate vast and diverse datasets for generalized pre-training—raising privacy and logistical concerns—SeqFusion shifts the focus to acquiring a collection of compact, specialized PTMs and adaptively combining them based on the temporal characteristics of each target time series (Huang et al., 4 Mar 2025).
1. Zero-Shot Time-Series Forecasting: Problem Formulation and Motivation
Zero-shot time-series forecasting seeks to predict future values for a target time series without using any additional in-task training data. Formally, given a multivariate input $X \in \mathbb{R}^{L \times C}$ with $L$ historical observations of $C$ variates, the objective is to forecast $\hat{Y} \in \mathbb{R}^{H \times C}$ over a horizon of $H$ steps using only pre-existing models or representations.
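Because the PTMs discussed below are one-variate models, a multivariate input can be handled by forecasting each of the $C$ variates independently and reassembling the $H \times C$ output. The sketch below illustrates only that interface; the `Forecaster` type and the `last_value` stand-in are hypothetical and not part of SeqFusion.

```python
from typing import Callable, List

# Hypothetical type: a univariate forecaster maps a look-back window and a
# horizon H to H future values, with no fitting on the target series.
Forecaster = Callable[[List[float], int], List[float]]


def forecast_multivariate(X: List[List[float]], horizon: int,
                          forecaster: Forecaster) -> List[List[float]]:
    """Forecast an L x C input over `horizon` steps, one variate at a time.

    X is row-major (X[t][c] is variate c at time t); the result is H x C.
    """
    n_vars = len(X[0])
    columns = []
    for c in range(n_vars):
        history = [row[c] for row in X]        # univariate look-back window
        preds = forecaster(history, horizon)   # H predictions for variate c
        if len(preds) != horizon:
            raise ValueError("forecaster returned the wrong number of points")
        columns.append(preds)
    # Transpose the per-variate columns back to an H x C layout.
    return [[columns[c][h] for c in range(n_vars)] for h in range(horizon)]


def last_value(history: List[float], horizon: int) -> List[float]:
    # Trivial stand-in forecaster: repeat the last observation.
    return [history[-1]] * horizon


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]    # L = 3, C = 2
print(forecast_multivariate(X, 2, last_value))  # [[3.0, 30.0], [3.0, 30.0]]
```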
Traditional forecasting methods—statistical or deep learning—require substantial in-task data for effective generalization. Recent zero-shot approaches mitigate this via generalized PTMs, but performance is contingent on the diversity of pre-training data and often incurs privacy and storage trade-offs. SeqFusion introduces a paradigm shift by collecting a zoo of lightweight PTMs, each usually trained on a distinct dataset or domain, and assembling predictions by strategically fusing PTMs most relevant to the target temporal dynamics.
2. Architectural Components and Inference Workflow
SeqFusion's pipeline consists of four distinct stages:
- Model Zoo Construction: Accumulate $M$ one-variate PTMs, $\{f_m\}_{m=1}^{M}$, each pre-trained on a different dataset $\mathcal{D}_m$.
- Representation Extraction and Matching: Both the target time-series variates and the PTMs are mapped into a shared embedding space using a general extractor $\phi$. The similarity between embeddings (usually cosine similarity) determines the affinity between a target variate and each PTM.
- Sequential Prediction: For every target variate $x_c$, select the top-$k$ PTMs whose embeddings are closest to the variate's embedding. Each selected PTM performs recursive multi-block prediction, covering the prediction horizon $H$ in successive blocks.
- Fusion and Post-Processing: The $k$ outputs from the selected PTMs are fused using a similarity-weighted average at each forecasted time point. Finally, normalization ensures the forecasts are returned in the original scale.
| Stage | Description | Key Technical Detail |
|---|---|---|
| Zoo | Collect diverse, one-variate PTMs | Each PTM trained on unique dataset |
| Matching | Embed series and PTMs in shared space | Use general extractor $\phi$ with transferability loss |
| Prediction | Select top-$k$ PTMs per variate, predict sequentially | Recursive horizon division per PTM |
| Fusion | Weighted ensemble of top-$k$ PTM outputs | Similarity/temperature-based weighting, normalization |
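The normalization step is not described in detail here. A minimal sketch, assuming per-series z-score (instance) normalization in the spirit of RevIN, shows how a forecast can be produced in normalized space and mapped back to the original scale; `normalized_forecast`, its `eps` guard, and the toy model are hypothetical.

```python
import statistics
from typing import Callable, List


def normalized_forecast(history: List[float], horizon: int,
                        model: Callable[[List[float], int], List[float]],
                        eps: float = 1e-8) -> List[float]:
    """Z-score the look-back window, forecast in normalized space, invert.

    Assumption: per-series (instance) normalization; SeqFusion's exact
    normalization procedure is not specified in the text.
    """
    mu = statistics.fmean(history)
    scale = statistics.pstdev(history) + eps      # eps guards constant series
    z = [(v - mu) / scale for v in history]       # unit-scale model input
    z_pred = model(z, horizon)
    return [p * scale + mu for p in z_pred]       # back to the original scale


def repeat_last(z: List[float], h: int) -> List[float]:
    # Toy stand-in model operating on normalized values.
    return [z[-1]] * h


print(normalized_forecast([10.0, 20.0, 30.0], 2, repeat_last))  # approx. [30.0, 30.0]
```

Normalizing per series lets PTMs trained on data of one magnitude be applied to targets of a very different magnitude, which matters when the zoo and the target come from unrelated domains.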
3. Embedding Learning and PTM Transferability
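The general extractor $\phi$ is an encoder-decoder architecture trained on a dataset $\mathcal{D}_\phi$ distinct from the PTM pre-training sources. Its objective combines:
- Reconstruction loss: $\mathcal{L}_{\text{rec}} = \lVert x - \hat{x} \rVert_2^2$, where $\hat{x}$ is the decoder's reconstruction of series $x$.
- Series-wise similarity loss: Pulls masked/unmasked augmentations of each series together in embedding space.
- Transferability loss $\mathcal{L}_{\text{trans}}$: aligns embedding similarity with a transferability score $t_{ij}$, which measures how well PTM $f_i$ generalizes to the data of $\mathcal{D}_j$.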
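The text does not state how the transferability score is computed. One plausible recipe, shown purely as a hypothetical sketch, evaluates each PTM on held-out windows from each dataset and converts lower error into a higher score by min-max normalizing per dataset. The function names, the `(input, target)` window format, and the normalization are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Window = Tuple[List[float], List[float]]      # (look-back input, true future)
PTM = Callable[[List[float], int], List[float]]


def mse(y: List[float], y_hat: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def transfer_scores(ptms: Dict[str, PTM],
                    datasets: Dict[str, List[Window]]) -> Dict[Tuple[str, str], float]:
    """Hypothetical score t[(i, j)] in [0, 1] for PTM i on dataset j.

    Error is PTM i's mean MSE over windows of dataset j; errors are min-max
    normalized per dataset so the best-transferring PTM scores 1.0.
    """
    err = {}
    for i, f in ptms.items():
        for j, windows in datasets.items():
            err[(i, j)] = sum(mse(y, f(x, len(y))) for x, y in windows) / len(windows)
    scores = {}
    for j in datasets:
        col = [err[(i, j)] for i in ptms]
        lo, hi = min(col), max(col)
        for i in ptms:
            # Lower error -> higher score; a constant column maps to 1.0.
            scores[(i, j)] = 1.0 if hi == lo else (hi - err[(i, j)]) / (hi - lo)
    return scores


# Toy zoo and datasets, for illustration only.
ptms = {"last": lambda x, h: [x[-1]] * h, "zero": lambda x, h: [0.0] * h}
datasets = {"flat": [([5.0, 5.0, 5.0], [5.0, 5.0])],
            "zeros": [([0.0, 0.0, 1.0], [0.0, 0.0])]}
scores = transfer_scores(ptms, datasets)
print(scores)
```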
Embeddings are computed as follows:
- For each target variate $x_c$: $z_c = \phi(x_c)$
- For each PTM $f_m$: $z_m = \frac{1}{|B_m|} \sum_{x \in B_m} \phi(x)$, with $B_m \subset \mathcal{D}_m$ a small batch from the PTM's original dataset.
Similarity is quantified via cosine similarity, though alternative distance metrics are also considered. This enables the selection of PTMs whose representations most closely align with those of the target variates.
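The sketch below mirrors the two embedding formulas: a target embedding applies the extractor to one series, and a PTM embedding averages the extractor over a small batch from that PTM's dataset (mean pooling over the batch is an assumption). Because the learned encoder-decoder $\phi$ is not available here, `toy_extractor` is a hypothetical stand-in that returns lag autocorrelations; only the cosine comparison follows the text directly.

```python
import math
from typing import List


def toy_extractor(x: List[float], max_lag: int = 3) -> List[float]:
    """Hypothetical stand-in for the learned extractor phi.

    Returns sample autocorrelations at lags 1..max_lag, which summarize
    temporal dynamics independently of the series' level and scale.
    """
    n = len(x)
    mu = sum(x) / n
    d = [v - mu for v in x]
    var = sum(v * v for v in d)
    if var == 0.0:
        return [0.0] * max_lag
    return [sum(d[t] * d[t + k] for t in range(n - k)) / var
            for k in range(1, max_lag + 1)]


def ptm_embedding(batch: List[List[float]]) -> List[float]:
    # z_m: mean of phi over a small batch B_m from the PTM's own dataset.
    embs = [toy_extractor(x) for x in batch]
    return [sum(col) / len(embs) for col in zip(*embs)]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(u * v for u, v in zip(a, b))
    na = math.sqrt(sum(u * u for u in a))
    nb = math.sqrt(sum(v * v for v in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


# Toy PTM batches: one trending "domain", one alternating "domain".
z_trend = ptm_embedding([[1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6, 8, 10, 12, 14, 16]])
z_alt = ptm_embedding([[1, -1, 1, -1, 1, -1, 1, -1]])
z_target = toy_extractor([10, 11, 12, 13, 14, 15, 16, 17])
print(cosine(z_target, z_trend))  # close to 1.0
print(cosine(z_target, z_alt))    # negative
```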
4. Sequential Prediction and Multi-PTM Aggregation
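For each variate $x_c$, the top-$k$ PTMs are selected by maximizing similarity, where $\operatorname{top}_k$ returns the indices of the $k$ largest values:
$$\mathcal{S}_c = \operatorname{top}_k \big\{ \cos(z_c, z_m) : m = 1, \dots, M \big\}$$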
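Top-$k$ selection amounts to ranking PTMs by their similarity to the target embedding. A minimal sketch, assuming similarities have already been computed and stored by PTM name (the names and values are hypothetical):

```python
from typing import Dict, List, Tuple


def select_top_k(similarities: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k PTMs most similar to a target variate, best first.

    `similarities` maps a PTM name to cos(z_c, z_m). Ties are broken by name
    so the selection is deterministic; k larger than the zoo returns all PTMs.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(similarities.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Hypothetical PTM names and similarity values, for illustration only.
sims = {"ptm_m4": 0.91, "ptm_tourism": 0.42, "ptm_m3": 0.87, "ptm_hospital": -0.10}
print(select_top_k(sims, 2))  # [('ptm_m4', 0.91), ('ptm_m3', 0.87)]
```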
Recursive forecasting proceeds as follows: at each step, each selected PTM produces a block of predictions, the block is appended to the series, the most recent $L$ points (one look-back window) are re-assembled as the next input, and the next block is forecasted until the full horizon $H$ is covered.
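A minimal sketch of this recursive loop, assuming each PTM is a callable that maps a fixed-length look-back window to a block of its native output length; `recursive_forecast` and the toy `step` model are hypothetical.

```python
from typing import Callable, List


def recursive_forecast(model: Callable[[List[float]], List[float]],
                       history: List[float], lookback: int,
                       horizon: int) -> List[float]:
    """Roll a fixed-output-length model forward until `horizon` points exist.

    Each predicted block is appended to the context, the most recent
    `lookback` points form the next input, and the output is truncated to
    exactly `horizon` points.
    """
    if len(history) < lookback:
        raise ValueError("history shorter than the look-back window")
    context = list(history)
    preds: List[float] = []
    while len(preds) < horizon:
        block = model(context[-lookback:])   # one block of native length
        if not block:
            raise ValueError("model returned an empty block")
        preds.extend(block)
        context.extend(block)                # predictions become inputs
    return preds[:horizon]


def step(window: List[float]) -> List[float]:
    # Toy model with native output length 2: continue a unit-slope trend.
    return [window[-1] + 1.0, window[-1] + 2.0]


print(recursive_forecast(step, [0.0, 1.0, 2.0], lookback=3, horizon=5))
# [3.0, 4.0, 5.0, 6.0, 7.0]
```

With a native block length of 2 and $H = 5$, the loop calls the model $\lceil 5/2 \rceil = 3$ times and discards the one surplus point.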
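Fusion of the $k$ forecasted series is accomplished via a similarity-weighted average:
$$\hat{y}_c(t) = \sum_{m \in \mathcal{S}_c} w_{c,m}\, \hat{y}_{c,m}(t),$$
where $\mathcal{S}_c$ is the set of top-$k$ PTMs selected for variate $x_c$, $\hat{y}_{c,m}(t)$ is the forecast of PTM $f_m$ at time $t$,
$$w_{c,m} = \frac{\exp\big(\cos(z_c, z_m)/\tau\big)}{\sum_{m' \in \mathcal{S}_c} \exp\big(\cos(z_c, z_{m'})/\tau\big)},$$
and $\tau$ is a temperature hyperparameter controlling weight sharpness.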
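The fusion rule is a softmax over similarities with temperature $\tau$, applied per time step. The sketch below implements that formula for a single variate; `fuse` is an illustrative name, and subtracting the maximum similarity before exponentiating leaves the weights unchanged while avoiding overflow.

```python
import math
from typing import List


def fuse(forecasts: List[List[float]], sims: List[float], tau: float) -> List[float]:
    """Similarity-weighted average of k forecasts with temperature tau.

    w_m = exp(s_m / tau) / sum_m' exp(s_m' / tau); the fused value at each
    time step t is sum_m w_m * forecasts[m][t].
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    s_max = max(sims)                        # stabilize the exponentials
    exps = [math.exp((s - s_max) / tau) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


print(fuse([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5], tau=1.0))  # [2.0, 3.0]
```

As $\tau \to 0$ the weights approach a one-hot choice of the most similar PTM; as $\tau \to \infty$ they approach a plain unweighted mean, so $\tau$ interpolates between selection and averaging.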
5. Empirical Evaluation and Performance Analysis
SeqFusion is evaluated on a suite of benchmark datasets in both multivariate and univariate settings. The multivariate datasets are ETTh1/2, Exchange-Rate, Electricity (ECL), Traffic, Weather, and ILI, with a typical look-back window of 36 steps and forecasting horizons ranging from 6 to 48 steps.
- Baselines: Naïve (Last, Mean, SeasonalNaive), ARIMA, Prophet, deep networks (e.g., Transformer, PatchTST, iTransformer) trained on as few as 50 in-task points, and zero-shot methods such as Meta-N-BEATS, ForecastPFN, and GPT4TS.
- PTM zoo: Consists of 10 PatchTST models trained on diverse one-variate subsets (M3, M4, Tourism), paired with an extractor trained on about 300k subseries with the transferability loss, and top-$k$ selection for aggregation.
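For reference, the naive baselines listed above have standard textbook definitions, sketched below. The benchmark's exact settings (for example, the seasonal period used for SeasonalNaive) are not given in the text, so `period` is a caller-supplied assumption.

```python
from typing import List


def naive_last(history: List[float], horizon: int) -> List[float]:
    # Repeat the final observation.
    return [history[-1]] * horizon


def naive_mean(history: List[float], horizon: int) -> List[float]:
    # Repeat the mean of the look-back window.
    return [sum(history) / len(history)] * horizon


def seasonal_naive(history: List[float], horizon: int, period: int) -> List[float]:
    # Each forecast copies the value observed one full season earlier.
    if not 0 < period <= len(history):
        raise ValueError("period must be in [1, len(history)]")
    season = history[-period:]
    return [season[h % period] for h in range(horizon)]


print(naive_last([1.0, 2.0, 3.0], 2))               # [3.0, 3.0]
print(naive_mean([1.0, 2.0, 3.0], 2))               # [2.0, 2.0]
print(seasonal_naive([1, 2, 3, 1, 2, 3], 4, 3))     # [1, 2, 3, 1]
```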
Key results (MSE):
| Dataset | SeqFusion MSE | State-of-the-art comparison |
|---|---|---|
| ECL | 0.603 | Best or second best |
| ETTh1 | 0.600 | Matches or trails the best by a small margin |
| ETTh2 | 0.245 | Matches or trails the best by a small margin |
| Exchange | 0.0217 | Matches or trails the best by a small margin |
| ILI | 3.496 | Matches or trails the best by a small margin |
| Traffic | 1.489 | Matches or trails the best by a small margin |
| Weather | 1.449 | Matches or trails the best by a small margin |
In univariate zero-shot benchmarks (M3, M4, Tourism), SeqFusion with 15 DLinear PTMs achieves an SMAPE of roughly 11–13%, consistently ranking second behind GPT4TS, while requiring only about 0.05 MB per PTM.
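MSE and SMAPE are the two metrics quoted in this section. The sketch uses the common M4-style SMAPE (percent, range 0 to 200); whether the reported numbers use exactly this variant is an assumption, and handling of zero denominators varies across implementations.

```python
from typing import List


def mse(y: List[float], y_hat: List[float]) -> float:
    # Mean squared error over the forecast horizon.
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def smape(y: List[float], y_hat: List[float]) -> float:
    """M4-style symmetric MAPE in percent: 200/H * sum |y - y_hat| / (|y| + |y_hat|).

    A term where both values are zero is an exact match and contributes 0.
    """
    terms = [0.0 if abs(a) + abs(b) == 0 else abs(a - b) / (abs(a) + abs(b))
             for a, b in zip(y, y_hat)]
    return 200.0 * sum(terms) / len(y)


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # 1.333...
print(smape([100.0, 100.0], [90.0, 110.0]))    # about 10.03
```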
A large-scale experiment employing PTMs from Chronos, Moirai, and TimesFM demonstrates competitive accuracy (ECL 0.5263, Weather 1.3323 MSE) with total zoo storage under 1.4 GB, compared to hundreds of GB for monolithic generalist models.
6. Ablations, Architectural Insights, and Practical Significance
- Generalist vs. Specialist PTMs: A single "general" PatchTST model trained on all zoo data achieves an MSE of 0.83 on ECL, whereas domain-specialized PTMs with selective fusion yield 0.603, indicating the value of model specialization and strategic selection.
- Zoo Composition: Adding more PTMs (10→20, including Hospital-trained models) marginally benefits domains like Illness. However, mixing architectural types (PatchTST+DLinear) can degrade performance in some domains, particularly Weather. This suggests that domain diversity is more critical than architectural diversity for optimal aggregation.
- Transferability Loss: Omitting the transferability loss $\mathcal{L}_{\text{trans}}$ from the representation extractor reduces accuracy on multiple benchmarks, confirming its utility for meaningful similarity assessment.
- Embedding Schemes: The use of SimMTM-based embeddings with transferability loss marginally outperforms TS2Vec-based alternatives.
- Aggregation Hyperparameter ($k$): Increasing $k$ from 1 to 5 provides a monotonic reduction in average MSE, corroborating the ensemble effect.
7. Privacy, Resource Efficiency, and Methodological Impact
SeqFusion is inherently privacy-preserving, as computing PTM representations requires only sample embeddings or small batches of non-sensitive data; full datasets never need to be exchanged. Empirically, strong zero-shot forecasting can be realized with approximately 23 MB of total model storage, a fraction of the footprint of large, generalist zero-shot models. This suggests practical advantages in settings with data-governance constraints or limited resources.
By consolidating select predictions from multiple PTMs judiciously matched to the temporal characteristics of each target, SeqFusion demonstrates that distributed, specialized model aggregation in embedding space is an effective alternative to monolithic model pre-training for zero-shot time-series forecasting across domains (Huang et al., 4 Mar 2025).