Cisco Time Series Model (TSM)
- Cisco Time Series Model is a univariate zero-shot forecasting architecture that ingests paired coarse- and fine-resolution time series to capture both long-range trends and short-range variations.
- The model integrates a TimesFM decoder-only backbone with a multiresolution input design, enabling accurate forecasting through simultaneous processing of a 1-hour coarse context and a 1-minute fine context.
- Trained on over 300 billion data points, TSM achieves state-of-the-art performance in observability tasks and general benchmarks, demonstrating its practical impact in real-time monitoring and capacity planning.
The Cisco Time Series Model (TSM) is a univariate zero-shot forecasting architecture developed as an extension of the TimesFM decoder-only backbone, augmented to accept multiresolution input. Designed for large-scale forecasting tasks in observability and general time series domains, TSM introduces a general architectural innovation allowing the simultaneous ingestion of paired coarse- and fine-resolution contexts. The model is trained on over 300 billion unique data points, with more than half sourced from proprietary observability datasets, and achieves state-of-the-art zero-shot forecasting performance within the Cisco observability stack while maintaining competitive accuracy on general-purpose benchmarks such as GIFT-Eval (Gou et al., 25 Nov 2025).
1. Architectural Foundation
TSM builds upon the TimesFM decoder-only backbone. In the base TimesFM framework, each univariate time series is divided into non-overlapping "patches" of length 32, each embedded using a small residual network. These patch embeddings form the token sequence input to a deep decoder-only stack of transformer layers, concluding in un-embedding layers that produce a fixed-length forecast.
The core innovation of TSM is the incorporation of multiresolution context. Rather than relying on a single fine-resolution (e.g., 1-minute) context window, TSM accepts two parallel sequences:
- A coarse-resolution context ($L_c = 512$ points at 1-hour granularity),
- A fine-resolution context ($L_f = 512$ points at 1-minute granularity).
The ratio $r = 60$ relates the two resolutions: each coarse point spans the same interval as $r = 60$ fine points. The model function is:

$$f: \left(\mathbf{x}^{(c)} \in \mathbb{R}^{512},\ \mathbf{x}^{(f)} \in \mathbb{R}^{512}\right) \mapsto \hat{\mathbf{y}} \in \mathbb{R}^{H},$$

where $H$ is the fine-resolution forecast horizon.
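A minimal sketch of this interface in Python (names and the naive stub are illustrative, not from the paper):

```python
import numpy as np

# Illustrative constants from the configuration described above.
L_COARSE = 512   # 1-hour points
L_FINE = 512     # 1-minute points
R = 60           # fine points per coarse point
HORIZON = 128    # fine-resolution forecast horizon

def tsm_forecast(x_coarse: np.ndarray, x_fine: np.ndarray) -> np.ndarray:
    """Map a paired (coarse, fine) context to a fine-resolution forecast.

    This stub stands in for the full model; it returns a last-value
    (naive) forecast purely to make the signature concrete.
    """
    assert x_coarse.shape == (L_COARSE,) and x_fine.shape == (L_FINE,)
    return np.full(HORIZON, x_fine[-1])
```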
Input Preprocessing and Embedding
Both contexts undergo normalization: the initial 32 points of $\mathbf{x}^{(c)}$ and $\mathbf{x}^{(f)}$ yield respective means $\mu_c, \mu_f$ and standard deviations $\sigma_c, \sigma_f$. Each context is standardized:

$$\tilde{\mathbf{x}}^{(c)} = \frac{\mathbf{x}^{(c)} - \mu_c}{\sigma_c}, \qquad \tilde{\mathbf{x}}^{(f)} = \frac{\mathbf{x}^{(f)} - \mu_f}{\sigma_f}.$$
Each normalized context is partitioned into 16 non-overlapping patches $\tilde{\mathbf{p}}_i$ of length 32, producing 32 patches overall.
Patch embedding is accomplished via a shared residual MLP block applied to each patch:

$$e_i = \mathrm{ResidualBlock}(\tilde{\mathbf{p}}_i), \quad i = 1, \dots, 32,$$

yielding sequence tokens $e_1, \dots, e_{32}$.
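The preprocessing pipeline can be sketched as follows; a single linear projection stands in for the residual embedding network, and `d_model` is an illustrative width, not a value from the paper:

```python
import numpy as np

PATCH_LEN = 32

def standardize(x: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Standardize a context using statistics of its first 32 points,
    as described above. The epsilon guards against flat segments."""
    mu = x[:32].mean()
    sigma = x[:32].std() + 1e-8
    return (x - mu) / sigma, mu, sigma

def patchify(x: np.ndarray) -> np.ndarray:
    """Split a 512-point context into 16 non-overlapping patches of 32."""
    return x.reshape(-1, PATCH_LEN)

def embed_patches(patches: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy stand-in for patch embedding: one linear projection to d_model
    (the real model uses a small residual network)."""
    return patches @ w  # (16, PATCH_LEN) @ (PATCH_LEN, d_model)

# Usage: 16 coarse tokens + 16 fine tokens = 32 tokens before the ST.
rng = np.random.default_rng(0)
d_model = 1280  # illustrative width only
w = rng.normal(size=(PATCH_LEN, d_model)) / np.sqrt(PATCH_LEN)
x_fine, _, _ = standardize(rng.normal(size=512))
tokens_fine = embed_patches(patchify(x_fine), w)  # (16, d_model)
```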
Special Tokens and Resolution Embeddings
A learned Special Token (ST) delimits the coarse and fine resolutions in the final input sequence:

$$\left[e^{(c)}_1, \dots, e^{(c)}_{16},\ \mathrm{ST},\ e^{(f)}_1, \dots, e^{(f)}_{16}\right].$$
Two learned resolution embeddings, one for the coarse region and one for the fine region, are added to their respective token spans. The resulting 33-token sequence is input to a stack of 50 transformer decoder layers, mirroring the TimesFM processing pipeline.
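Assembling the decoder input might look like the following sketch, where all parameters are random placeholders for the learned weights:

```python
import numpy as np

def assemble_sequence(tokens_coarse, tokens_fine, st, emb_coarse, emb_fine):
    """Build the 33-token input: 16 coarse tokens, the special token,
    then 16 fine tokens, with a learned resolution embedding added to
    each region (names here are illustrative)."""
    coarse = tokens_coarse + emb_coarse   # broadcast over 16 tokens
    fine = tokens_fine + emb_fine
    return np.concatenate([coarse, st[None, :], fine], axis=0)  # (33, d)

d = 1280  # illustrative model width
rng = np.random.default_rng(0)
seq = assemble_sequence(
    rng.normal(size=(16, d)), rng.normal(size=(16, d)),
    st=rng.normal(size=d), emb_coarse=rng.normal(size=d),
    emb_fine=rng.normal(size=d),
)
assert seq.shape == (33, d)
```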
Output Quantities
The output is a mean prediction together with quantile forecasts for the decile levels $q \in \{0.1, 0.2, \dots, 0.9\}$, produced via a final un-embedding residual block.
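A minimal sketch of the output head, with a single linear map standing in for the residual un-embedding block (the output layout is an assumption of this sketch):

```python
import numpy as np

QUANTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def unembed(h_last: np.ndarray, w_out: np.ndarray, horizon: int = 128):
    """Project the last decoder state to a mean forecast plus one forecast
    per quantile level, then split. h_last: (d,), w_out: (d, horizon * 10)."""
    out = h_last @ w_out
    out = out.reshape(1 + len(QUANTILES), horizon)
    return out[0], out[1:]   # mean forecast, (9, horizon) quantile forecasts
```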
Autoregressive Multiresolution Decoding
Forecasting proceeds autoregressively: predicted fine-resolution values are appended to the fine context $\mathbf{x}^{(f)}$. Coarse context is updated by aggregating each consecutive block of $r = 60$ fine predictions into one new coarse point:

$$\hat{x}^{(c)}_{j} = \frac{1}{60} \sum_{t = 60(j-1)+1}^{60j} \hat{x}^{(f)}_{t},$$

ensuring future decode steps see paired coarse and fine contexts.
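The decode loop can be sketched as follows; mean aggregation per 60-point block follows the update rule above, and the `model` callable is a placeholder:

```python
import numpy as np

R = 60  # fine points per coarse point

def autoregressive_decode(model, x_coarse, x_fine, n_steps: int):
    """Roll the model forward: append each fine-resolution forecast to the
    fine context, then fold every completed block of 60 fine predictions
    into one new coarse point via the mean."""
    x_coarse, x_fine = x_coarse.copy(), x_fine.copy()
    pending = np.empty(0)   # fine predictions not yet aggregated
    outputs = []
    for _ in range(n_steps):
        y = model(x_coarse[-512:], x_fine[-512:])   # (128,) fine forecast
        outputs.append(y)
        x_fine = np.concatenate([x_fine, y])
        pending = np.concatenate([pending, y])
        while len(pending) >= R:                    # complete coarse blocks
            x_coarse = np.append(x_coarse, pending[:R].mean())
            pending = pending[R:]
    return np.concatenate(outputs)
```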
2. Training Regime and Data Pipeline
TSM is trained in the zero-shot forecasting regime. For each context pair and ground-truth horizon $\mathbf{y}$, the model produces a point forecast $\hat{\mathbf{y}}$ and quantile forecasts $\hat{\mathbf{y}}_q$. The loss function is a weighted sum of mean squared error (MSE) and quantile regression losses:

$$\mathcal{L} = \mathrm{MSE}(\hat{\mathbf{y}}, \mathbf{y}) + \lambda \sum_{q} \mathrm{QL}_q(\hat{\mathbf{y}}_q, \mathbf{y}),$$

where $\mathrm{QL}_q$ denotes the pinball loss at quantile level $q$.
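The quantile term is the standard pinball loss; a sketch of the combined objective follows (the weight `lam` is illustrative, as the paper's weighting is not reproduced here):

```python
import numpy as np

QUANTILES = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

def quantile_loss(y_hat_q: np.ndarray, y: np.ndarray) -> float:
    """Pinball loss averaged over the decile levels.
    y_hat_q: (9, horizon) quantile forecasts; y: (horizon,) ground truth."""
    diff = y[None, :] - y_hat_q
    q = QUANTILES[:, None]
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

def tsm_loss(y_hat: np.ndarray, y_hat_q: np.ndarray, y: np.ndarray,
             lam: float = 1.0) -> float:
    """Weighted sum of MSE on the point forecast and quantile losses."""
    mse = float(np.mean((y_hat - y) ** 2))
    return mse + lam * quantile_loss(y_hat_q, y)
```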
Data Sources
Training spans 20 epochs over >300 billion unique data points, sourced as follows:
| Data Subset | Contribution (%) | Description |
|---|---|---|
| 1-min observability (Splunk) | 35 | 400M series, 13 months |
| 5-min observability | 16.5 | Observability metrics at coarser granularity |
| GIFT-Eval (public) | 29.5 | 4.5M series, 230B points |
| Chronos datasets | 4.5 | 0.9M series, 85B points |
| Synthetic (KernelSynth) | 14.5 | Artificially generated series |
Windows of length (512, 512) for fine and coarse streams are extracted using a sliding window, filtered for missingness, flat spots, spectral entropy, and abrupt steps. SimHash-based and distance-based statistical deduplication ensure diversity and avoid domination by repetitive series.
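A simplified version of the window extraction and filtering stage; thresholds and stride are illustrative, and the spectral-entropy filter and SimHash deduplication are omitted:

```python
import numpy as np

def extract_windows(series: np.ndarray, length: int = 512, stride: int = 128):
    """Slide a fixed-length window over a series, keeping only windows
    that pass simple quality filters (thresholds are illustrative)."""
    windows = []
    for start in range(0, len(series) - length + 1, stride):
        w = series[start:start + length]
        if np.isnan(w).mean() > 0.05:          # too much missingness
            continue
        if w.std() < 1e-6:                     # flat spot
            continue
        if np.abs(np.diff(w)).max() > 10 * (w.std() + 1e-8):  # abrupt step
            continue
        windows.append(w)
    return windows
```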
3. Multiresolution Design and Long-Context Forecasting
Traditional single-resolution models with context length $L$ directly observe only $L$ fine steps. At 1-minute resolution, covering $h$ hours of history requires $60h$ tokens. TSM's paired context covers 512 coarse points (1 hour each) and 512 fine points (1 minute each), so with only $1025$ input positions ($512 + 1 + 512$, including the special token), the model observes 512 hours of coarse history and 512 minutes of fine history simultaneously.
This structure allows the model to directly attend across all tokens—coarse-to-fine and within each resolution—enabling fusion of long-range low-frequency (trend, seasonality) and short-range high-frequency (intraday, noise) patterns. Empirically, this yields enhanced long-range forecasting performance without sacrificing local accuracy.
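To make the coverage gap concrete (numbers taken from the configuration above):

```python
# Context coverage arithmetic from the discussion above.
fine_per_coarse = 60            # minutes per hour
coarse_points, fine_points = 512, 512

single_res_tokens = coarse_points * fine_per_coarse   # 30,720 minute tokens
tsm_positions = coarse_points + 1 + fine_points       # 1,025 (incl. special token)
print(single_res_tokens / tsm_positions)              # ~30x fewer positions
```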
4. Empirical Evaluation
TSM was assessed on both observability and general time series benchmarks:
Observability Benchmarks
On out-of-domain, in-the-future splits at 1-minute granularity (context: 512 fine + 512 coarse points; horizon: 128), metrics normalized by a last-value baseline (Naive) demonstrate consistent improvement (a sketch of this normalization follows the table):
| Metric | Cisco TSM | TimesFM-2.5 (512) | Chronos-2 (512) | Toto-1.0 (512) | AutoARIMA (512) |
|---|---|---|---|---|---|
| MSE | 0.8524 | 0.8838 | 0.8816 | 0.8836 | 4.0520 |
| MAE | 0.4788 | 0.6265 | 0.6023 | 0.6055 | 0.8545 |
| MASE | 0.4569 | 0.7290 | 0.7056 | 0.6834 | 0.9381 |
| sMAPE | 0.7758 | 0.8297 | 0.7811 | 0.7741† | 1.3316 |
| MSIS | 0.1207 | 0.1732 | 0.1773 | 0.2032 | 0.2562 |
| CRPS | 0.4126 | 0.5089 | 0.4878 | 0.4932 | 0.7444 |
(† best among single-resolution baselines.)
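The normalization scheme can be illustrated for MSE; analogous ratios apply to the other metrics (a sketch, not the paper's evaluation code):

```python
import numpy as np

def naive_normalized_mse(y_true: np.ndarray, y_pred: np.ndarray,
                         last_context_value: float) -> float:
    """MSE of the model divided by MSE of a last-value (Naive) forecast,
    matching the normalization used in the table above; values below 1
    indicate improvement over the naive baseline."""
    naive = np.full_like(y_true, last_context_value)
    return float(np.mean((y_pred - y_true) ** 2)
                 / np.mean((naive - y_true) ** 2))
```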
When single-resolution models are given 1024 fine-resolution context, TSM leads or matches on these metrics. Similar performance is observed for 5-minute resolution series.
General Forecasting Benchmark: GIFT-Eval
On non-leaking GIFT-Eval, for windows longer than 512 points (normalized by SeasonalNaive), TSM closely matches or slightly lags TimesFM-2.5 on global averages (MAE 0.6980 vs. 0.6635) but achieves higher performance on long-context subsets, indicating that multiresolution adaptation does not impair general-purpose forecasting.
Qualitative Analyses
- For series exhibiting strong diurnal or weekly seasonality, coarse (1-hour) context captures patterns unreachable by 512-minute windows.
- In series with noise or regime shifts, extended historical context helps filter transient spikes and identify underlying trends.
5. Limitations and Prospective Extensions
TSM currently fixes two input resolutions and employs a single special token to demarcate the coarse/fine boundary. More flexible formulations—such as variable-length contexts, more than two resolutions, or dynamic token placement—may yield further gains. The architecture is restricted to univariate modeling; multivariate temporal dependencies remain unmodeled. For extremely abrupt or chaotic time series, even long context is insufficient for effective prediction.
Continued pre-training (CPT) of TimesFM on mixed data with multiresolution tokens accelerates convergence and improves observability performance without adverse impact on general benchmarks. Ablation studies indicate that simply concatenating the two contexts, without the resolution embeddings and special token, still learns reasonably but converges more slowly.
6. Applications Within Cisco Observability
Within Cisco's observability stack, TSM is deployed for several forecasting workflows:
- Real-time inference for infrastructure and application metrics at high (minute) and low (hour) granularities, supporting anomaly detection.
- Capacity planning tasks, leveraging coarse context to project growth and seasonality over multi-week horizons.
- As a zero-shot forecasting engine integrated into Splunk’s Observability Cloud, TSM forecasts novel series without per-series retraining.
Summary: The Cisco Time Series Model extends the TimesFM decoder-only backbone through a lightweight multiresolution scheme comprising a special token, dedicated resolution embeddings, and multiresolution autoregressive updates. Trained on over 300 billion points (more than half from proprietary observability data), TSM provides a scalable, zero-shot forecasting backbone for observability scenarios while maintaining strong results on diverse, publicly available forecasting benchmarks (Gou et al., 25 Nov 2025).