A decoder-only foundation model for time-series forecasting (2310.10688v4)

Published 14 Oct 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Motivated by recent advances in LLMs for NLP, we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus, and can work well across different forecasting history lengths, prediction lengths and temporal granularities.

References (28)
  1. On the benefits of maximum likelihood estimation for regression and forecasting. arXiv preprint arXiv:2106.10370, 2021.
  2. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
  3. Some recent advances in forecasting and control. Journal of the Royal Statistical Society. Series C (Applied Statistics), 17(2):91–109, 1968.
  4. LLM4TS: Two-stage fine-tuning for time-series forecasting with pre-trained LLMs. arXiv preprint arXiv:2308.08469, 2023.
  5. Long-term forecasting with TiDE: Time-series dense encoder. Transactions on Machine Learning Research, 2023.
  6. TimeGPT-1. arXiv preprint arXiv:2310.03589, 2023.
  7. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022.
  8. Traffic4cast at NeurIPS 2020 - yet more on the unreasonable effectiveness of gridded geo-spatial processes. In Hugo Jair Escalante and Katja Hofmann, editors, Proceedings of the NeurIPS 2020 Competition and Demonstration Track, volume 133 of Proceedings of Machine Learning Research, pages 325–343. PMLR, 06–12 Dec 2021.
  9. Generating Wikipedia by summarizing long sequences. arXiv preprint arXiv:1801.10198, 2018.
  10. E. D. McKenzie. General exponential smoothing and the equivalent ARMA process. Journal of Forecasting, 3(3):333–344, 1984.
  11. A survey on time-series pre-trained models. arXiv preprint arXiv:2305.10716, 2023.
  12. M5 accuracy competition: Results, findings, and conclusions. International Journal of Forecasting, 38(4):1346–1364, 2022.
  13. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2022.
  14. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2019.
  15. Meta-learning framework with applications to zero-shot time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9242–9250, 2021.
  16. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
  17. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020.
  18. Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting. Advances in Neural Information Processing Systems, 32, 2019.
  19. Forecasting at scale. The American Statistician, 72(1):37–45, 2018.
  20. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
  21. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526):804–819, 2019.
  22. Towards efficient and comprehensive urban spatial-temporal prediction: A unified library and performance benchmark. arXiv preprint arXiv:2304.14343, 2023.
  23. A multi-horizon quantile recurrent forecaster. arXiv preprint arXiv:1711.11053, 2017.
  24. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
  25. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, pages 27268–27286. PMLR, 2022.
  26. One Fits All: Power general time series analysis by pretrained LM. arXiv preprint arXiv:2302.11939, 2023.
  27. Vector autoregressive models for multivariate time series. Modeling financial time series with S-PLUS®, pages 385–429, 2006.
  28. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
Authors (4)
  1. Abhimanyu Das (21 papers)
  2. Weihao Kong (29 papers)
  3. Rajat Sen (29 papers)
  4. Yichen Zhou (21 papers)
Citations (121)

Summary

  • The paper introduces TimesFM, a decoder-only model that achieves effective zero-shot forecasting by combining input patching with causal self-attention.
  • It employs a fixed-length patching strategy within an autoregressive framework, enabling the capture of complex temporal patterns across various datasets.
  • Empirical evaluations reveal that TimesFM competes with state-of-the-art supervised models, offering significant computational efficiency for forecasting tasks.

Overview of a Decoder-Only Foundation Model for Time-Series Forecasting

This paper introduces a time-series foundation model named TimesFM, designed to achieve effective zero-shot forecasting by leveraging a decoder-only attention architecture. The model’s architecture and training methodology draw inspiration from recent advancements in LLMs such as GPT-3.

Model Architecture and Design

TimesFM employs a decoder-only attention architecture with input patching, combining the autoregressive training style of LLMs with the patch-based inputs used by long-horizon forecasting models. This design encodes temporal patterns effectively, even for datasets the model has not previously encountered. The model is pretrained on a mixture of real-world and synthetic datasets comprising approximately 100 billion time points.
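To make the patching idea concrete, the sketch below splits a context window into fixed-length, non-overlapping patches that then serve as the transformer's input tokens. The sizes (a 512-point context and 32-point patches) are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

context_len = 512   # illustrative history length
patch_len = 32      # illustrative fixed input patch length

series = np.random.randn(context_len).astype(np.float32)  # a toy univariate series

# Cut the history into non-overlapping, fixed-length patches;
# each patch becomes one "token" fed to the decoder-only transformer.
patches = series.reshape(context_len // patch_len, patch_len)
print(patches.shape)  # (16, 32): 16 tokens, each covering 32 time points
```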

Key attributes of the model include:

  1. Patching Strategy: Input series are divided into fixed-length patches, allowing efficient processing and inference across varying temporal granularities.
  2. Decoder-Only Training: Causal self-attention over the patch sequence enables autoregressive decoding without a separate encoder.
  3. Longer Output Patches: The output patch length can exceed the input patch length, so each decoding step predicts a longer stretch of the future, reducing the number of autoregressive steps needed for long horizons (a minimal sketch of this design follows the list).
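The following is a minimal, self-contained PyTorch sketch of the general design described above, not the authors' implementation: each input patch is embedded by a small MLP, the patch sequence passes through a causally masked transformer stack, and every position predicts an output patch that is longer than its input patch. All layer sizes, names, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyPatchedDecoder(nn.Module):
    """Illustrative decoder-only forecaster in the spirit of TimesFM (not the official model)."""

    def __init__(self, patch_len=32, out_patch_len=128, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.patch_len = patch_len
        # A small MLP embeds each raw input patch into the model dimension.
        self.embed = nn.Sequential(
            nn.Linear(patch_len, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # A decoder-only stack is simply an encoder stack run with a causal mask.
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        # Each token predicts a full output patch, longer than its input patch.
        self.head = nn.Linear(d_model, out_patch_len)

    def forward(self, x):
        # x: (batch, context_len), with context_len divisible by patch_len.
        b, t = x.shape
        patches = x.view(b, t // self.patch_len, self.patch_len)   # (batch, n_patches, patch_len)
        h = self.embed(patches)                                     # (batch, n_patches, d_model)
        causal = nn.Transformer.generate_square_subsequent_mask(h.size(1))
        h = self.blocks(h, mask=causal)                             # causal self-attention only
        return self.head(h)                                         # (batch, n_patches, out_patch_len)

model = TinyPatchedDecoder()
history = torch.randn(2, 512)   # two toy series, each with a 512-point context
out = model(history)
print(out.shape)                # torch.Size([2, 16, 128])
```

At inference time, the output patch at the last position is the forecast for the next block of points; to forecast further, those points are appended to the history and the model is called again. With a 32-point input patch and a 128-point output patch, for example, a 512-step horizon needs 4 decoding passes rather than 16, which is the efficiency that the longer output patches are intended to provide.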

Empirical Evaluation

TimesFM’s zero-shot capabilities were empirically validated against established public benchmarks including the Monash Archive, Darts datasets, and Informer datasets:

  • Monash Archive: TimesFM achieved performance competitive with state-of-the-art supervised models such as DeepAR and FFNN, ranking among the top three methods by mean scaled MAE (the metric is sketched after this list).
  • Darts Datasets: TimesFM performed especially well on series with clear seasonal patterns and closely rivaled top methods such as ARIMA, showing that it captures complex temporal dynamics effectively.
  • Informer Datasets: On these benchmarks, known for their challenging long prediction horizons, the model matched or surpassed many supervised methods, including PatchTST and Autoformer.
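For readers unfamiliar with the Monash-style metric, the snippet below gives one common MASE-style definition of scaled MAE, in which the forecast MAE is divided by the in-sample error of a seasonal-naive baseline; it illustrates the metric family and is not necessarily the exact aggregation used in the paper.

```python
import numpy as np

def scaled_mae(y_true, y_pred, y_train, season=1):
    """MASE-style scaled MAE: forecast MAE divided by the in-sample
    MAE of a seasonal-naive forecast on the training history."""
    mae = np.mean(np.abs(y_true - y_pred))
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return mae / naive_mae

# Toy example: an hourly sine-like series with daily seasonality (season=24).
rng = np.random.default_rng(0)
t = np.arange(264)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(264)
train, test = series[:240], series[240:]
forecast = np.sin(2 * np.pi * t[240:] / 24)   # a reasonable but imperfect forecast
print(round(scaled_mae(test, forecast, train, season=24), 3))
```

Roughly speaking, a value below 1 means the forecaster beats the seasonal-naive baseline on that dataset; benchmark rankings then aggregate such scaled errors across datasets.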

Implications and Future Directions

The results provide strong evidence that a versatile time-series foundation model with practical zero-shot performance is feasible. The inclusion of diverse synthetic and real-world datasets in pretraining contributes significantly to the model's generalization. Such a model can substantially reduce the computational cost of training a separate model for each new dataset, which is particularly valuable when data or compute is limited.

The experiments suggest promising avenues for future research:

  • Fine-tuning Capabilities: Exploring the model’s adaptability and performance improvements through fine-tuning on specific datasets or tasks.
  • Probabilistic Forecasting: Extending the architecture to produce uncertainty estimates and richer predictive distributions (a generic quantile-loss example follows this list).
  • Scaling Studies: Systematically investigating the trade-offs involved in model scaling, considering data size and parameter efficiency, akin to scaling laws observed in LLMs.
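As a concrete illustration of the probabilistic-forecasting direction, the snippet below shows the standard quantile (pinball) loss with which point forecasters are commonly extended; it is a generic technique, not a detail taken from the paper.

```python
import torch

def pinball_loss(y_true, y_pred, quantile):
    """Standard quantile (pinball) loss; training one output head per quantile
    lets a point forecaster emit an approximate predictive distribution."""
    diff = y_true - y_pred
    return torch.mean(torch.maximum(quantile * diff, (quantile - 1.0) * diff))

y_true = torch.randn(8, 128)   # a batch of target output patches (illustrative shapes)
y_q90 = torch.randn(8, 128)    # hypothetical 90th-percentile predictions
print(pinball_loss(y_true, y_q90, quantile=0.9).item())
```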

In conclusion, TimesFM represents a significant endeavor towards building robust, scalable, and efficient time-series forecasting models, broadening the possibilities for zero-shot and few-shot learning in this domain.
