A decoder-only foundation model for time-series forecasting (2310.10688v4)

Published 14 Oct 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Motivated by recent advances in LLMs for NLP, we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus, and can work well across different forecasting history lengths, prediction lengths and temporal granularities.

References (28)
  1. On the benefits of maximum likelihood estimation for regression and forecasting. arXiv preprint arXiv:2106.10370, 2021.
  2. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
  3. Some recent advances in forecasting and control. Journal of the Royal Statistical Society. Series C (Applied Statistics), 17(2):91–109, 1968.
  4. LLM4TS: Two-stage fine-tuning for time-series forecasting with pre-trained LLMs. arXiv preprint arXiv:2308.08469, 2023.
  5. Long-term forecasting with TiDE: Time-series dense encoder. Transactions on Machine Learning Research, 2023.
  6. TimeGPT-1. arXiv preprint arXiv:2310.03589, 2023.
  7. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022.
  8. Traffic4cast at NeurIPS 2020 - yet more on the unreasonable effectiveness of gridded geo-spatial processes. In Hugo Jair Escalante and Katja Hofmann, editors, Proceedings of the NeurIPS 2020 Competition and Demonstration Track, volume 133 of Proceedings of Machine Learning Research, pages 325–343. PMLR, 06–12 Dec 2021.
  9. Generating Wikipedia by summarizing long sequences. arXiv preprint arXiv:1801.10198, 2018.
  10. E. D. McKenzie. General exponential smoothing and the equivalent ARMA process. Journal of Forecasting, 3(3):333–344, 1984.
  11. A survey on time-series pre-trained models. arXiv preprint arXiv:2305.10716, 2023.
  12. M5 accuracy competition: Results, findings, and conclusions. International Journal of Forecasting, 38(4):1346–1364, 2022.
  13. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2022.
  14. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In International Conference on Learning Representations, 2019.
  15. Meta-learning framework with applications to zero-shot time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9242–9250, 2021.
  16. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
  17. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020.
  18. Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting. Advances in Neural Information Processing Systems, 32, 2019.
  19. Forecasting at scale. The American Statistician, 72(1):37–45, 2018.
  20. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
  21. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526):804–819, 2019.
  22. Towards efficient and comprehensive urban spatial-temporal prediction: A unified library and performance benchmark. arXiv preprint arXiv:2304.14343, 2023.
  23. A multi-horizon quantile recurrent forecaster. arXiv preprint arXiv:1711.11053, 2017.
  24. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
  25. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, pages 27268–27286. PMLR, 2022.
  26. One Fits All: Power general time series analysis by pretrained LM. arXiv preprint arXiv:2302.11939, 2023.
  27. Vector autoregressive models for multivariate time series. Modeling financial time series with S-PLUS®, pages 385–429, 2006.
  28. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
Authors (4)
  1. Abhimanyu Das (21 papers)
  2. Weihao Kong (29 papers)
  3. Rajat Sen (29 papers)
  4. Yichen Zhou (21 papers)
Citations (121)

Summary

  • The paper introduces TimesFM, a decoder-only model that achieves effective zero-shot forecasting by combining input patching with causal self-attention.
  • It employs a fixed-length patching strategy within an autoregressive framework, enabling the capture of complex temporal patterns across various datasets.
  • Empirical evaluations reveal that TimesFM competes with state-of-the-art supervised models, offering significant computational efficiency for forecasting tasks.

Overview of a Decoder-Only Foundation Model for Time-Series Forecasting

This paper introduces a time-series foundation model named TimesFM, designed to achieve effective zero-shot forecasting by leveraging a decoder-only attention architecture. The model’s architecture and training methodology draw inspiration from recent advancements in LLMs such as GPT-3.

Model Architecture and Design

TimesFM employs a decoder-only attention architecture with input patching, combining the autoregressive training style of LLMs with the patch-based inputs used by long-horizon forecasting models. This design encodes temporal patterns effectively, even for datasets the model has not previously encountered. The model is pretrained on a mixture of real-world and synthetic datasets comprising approximately 100 billion time points.
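To make the patching idea concrete, the sketch below splits a context window into fixed-length, non-overlapping patches that then serve as the transformer's input tokens. The sizes (a 512-point context and 32-point patches) are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

context_len = 512   # illustrative history length
patch_len = 32      # illustrative fixed input patch length

series = np.random.randn(context_len).astype(np.float32)  # a toy univariate series

# Cut the history into non-overlapping, fixed-length patches;
# each patch becomes one "token" fed to the decoder-only transformer.
patches = series.reshape(context_len // patch_len, patch_len)
print(patches.shape)  # (16, 32): 16 tokens, each covering 32 time points
```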

Key attributes of the model include:

  1. Patching Strategy: Input series are divided into fixed-length patches, allowing efficient processing and inference across varying temporal granularities.
  2. Decoder-Only Training: Causal self-attention over the patch sequence enables autoregressive decoding without a separate encoder.
  3. Longer Output Patches: The output patch length can exceed the input patch length, so each decoding step predicts a longer stretch of the future, reducing the number of autoregressive steps needed for long horizons (a minimal sketch of this design follows the list).
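The following is a minimal, self-contained PyTorch sketch of the general design described above, not the authors' implementation: each input patch is embedded by a small MLP, the patch sequence passes through a causally masked transformer stack, and every position predicts an output patch that is longer than its input patch. All layer sizes, names, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyPatchedDecoder(nn.Module):
    """Illustrative decoder-only forecaster in the spirit of TimesFM (not the official model)."""

    def __init__(self, patch_len=32, out_patch_len=128, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.patch_len = patch_len
        # A small MLP embeds each raw input patch into the model dimension.
        self.embed = nn.Sequential(
            nn.Linear(patch_len, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # A decoder-only stack is simply an encoder stack run with a causal mask.
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        # Each token predicts a full output patch, longer than its input patch.
        self.head = nn.Linear(d_model, out_patch_len)

    def forward(self, x):
        # x: (batch, context_len), with context_len divisible by patch_len.
        b, t = x.shape
        patches = x.view(b, t // self.patch_len, self.patch_len)   # (batch, n_patches, patch_len)
        h = self.embed(patches)                                     # (batch, n_patches, d_model)
        causal = nn.Transformer.generate_square_subsequent_mask(h.size(1))
        h = self.blocks(h, mask=causal)                             # causal self-attention only
        return self.head(h)                                         # (batch, n_patches, out_patch_len)

model = TinyPatchedDecoder()
history = torch.randn(2, 512)   # two toy series, each with a 512-point context
out = model(history)
print(out.shape)                # torch.Size([2, 16, 128])
```

At inference time, the output patch at the last position is the forecast for the next block of points; to forecast further, those points are appended to the history and the model is called again. With a 32-point input patch and a 128-point output patch, for example, a 512-step horizon needs 4 decoding passes rather than 16, which is the efficiency that the longer output patches are intended to provide.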

Empirical Evaluation

TimesFM’s zero-shot capabilities were empirically validated against established public benchmarks including the Monash Archive, Darts datasets, and Informer datasets:

  • Monash Archive: TimesFM achieved performance competitive with state-of-the-art supervised models such as DeepAR and FFNN, ranking among the top three methods by mean scaled MAE (the metric is sketched after this list).
  • Darts Datasets: TimesFM performed especially well on series with clear seasonal patterns and closely rivaled top methods such as ARIMA, showing that it captures complex temporal dynamics effectively.
  • Informer Datasets: On these benchmarks, known for their challenging long prediction horizons, the model matched or surpassed many supervised methods, including PatchTST and Autoformer.
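For readers unfamiliar with the Monash-style metric, the snippet below gives one common MASE-style definition of scaled MAE, in which the forecast MAE is divided by the in-sample error of a seasonal-naive baseline; it illustrates the metric family and is not necessarily the exact aggregation used in the paper.

```python
import numpy as np

def scaled_mae(y_true, y_pred, y_train, season=1):
    """MASE-style scaled MAE: forecast MAE divided by the in-sample
    MAE of a seasonal-naive forecast on the training history."""
    mae = np.mean(np.abs(y_true - y_pred))
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return mae / naive_mae

# Toy example: an hourly sine-like series with daily seasonality (season=24).
rng = np.random.default_rng(0)
t = np.arange(264)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(264)
train, test = series[:240], series[240:]
forecast = np.sin(2 * np.pi * t[240:] / 24)   # a reasonable but imperfect forecast
print(round(scaled_mae(test, forecast, train, season=24), 3))
```

Roughly speaking, a value below 1 means the forecaster beats the seasonal-naive baseline on that dataset; benchmark rankings then aggregate such scaled errors across datasets.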

Implications and Future Directions

The results provide strong evidence that a versatile time-series foundation model with practical zero-shot performance is feasible. The inclusion of diverse synthetic and real-world datasets in pretraining contributes significantly to the model's generalization. Such a model can substantially reduce the computational cost of training a separate model for each new dataset, which is particularly valuable when data or compute is limited.

The experiments suggest promising avenues for future research:

  • Fine-tuning Capabilities: Exploring the model’s adaptability and performance improvements through fine-tuning on specific datasets or tasks.
  • Probabilistic Forecasting: Extending the architecture to produce uncertainty estimates and richer predictive distributions (a generic quantile-loss example follows this list).
  • Scaling Studies: Systematically investigating the trade-offs involved in model scaling, considering data size and parameter efficiency, akin to scaling laws observed in LLMs.
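As a concrete illustration of the probabilistic-forecasting direction, the snippet below shows the standard quantile (pinball) loss with which point forecasters are commonly extended; it is a generic technique, not a detail taken from the paper.

```python
import torch

def pinball_loss(y_true, y_pred, quantile):
    """Standard quantile (pinball) loss; training one output head per quantile
    lets a point forecaster emit an approximate predictive distribution."""
    diff = y_true - y_pred
    return torch.mean(torch.maximum(quantile * diff, (quantile - 1.0) * diff))

y_true = torch.randn(8, 128)   # a batch of target output patches (illustrative shapes)
y_q90 = torch.randn(8, 128)    # hypothetical 90th-percentile predictions
print(pinball_loss(y_true, y_q90, quantile=0.9).item())
```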

In conclusion, TimesFM represents a significant endeavor towards building robust, scalable, and efficient time-series forecasting models, broadening the possibilities for zero-shot and few-shot learning in this domain.
