Timer: Generative Pre-trained Transformers Are Large Time Series Models (2402.02368v3)

Published 4 Feb 2024 in cs.LG and stat.ML

Abstract: Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous progress has been achieved with the emergence of LLMs, exhibiting unprecedented abilities such as few-shot generalization, scalability, and task generality, which are however absent in small deep models. To change the status quo of training scenario-specific small models from scratch, this paper aims at the early development of large time series models (LTSM). During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task. The outcome of this study is a Time Series Transformer (Timer), which is generative pre-trained by next token prediction and adapted to various downstream tasks with promising capabilities as an LTSM. Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.

Overview of "Timer: Transformers for Time Series Analysis at Scale"

The paper "Timer: Transformers for Time Series Analysis at Scale" introduces a novel approach for enhancing the performance of time series models through the development of large time series models (LTSMs). In leveraging the architectural frameworks of LLMs, which have exhibited unprecedented generalization and scalability across various tasks, the authors aim to address the current limitations in time series analysis, particularly in data-scarce environments.

Key Contributions

The central contribution of this research is the introduction of Timer, a Time Series Transformer developed using a GPT-style architecture, optimized through pre-training on extensive, multi-domain datasets comprising up to 1 billion time points. The proposed model adopts several innovative methodologies:

  1. Unified Data Representation: The authors propose a unified single-series sequence (S3) format that homogenizes heterogeneous time series data into consistent token sequences. This representation supports the amalgamation of diverse time series types, facilitating large-scale pre-training (an illustrative sketch of the format follows this list).
  2. Generative Task Framework: They convert typical time series analysis tasks such as forecasting, imputation, and anomaly detection into a unified generative task. This conversion leverages a decoder-only Transformer architecture, employing an autoregressive next token prediction objective.
  3. Scalability and Generality: Timer is pre-trained on data curated into hierarchical capacity tiers, enabling controlled investigations of model and data scaling. The model demonstrates notable few-shot capability, improving over models trained from scratch while using substantially less training data.
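
To make the S3 idea concrete, the snippet below is a minimal illustrative sketch: each variate of a multivariate series is treated as an independent univariate sequence, normalized on its own, and cut into fixed-length patches that act as tokens. The function name, patch length, and standardization scheme are assumptions for exposition, not the authors' exact preprocessing pipeline.

```python
import numpy as np

def to_s3_sequences(series_2d, patch_len=96):
    """Convert a multivariate series of shape (time, variates) into
    single-series token sequences: each variate is normalized independently
    and split into fixed-length patches.

    Illustrative sketch of the S3 idea; patch_len and the standardization
    are assumptions rather than the paper's exact settings.
    """
    sequences = []
    for variate in series_2d.T:                          # one univariate series per variate
        normalized = (variate - variate.mean()) / (variate.std() + 1e-8)
        n_tokens = len(normalized) // patch_len
        if n_tokens == 0:
            continue                                     # series shorter than one patch
        tokens = normalized[:n_tokens * patch_len].reshape(n_tokens, patch_len)
        sequences.append(tokens)                         # shape: (n_tokens, patch_len)
    return sequences

# Example: a 3-variate series with 10,000 time points yields 3 token sequences.
dummy = np.random.randn(10_000, 3)
print([s.shape for s in to_s3_sequences(dummy)])         # [(104, 96), (104, 96), (104, 96)]
```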

Experimental Results

The experimental evaluation underscores Timer's competitive performance across several tasks:

  • Time Series Forecasting: Timer achieves strong results, particularly in data-limited scenarios. On several datasets it matches or exceeds state-of-the-art methods such as iTransformer and PatchTST while requiring as little as 1% of the training data those models use to reach comparable accuracy.
  • Imputation and Anomaly Detection: Timer's capabilities extend beyond forecasting. The model substantially reduces imputation error, with notable gains on segment-level imputation, and on the UCR Anomaly Archive it identifies anomalies with higher precision than baseline methods (an illustrative sketch of error-based anomaly scoring follows this list).
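
One way a generative forecaster can be turned into an anomaly detector is to slide it over the series, predict each segment from its preceding context, and flag segments where the prediction error is large. The sketch below illustrates this reduction; `forecast_fn`, the squared-error scoring rule, and the window sizes are stand-ins and assumptions, not necessarily the paper's exact criterion.

```python
import numpy as np

def prediction_error_scores(series, forecast_fn, context_len=672, patch_len=96):
    """Score each patch by the discrepancy between the model's autoregressive
    prediction and the observed values; large errors flag likely anomalies.

    `forecast_fn(context) -> next_patch` is a placeholder for a pre-trained model.
    """
    scores = []
    for start in range(context_len, len(series) - patch_len + 1, patch_len):
        context = series[start - context_len:start]
        predicted = forecast_fn(context)
        observed = series[start:start + patch_len]
        scores.append(float(np.mean((observed - predicted) ** 2)))
    return np.array(scores)

def naive_fn(context):
    return context[-96:]                       # trivial stand-in: repeat the last patch

series = np.sin(np.linspace(0, 200, 5000))
series[3000:3050] += 3.0                       # injected anomaly
scores = prediction_error_scores(series, naive_fn)
print(int(np.argmax(scores)))                  # highest-scoring segment falls near the anomaly
```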

Architectural Insights

The paper also examines the foundational architecture choices for large-scale time series models. It highlights the superior performance and generalization capacity of the decoder-only Transformer, akin to the architectures used in LLMs, over the encoder-only models conventionally used in time series forecasting. This advantage is attributed to the autoregressive training objective, which aligns with the natural sequential dependencies in time series data.
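
To make the architectural point concrete, below is a minimal PyTorch sketch of a decoder-only Transformer trained with a next-token (next-patch) prediction objective over patch tokens. The layer sizes, mask construction, and MSE loss here are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PatchDecoder(nn.Module):
    """Minimal decoder-only Transformer over patch tokens: each token is a
    fixed-length segment of a single series, and the model predicts the next
    patch from all preceding ones. Sizes are illustrative."""

    def __init__(self, patch_len=96, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(patch_len, d_model)            # patch -> token embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)             # token -> next-patch values

    def forward(self, tokens):
        # tokens: (batch, n_tokens, patch_len); causal mask enforces autoregression
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.blocks(self.embed(tokens), mask=causal)
        return self.head(hidden)

# GPT-style pre-training step: predict token t+1 from tokens <= t.
model = PatchDecoder()
batch = torch.randn(8, 16, 96)                # 8 single-series sequences of 16 patches
pred = model(batch[:, :-1])
loss = nn.functional.mse_loss(pred, batch[:, 1:])
loss.backward()
```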

Implications and Future Directions

The implications of this paper are substantial for developing adaptable and efficient models within the time series domain. By emulating the training paradigms successful in LLMs, Timer can potentially serve a broad range of applications, from weather prediction to industrial process monitoring. The research prompts a reevaluation of existing practices in time series model development, particularly in the context of scalability and transferability.

Future research directions include exploring zero-shot generalization and developing domain-specific pre-trained models, further enhancing adaptability and reducing dependence on large labeled datasets. Another direction is to investigate the interplay between model capacity and dataset size, elaborating scaling laws for time series models analogous to those observed in LLM development.

In summary, "Timer: Generative Pre-trained Transformers Are Large Time Series Models" advances the discourse in time series analysis by presenting a scalable, generative model that aligns with the autoregressive strengths demonstrated in LLMs, paving the way for more robust and adaptable analytical tools in the field.

Authors (6)
  1. Yong Liu
  2. Haoran Zhang
  3. Chenyu Li
  4. Xiangdong Huang
  5. Jianmin Wang
  6. Mingsheng Long