Generative Pretrained Hierarchical Transformer for Time Series Forecasting (2402.16516v2)

Published 26 Feb 2024 in cs.LG

Abstract: Recent efforts have been dedicated to enhancing time series forecasting accuracy by introducing advanced network architectures and self-supervised pretraining strategies. Nevertheless, existing approaches still exhibit two critical drawbacks. First, these methods often rely on a single dataset for training, limiting the model's generalizability due to the restricted scale of the training data. Second, the one-step generation scheme is widely followed, which requires a customized forecasting head, overlooks temporal dependencies in the output series, and incurs extra training costs across different horizon settings. To address these issues, we propose a novel generative pretrained hierarchical transformer architecture for forecasting, named GPHT. GPHT has two key designs. On the one hand, we advocate constructing a mixed dataset under the channel-independent assumption for pretraining our model, comprising various datasets from diverse data scenarios. This approach significantly expands the scale of training data, allowing our model to uncover commonalities in time series data and facilitating improved transfer to specific datasets. On the other hand, GPHT employs an auto-regressive forecasting approach, effectively modeling temporal dependencies in the output series. Importantly, no customized forecasting head is required, enabling a single model to forecast at arbitrary horizon settings. We conduct extensive experiments on eight datasets against mainstream self-supervised pretraining models and supervised models. The results demonstrate that GPHT surpasses the baseline models across various fine-tuning and zero/few-shot learning settings in the traditional long-term forecasting task. Our code is publicly available at https://github.com/icantnamemyself/GPHT.
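The abstract's two central ideas, channel independence and head-free auto-regressive generation, can be made concrete with a short sketch. The PyTorch code below is not the authors' implementation (see their repository for that); it is a minimal illustration under assumed design choices: a fixed patch length as the token unit, a plain causal Transformer encoder, and placeholder hyperparameters.

```python
import torch
import torch.nn as nn

class ARPatchForecaster(nn.Module):
    """Hypothetical illustration: patches as tokens, causal attention,
    and one shared projection instead of a horizon-specific head."""
    def __init__(self, patch_len=16, d_model=128, n_layers=3, n_heads=4):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)    # patch -> token embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.project = nn.Linear(d_model, patch_len)  # token -> next-patch values

    def forward(self, x):
        # x: (batch, n_patches, patch_len). Under channel independence,
        # every univariate channel is treated as its own sample.
        n = x.size(1)
        causal = torch.triu(torch.full((n, n), float('-inf'), device=x.device), 1)
        h = self.encoder(self.embed(x), mask=causal)
        return self.project(h)  # position t predicts patch t + 1

    @torch.no_grad()
    def forecast(self, x, horizon):
        # Auto-regressive rollout: the same weights serve any horizon,
        # which is what removes the need for a customized forecasting head.
        preds = []
        for _ in range(-(-horizon // self.patch_len)):  # ceil(horizon / patch_len)
            nxt = self.forward(x)[:, -1:]               # last position's prediction
            preds.append(nxt)
            x = torch.cat([x, nxt], dim=1)
        return torch.cat(preds, dim=1).flatten(1)[:, :horizon]

# A (batch, channels, length) series is flattened into channel-independent
# samples before patching, so multivariate input needs no special handling.
series = torch.randn(8, 7, 96)                # 8 samples, 7 channels, 96 steps
patches = series.reshape(8 * 7, -1, 16)       # (56 samples, 6 patches, 16 values)
model = ARPatchForecaster()
print(model.forecast(patches, horizon=48).shape)  # torch.Size([56, 48])
```

Training such a model would pair each position's output with the next ground-truth patch (teacher forcing); the mixed-dataset pretraining the abstract describes would simply draw these channel-independent samples from many source datasets rather than one.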
