iTransformer: Inverted Transformers Are Effective for Time Series Forecasting (2310.06625v4)
Abstract: The recent boom of linear forecasting models calls into question the continued focus on architectural modifications of Transformer-based forecasters. These forecasters use Transformers to model global dependencies over temporal tokens of a time series, where each token is formed from multiple variates at the same timestamp. However, Transformers struggle to forecast series with longer lookback windows due to performance degradation and exploding computation. Moreover, the embedding of each temporal token fuses multiple variates that may represent delayed events and distinct physical measurements, which can fail to learn variate-centric representations and yield meaningless attention maps. In this work, we reflect on the proper roles of Transformer components and repurpose the Transformer architecture without modifying any basic component. We propose iTransformer, which simply applies the attention and feed-forward network on the inverted dimensions. Specifically, the time points of each individual series are embedded into a variate token, which the attention mechanism uses to capture multivariate correlations; meanwhile, the feed-forward network is applied to each variate token to learn nonlinear representations. iTransformer achieves state-of-the-art results on challenging real-world datasets, further empowering the Transformer family with improved performance, generalization across different variates, and better utilization of arbitrary lookback windows, making it a strong candidate as a fundamental backbone for time series forecasting. Code is available at this repository: https://github.com/thuml/iTransformer.
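To make the "inverted" tokenization concrete, below is a minimal PyTorch sketch of the idea described in the abstract: each variate's lookback window is embedded as one token, self-attention mixes information across variates (not across time), and the feed-forward network processes each variate token independently before a linear head projects it to the forecast horizon. The module and parameter names (`InvertedTransformerBlock`, `d_model`, `lookback`, `horizon`, etc.) are illustrative assumptions, not the authors' implementation; refer to the linked repository for the official code.

```python
import torch
import torch.nn as nn


class InvertedTransformerBlock(nn.Module):
    """One iTransformer-style encoder layer (illustrative sketch, assumed shapes)."""

    def __init__(self, d_model: int, n_heads: int = 8, d_ff: int = 256, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_variates, d_model) -- one token per variate,
        # so attention captures correlations *across variates*, not across time.
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm1(tokens + attn_out)
        # The feed-forward network acts on each variate token independently,
        # learning a nonlinear representation of that variate's whole series.
        tokens = self.norm2(tokens + self.ffn(tokens))
        return tokens


class InvertedForecaster(nn.Module):
    """Embed each variate's lookback window as one token, apply inverted blocks,
    then project each token to the forecast horizon (hypothetical wrapper)."""

    def __init__(self, lookback: int, horizon: int, d_model: int = 128, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(lookback, d_model)   # time points -> variate token
        self.blocks = nn.ModuleList(
            [InvertedTransformerBlock(d_model) for _ in range(n_layers)]
        )
        self.head = nn.Linear(d_model, horizon)     # variate token -> future values

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback, n_variates) multivariate series
        tokens = self.embed(x.transpose(1, 2))      # (batch, n_variates, d_model)
        for block in self.blocks:
            tokens = block(tokens)
        return self.head(tokens).transpose(1, 2)    # (batch, horizon, n_variates)
```

A forward pass on a batch shaped (batch, lookback, n_variates) returns a forecast shaped (batch, horizon, n_variates); because variates are tokens, the same model can be applied to an arbitrary number of variates and lookback lengths set at construction time.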
Authors: Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, Mingsheng Long