
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting (2310.06625v4)

Published 10 Oct 2023 in cs.LG

Abstract: The recent boom of linear forecasting models questions the ongoing passion for architectural modifications of Transformer-based forecasters. These forecasters leverage Transformers to model the global dependencies over temporal tokens of time series, with each token formed by multiple variates of the same timestamp. However, Transformers are challenged in forecasting series with larger lookback windows due to performance degradation and computation explosion. Besides, the embedding for each temporal token fuses multiple variates that represent potential delayed events and distinct physical measurements, which may fail in learning variate-centric representations and result in meaningless attention maps. In this work, we reflect on the competent duties of Transformer components and repurpose the Transformer architecture without any modification to the basic components. We propose iTransformer that simply applies the attention and feed-forward network on the inverted dimensions. Specifically, the time points of individual series are embedded into variate tokens which are utilized by the attention mechanism to capture multivariate correlations; meanwhile, the feed-forward network is applied for each variate token to learn nonlinear representations. The iTransformer model achieves state-of-the-art on challenging real-world datasets, which further empowers the Transformer family with promoted performance, generalization ability across different variates, and better utilization of arbitrary lookback windows, making it a nice alternative as the fundamental backbone of time series forecasting. Code is available at this repository: https://github.com/thuml/iTransformer.

References (36)
  1. Layer normalization. https://arxiv.org/pdf/1607.06450.pdf, 2016.
  2. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2, 2018.
  3. Some recent advances in forecasting and control. Journal of the Royal Statistical Society. Series C (Applied Statistics), 17(2):91–109, 1968.
  4. Language models are few-shot learners. NeurIPS, 2020.
  5. Flashattention: Fast and memory-efficient exact attention with io-awareness. NeurIPS, 2022.
  6. Long-term forecasting with tide: Time-series dense encoder. arXiv preprint arXiv:2304.08424, 2023.
  7. Simmtm: A simple pre-training framework for masked time-series modeling. arXiv preprint arXiv:2302.00861, 2023.
  8. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021.
  9. Tsmixer: Lightweight mlp-mixer model for multivariate time series forecasting. KDD, 2023.
  10. The capacity and robustness trade-off: Revisiting the channel independent strategy for multivariate time series forecasting. arXiv preprint arXiv:2304.05206, 2023.
  11. Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural networks, 4(2):251–257, 1991.
  12. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
  13. Reversible instance normalization for accurate time-series forecasting against distribution shift. ICLR, 2021.
  14. Adam: A method for stochastic optimization. ICLR, 2015.
  15. Reformer: The efficient transformer. ICLR, 2020.
  16. Similarity of neural network representations revisited. ICML, 2019.
  17. Modeling long-and short-term temporal patterns with deep neural networks. SIGIR, 2018.
  18. Informer: Beyond efficient transformer for long sequence time-series forecasting. arXiv: 2012.07436, 2021.
  19. Revisiting long-term time series forecasting: An investigation on linear mapping. arXiv preprint arXiv:2305.10721, 2023.
  20. Scinet: time series modeling and forecasting with sample convolution and interaction. NeurIPS, 2022a.
  21. Non-stationary transformers: Rethinking the stationarity in time series forecasting. NeurIPS, 2022b.
  22. Koopa: Learning non-stationary time series dynamics with koopman predictors. arXiv preprint arXiv:2305.18803, 2023.
  23. A time series is worth 64 words: Long-term forecasting with transformers. ICLR, 2023.
  24. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. ICLR, 2019.
  25. Pytorch: An imperative style, high-performance deep learning library. NeurIPS, 2019.
  26. Deep state space models for time series forecasting. NeurIPS, 2018.
  27. Deepar: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020.
  28. Mlp-mixer: An all-mlp architecture for vision. NeurIPS, 2021.
  29. Attention is all you need. NeurIPS, 2017.
  30. Autoformer: Decomposition transformers with Auto-Correlation for long-term series forecasting. NeurIPS, 2021.
  31. Flowformer: Linearizing transformers with conservation flows. ICML, 2022.
  32. Timesnet: Temporal 2d-variation modeling for general time series analysis. ICLR, 2023.
  33. Are transformers effective for time series forecasting? AAAI, 2023.
  34. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. ICLR, 2023.
  35. Lstm network: a deep learning approach for short-term traffic forecast. IET Intelligent Transport Systems, 11(2):68–75, 2017.
  36. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. ICML, 2022.
Authors (7)
  1. Yong Liu
  2. Tengge Hu
  3. Haoran Zhang
  4. Haixu Wu
  5. Shiyu Wang
  6. Lintao Ma
  7. Mingsheng Long
Citations (237)

Summary

Insights on iTransformer: Inverted Transformers for Time Series Forecasting

The paper "iTransformer: Inverted Transformers Are Effective for Time Series Forecasting" proposes a novel perspective on leveraging Transformers for multivariate time series forecasting tasks. It addresses inherent inefficiencies in current approaches that apply temporal tokens, emphasizing instead the construction of variate tokens. This paper identifies and solves core challenges faced when employing standard Transformer architectures in time series problems, especially those with multivariate dimensions and long lookback windows.

Problem Statement and Challenges

Traditional Transformer-based forecasters face significant hurdles when applied to time series. Each temporal token fuses all variates observed at a single timestamp, even though those variates may reflect delayed events and distinct physical measurements; attention over such tokens is therefore misaligned across variables and hard to interpret. The result is degraded accuracy, poor computational efficiency, and little benefit from longer lookback windows, whose cost grows with sequence length while accuracy gains remain negligible. A shape-level illustration of the two tokenizations follows.
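To make the contrast concrete, here is a small shape-level sketch (tensor names and sizes are illustrative assumptions, not taken from the paper's code) of how a temporal token fuses all variates at one timestamp, whereas an inverted variate token spans one variate's entire lookback window.

```python
import torch

B, T, N, d_model = 4, 96, 7, 128          # batch, lookback length, variates, embedding width
x = torch.randn(B, T, N)                   # multivariate series: N variates over T steps

# Conventional temporal tokens: one token per timestamp, fusing all N variates.
temporal_tokens = torch.nn.Linear(N, d_model)(x)                  # (B, T, d_model)

# Inverted variate tokens: one token per variate, covering its whole lookback window.
variate_tokens = torch.nn.Linear(T, d_model)(x.transpose(1, 2))   # (B, N, d_model)
```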

Proposed Approach

The authors introduce iTransformer, a model that applies the attention and feed-forward network on the inverted dimensions. Instead of embedding multiple variates at a single time step, iTransformer treats the entire lookback series of each variate as an independent token. This inversion lets the model capture multivariate correlations directly and efficiently.

Key aspects of iTransformer include:

  • Embedding: Each variate's whole lookback series is embedded independently into a single variate token, preserving series-specific information.
  • Attention Mechanism: Self-attention operates over variate tokens, capturing multivariate correlations and producing more interpretable attention maps.
  • Feed-Forward Network: Applied identically to every variate token, the feed-forward network learns nonlinear representations of each series, which is critical for capturing its temporal patterns (a minimal sketch of the full inverted layout follows this list).
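The following is a minimal, self-contained PyTorch sketch of an iTransformer-style encoder under these assumptions: the `InvertedForecaster` class name, layer sizes, and omission of instance normalization are illustrative choices, not the authors' implementation (see the official repository for that). Each variate's lookback series is linearly embedded into one token, a standard Transformer encoder attends over the N variate tokens, and a linear head projects each token to the forecast horizon.

```python
# Minimal sketch of an iTransformer-style forecaster (illustrative, not the official code).
# Shapes: input x is (batch, lookback T, variates N); output is (batch, horizon S, variates N).
import torch
import torch.nn as nn


class InvertedForecaster(nn.Module):
    def __init__(self, lookback: int, horizon: int, d_model: int = 128,
                 n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        # Embedding: each variate's whole lookback series becomes one token.
        self.embed = nn.Linear(lookback, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)
        # Attention runs over the N variate tokens, capturing multivariate correlations.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Per-token projection from the latent representation to the forecast horizon.
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N) -> invert to variate tokens: (B, N, T)
        tokens = self.embed(x.transpose(1, 2))   # (B, N, d_model)
        tokens = self.encoder(tokens)            # attention across variates
        forecast = self.head(tokens)             # (B, N, S)
        return forecast.transpose(1, 2)          # (B, S, N)


if __name__ == "__main__":
    model = InvertedForecaster(lookback=96, horizon=24)
    x = torch.randn(4, 96, 7)                    # e.g. 7 variates, 96-step lookback
    print(model(x).shape)                        # torch.Size([4, 24, 7])
```

Because attention is computed over variates rather than timestamps, the cost of the attention layer depends on the number of variates, not the lookback length, which is one reason longer lookback windows remain tractable in this layout.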

Evaluation and Results

iTransformer demonstrates state-of-the-art performance across several challenging real-world datasets, significantly outperforming existing Transformer-based forecasters. The evaluation shows that it benefits from longer lookback windows and generalizes to variates unseen during training, and the results highlight its efficiency on high-dimensional time series.

Practical and Theoretical Implications

The paper exposes the inadequacies of conventional time series tokenization within Transformers and challenges the prevailing use of temporal embedding strategies. It highlights the potential of inverted-dimension models to provide more meaningful and computationally efficient solutions. This approach not only rectifies inefficiencies in multivariate representation but also paves the way for Transformers to serve as fundamental backbones in complex temporal forecasting scenarios.

Future Directions

This paper opens discussions on efficient attention mechanisms tailored to multivariate processes. Future work may enhance the extraction of temporal features through more advanced linear and nonlinear modeling. There is also significant potential in exploring pre-training paradigms specific to time series built on the iTransformer architecture, further broadening its utility across domains.

In conclusion, iTransformer presents a pivotal shift in how Transformers are applied to time series forecasting, yielding promising results and offering guidance for further exploration of efficient architectures for multivariate temporal data.
