TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting (2306.09364v4)
Abstract: Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and computing requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture composed exclusively of multi-layer perceptron (MLP) modules for multivariate forecasting and representation learning on patched time series. Inspired by MLP-Mixer's success in computer vision, we adapt it for time series, addressing its challenges and introducing validated components for enhanced accuracy. These include a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone to explicitly model time-series properties such as hierarchy and channel correlations. We also propose a novel hybrid channel modeling approach and infuse a simple gating mechanism to effectively handle noisy channel interactions and to generalize across diverse datasets. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal compute. Moreover, TSMixer's modular design is compatible with both supervised and masked self-supervised learning methods, making it a promising building block for time-series Foundation Models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong Patch-Transformer benchmarks (by 1-2%) with a significant reduction in memory and runtime (2-3X). The source code of our model is officially released as PatchTSMixer in the HuggingFace Transformers library. Model: https://huggingface.co/docs/transformers/main/en/model_doc/patchtsmixer Examples: https://github.com/ibm/tsfm/#notebooks-links
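Since the model is released as PatchTSMixer in HuggingFace Transformers, a minimal forecasting sketch is shown below. The class names (`PatchTSMixerConfig`, `PatchTSMixerForPrediction`) follow the linked model documentation, but the specific hyperparameter values and dummy data are illustrative assumptions, not the paper's reported benchmark settings.

```python
# Minimal sketch: supervised forecasting with the released PatchTSMixer model.
# Hyperparameter values below are illustrative assumptions, not the paper's settings.
import torch
from transformers import PatchTSMixerConfig, PatchTSMixerForPrediction

config = PatchTSMixerConfig(
    context_length=512,        # length of the input history window
    prediction_length=96,      # forecast horizon
    num_input_channels=7,      # number of series in the multivariate input
    patch_length=16,           # patch size used to tokenize each channel
    patch_stride=16,           # non-overlapping patches
    d_model=64,                # hidden width of the MLP-Mixer blocks
    num_layers=8,              # number of mixer layers in the backbone
    mode="common_channel",     # channel-independent backbone
    gated_attn=True,           # simple gating described in the paper
)
model = PatchTSMixerForPrediction(config)

# Dummy batch: (batch, context_length, num_input_channels)
past_values = torch.randn(4, config.context_length, config.num_input_channels)
future_values = torch.randn(4, config.prediction_length, config.num_input_channels)

outputs = model(past_values=past_values, future_values=future_values)
print(outputs.loss)                      # training loss against future_values
print(outputs.prediction_outputs.shape)  # (4, prediction_length, num_input_channels)
```

Setting `mode="mix_channel"` (if using the hybrid channel-mixing variant) enables cross-channel mixing in the backbone; the channel-independent `"common_channel"` mode shown above is the lighter-weight default.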
Authors: Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam