TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting (2306.09364v4)

Published 14 Jun 2023 in cs.LG and cs.AI

Abstract: Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and computing requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture exclusively composed of multi-layer perceptron (MLP) modules for multivariate forecasting and representation learning on patched time series. Inspired by MLP-Mixer's success in computer vision, we adapt it for time series, addressing challenges and introducing validated components for enhanced accuracy. This includes a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone, for explicitly modeling the time-series properties such as hierarchy and channel-correlations. We also propose a novel Hybrid channel modeling and infusion of a simple gating approach to effectively handle noisy channel interactions and generalization across diverse datasets. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design enables compatibility with both supervised and masked self-supervised learning methods, making it a promising building block for time-series Foundation Models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong benchmarks of Patch-Transformer models (by 1-2%) with a significant reduction in memory and runtime (2-3X). The source code of our model is officially released as PatchTSMixer in the HuggingFace. Model: https://huggingface.co/docs/transformers/main/en/model_doc/patchtsmixer Examples: https://github.com/ibm/tsfm/#notebooks-links
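
The abstract points to the PatchTSMixer release in the HuggingFace transformers library. The snippet below is a minimal usage sketch assuming the PatchTSMixerConfig and PatchTSMixerForPrediction classes documented at the linked model page; exact argument names and defaults can differ between transformers versions, and all sizes are illustrative rather than the paper's settings.

```python
# Minimal sketch: multivariate forecasting with the HuggingFace PatchTSMixer release.
# Assumes the classes documented at the linked model page (transformers >= 4.36);
# argument names may differ across versions, and the sizes below are illustrative.
import torch
from transformers import PatchTSMixerConfig, PatchTSMixerForPrediction

config = PatchTSMixerConfig(
    context_length=512,       # length of the input window
    prediction_length=96,     # forecast horizon
    num_input_channels=7,     # number of series (channels)
    patch_length=16,
    patch_stride=16,          # non-overlapping patches
    d_model=128,
    num_layers=8,
    mode="common_channel",    # channel-independent backbone
)
model = PatchTSMixerForPrediction(config)

past_values = torch.randn(4, 512, 7)      # (batch, context_length, channels)
with torch.no_grad():
    outputs = model(past_values=past_values)
print(outputs.prediction_outputs.shape)   # expected: torch.Size([4, 96, 7])
```

If the documented interface holds, supplying future_values during training should additionally return a loss in the same output object.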

Authors (5)
  1. Vijay Ekambaram
  2. Arindam Jati
  3. Nam Nguyen
  4. Phanwadee Sinthong
  5. Jayant Kalagnanam
Citations (88)

Summary

TSMixer: A Lightweight Model for Multivariate Time Series Forecasting

The paper introduces TSMixer, a lightweight neural network architecture composed exclusively of multi-layer perceptron (MLP) modules and designed for multivariate time series forecasting. Unlike popular Transformer models, which are often memory- and compute-intensive, TSMixer offers an efficient alternative while maintaining competitive predictive performance.

Background and Motivation

Transformers have become a common choice for time series forecasting because of their ability to capture long-sequence dependencies, but their high computational requirements are a significant limitation for long-term forecasting applications. MLP-Mixers, initially successful in the vision domain, offer a promising alternative by eliminating the expensive self-attention mechanism. TSMixer draws on this approach, aiming to forecast multivariate time series effectively with far lower resource consumption.

Methodology

TSMixer extends the basic MLP-Mixer architecture with several validated components:

  1. Channel Independence: The model employs a channel-independent backbone, enabling shared learning across multiple datasets with different channel counts. This approach improves the model's generalization capabilities.
  2. Hybrid Channel Modeling: A cross-channel reconciliation head is introduced to refine forecasts by leveraging inter-channel dependencies, enhancing generalization across diverse datasets.
  3. Hierarchical Reconciliation and Gated Attention: TSMixer incorporates a hierarchical patch reconciliation head and a gated attention mechanism to model temporal dependencies more effectively and to down-weight redundant features.
  4. Patching and Modular Design: The model adopts a patching approach that significantly reduces the effective input size and enables efficient learning. Its modular design supports both supervised and masked self-supervised training; a minimal sketch of the patched mixing backbone follows this list.
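
To make the patching, mixing, and gating steps concrete, the following is an illustrative PyTorch sketch of a channel-independent, TSMixer-style backbone. It is not the released PatchTSMixer implementation: the class names (TinyTSMixer, MixerLayer, GatedAttention), the layer sizes, and the omission of the online reconciliation heads and hybrid channel modeling are all simplifications for exposition.

```python
# Illustrative, channel-independent TSMixer-style backbone in PyTorch.
# Names and sizes are made up for exposition; the released PatchTSMixer code
# adds reconciliation heads, hybrid channel modeling, and other components.
import torch
import torch.nn as nn


class GatedAttention(nn.Module):
    """Simple gating: a learned softmax over features down-weights redundant ones."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (..., d_model)
        return x * torch.softmax(self.gate(x), dim=-1)


class MixerLayer(nn.Module):
    """One mixer layer: an MLP across patches, then an MLP across features."""

    def __init__(self, num_patches: int, d_model: int, expansion: int = 2, dropout: float = 0.1):
        super().__init__()
        self.patch_norm = nn.LayerNorm(d_model)
        self.patch_mlp = nn.Sequential(
            nn.Linear(num_patches, expansion * num_patches), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(expansion * num_patches, num_patches),
        )
        self.feat_norm = nn.LayerNorm(d_model)
        self.feat_mlp = nn.Sequential(
            nn.Linear(d_model, expansion * d_model), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(expansion * d_model, d_model),
        )
        self.gate = GatedAttention(d_model)

    def forward(self, x):                       # x: (batch, channels, patches, d_model)
        y = self.patch_norm(x)
        y = self.patch_mlp(y.transpose(-1, -2)).transpose(-1, -2)  # mix along the patch axis
        x = x + y
        y = self.feat_mlp(self.feat_norm(x))                       # mix along the feature axis
        return x + self.gate(y)


class TinyTSMixer(nn.Module):
    """Patch the series, embed patches, stack mixer layers, and apply a shared
    linear forecast head per channel (channel-independent)."""

    def __init__(self, context_len=512, pred_len=96, patch_len=16, d_model=128, n_layers=8):
        super().__init__()
        assert context_len % patch_len == 0, "context length must be divisible by patch length"
        self.patch_len = patch_len
        num_patches = context_len // patch_len
        self.embed = nn.Linear(patch_len, d_model)
        self.layers = nn.ModuleList([MixerLayer(num_patches, d_model) for _ in range(n_layers)])
        self.head = nn.Linear(num_patches * d_model, pred_len)

    def forward(self, x):                       # x: (batch, context_len, channels)
        b, length, c = x.shape
        # Non-overlapping patching: (batch, channels, num_patches, patch_len)
        x = x.transpose(1, 2).reshape(b, c, length // self.patch_len, self.patch_len)
        x = self.embed(x)
        for layer in self.layers:
            x = layer(x)
        out = self.head(x.flatten(start_dim=2))  # (batch, channels, pred_len)
        return out.transpose(1, 2)               # (batch, pred_len, channels)
```

A forward pass such as TinyTSMixer()(torch.randn(4, 512, 7)) returns a tensor of shape (4, 96, 7). Because every mixing step and the forecast head operate per channel, no parameter depends on the channel count, which is what makes the backbone reusable across datasets with different numbers of channels; the gated multiplication after the feature MLP is the kind of simple gating described above for down-weighting redundant features.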

Empirical Results

The paper presents extensive empirical evaluations on seven widely-used public datasets. Key findings include:

  • TSMixer outperforms state-of-the-art MLP and Transformer models, achieving a forecast accuracy improvement of 8-60%.
  • Compared to Patch-Transformer models, TSMixer shows a marginal improvement of 1-2% while achieving considerable memory and runtime reductions (2-3X).
  • The model emerges as a viable building block for time series foundation models due to its adaptability to both supervised and self-supervised learning paradigms.

Implications and Future Directions

The introduction of TSMixer marks a significant development in time series forecasting by providing a resource-efficient alternative to Transformers. The implications are substantial for industries reliant on long-term forecasting, such as energy, finance, and climate modeling. TSMixer's ability to generalize across different datasets and tasks offers avenues for broader applicability in real-world scenarios.

Future research may focus on extending TSMixer's capabilities to other time-series tasks, such as anomaly detection and classification, and exploring its transfer learning potential. Additionally, integrating newer mixer variants could further enhance its performance and applicability.

In summary, TSMixer contributes a meaningful development in the domain of multivariate time series forecasting, providing a compelling balance between computational efficiency and forecasting accuracy. Its design principles could serve as a foundation for future explorations in lightweight model architectures for diverse applications.
