Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting (2404.15772v3)

Published 24 Apr 2024 in cs.LG

Abstract: Long-term time series forecasting (LTSF) provides longer insights into future trends and patterns. Over the past few years, deep learning models, especially Transformers, have achieved advanced performance on LTSF tasks. However, LTSF faces inherent challenges such as capturing long-term dependencies and sparse semantic characteristics. Recently, a new state space model (SSM) named Mamba has been proposed. With its selective capability on input data and a hardware-aware parallel computing algorithm, Mamba has shown great potential in balancing prediction performance and computational efficiency compared to Transformers. To enhance Mamba's ability to preserve historical information over a longer range, we design a novel Mamba+ block by adding a forget gate inside Mamba to selectively combine new features with historical features in a complementary manner. Furthermore, we apply Mamba+ both forward and backward and propose Bi-Mamba+, aiming to promote the model's ability to capture interactions among time series elements. Additionally, multivariate time series data in different scenarios may exhibit varying emphasis on intra- or inter-series dependencies. Therefore, we propose a series-relation-aware decider that controls the utilization of channel-independent or channel-mixing tokenization strategies for specific datasets. Extensive experiments on 8 real-world datasets show that our model achieves more accurate predictions compared with state-of-the-art methods.

Bi-Mamba4TS: Enhancing Long-Term Time Series Forecasting with a Bidirectional Mamba Model

Introduction

Time series forecasting (TSF) plays a pivotal role across numerous domains such as traffic management, energy, and finance, particularly where long-term forecasting is paramount. Although Transformer-based models have gained traction due to their capacity to model long-range dependencies, their quadratic computational complexity remains a significant bottleneck. Recently, state space models (SSMs), and notably the Mamba model, have emerged as effective alternatives thanks to their linear computational complexity and robustness on long sequences. Building on this, we introduce Bi-Mamba4TS, which integrates bidirectional Mamba encoders to enhance long-term time series forecasting.
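
To make the complexity contrast concrete, the sketch below implements the plain discretized state-space recurrence that Mamba builds on, without Mamba's input-dependent (selective) parameters or its hardware-aware parallel scan; the matrix shapes and values are illustrative only:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Plain discretized SSM recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    A single pass over the sequence costs O(L) in the sequence length L,
    in contrast to the O(L^2) cost of full self-attention. Mamba further
    makes B, C, and the discretization step input-dependent (selective)
    and computes the scan with a hardware-aware parallel algorithm.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:               # one state update per time step
        h = A @ h + B * x_t     # state transition plus input injection
        ys.append(C @ h)        # linear readout
    return np.array(ys)

# Toy usage: univariate sequence of length 16, 4-dimensional hidden state.
rng = np.random.default_rng(0)
A, B, C = 0.9 * np.eye(4), rng.standard_normal(4), rng.standard_normal(4)
print(ssm_scan(rng.standard_normal(16), A, B, C).shape)  # (16,)
```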

Model Architecture

Bi-Mamba4TS employs a novel approach to handle both "channel-independent" and "channel-mixing" scenarios via a mechanism that evaluates dataset characteristics to decide on the appropriate strategy. This is governed by a "series-relation-aware" (SRA) decider that leverages the Pearson correlation coefficient, providing an objective basis for strategy selection. Input data is decomposed into patches to enrich local semantic information. This patch-wise tokenization not only reduces computational load but also enhances the model's ability to capture intricate evolutionary patterns in the data.
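
As a rough illustration of these two ingredients, the sketch below computes a correlation-based strategy decision and a patch-wise tokenization; the thresholds, patch length, and stride are illustrative placeholders rather than the paper's exact settings:

```python
import numpy as np

def sra_decider(series: np.ndarray, corr_threshold: float = 0.6,
                frac_threshold: float = 0.5) -> str:
    """Illustrative series-relation-aware (SRA) rule, not the paper's exact formula.

    `series` has shape (length, num_channels). Pairwise Pearson correlations
    between channels indicate how strongly the series interact. If a large
    fraction of channel pairs is strongly correlated, a channel-mixing
    tokenization is chosen; otherwise channel-independent.
    """
    corr = np.corrcoef(series, rowvar=False)      # (C, C) Pearson correlation matrix
    iu = np.triu_indices(corr.shape[0], k=1)      # unique channel pairs
    frac_strong = float(np.mean(np.abs(corr[iu]) >= corr_threshold))
    return "channel-mixing" if frac_strong >= frac_threshold else "channel-independent"

def patchify(series: np.ndarray, patch_len: int = 16, stride: int = 8) -> np.ndarray:
    """Split each channel into overlapping patches (tokens): (C, num_patches, patch_len)."""
    length, channels = series.shape
    starts = range(0, length - patch_len + 1, stride)
    return np.stack([[series[s:s + patch_len, c] for s in starts] for c in range(channels)])

# Toy usage: 96 time steps, 7 channels.
rng = np.random.default_rng(0)
data = rng.standard_normal((96, 7))
print(sra_decider(data), patchify(data).shape)    # e.g. "channel-independent" (7, 11, 16)
```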

Main Contributions:

  • We propose a new SSM-based model, Bi-Mamba4TS, harnessing the power of bidirectional Mamba encoders to enhance the modeling of long-range dependencies in time series data (a sketch of the bidirectional, gated encoding follows this list).
  • The model introduces a decision-making mechanism (the SRA decider) that uses Pearson correlation coefficients to autonomously choose between channel-independent and channel-mixing tokenization according to dataset characteristics.
  • Extensive experiments on diverse real-world datasets show that Bi-Mamba4TS achieves superior forecasting accuracy compared to existing state-of-the-art methods.
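
The snippet below sketches the bidirectional encoding with a forget-gate-style fusion. A plain GRU stands in for the paper's Mamba+ block so the example stays self-contained and runnable; the gating and normalization details are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn

class BidirectionalGatedEncoder(nn.Module):
    """Sketch of bidirectional encoding with a forget-gate-style fusion.

    `fwd` and `bwd` stand in for the paper's Mamba+ blocks (a GRU is used
    here as a runnable placeholder). The same kind of block processes the
    sequence and its time-reversed copy, and a learned gate blends the new
    features with the residual (historical) representation, in the spirit
    of the forget gate described in the abstract.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = nn.GRU(d_model, d_model, batch_first=True)   # placeholder for Mamba+
        self.bwd = nn.GRU(d_model, d_model, batch_first=True)   # placeholder for Mamba+
        self.gate = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) patch tokens
        y_fwd, _ = self.fwd(x)
        y_bwd, _ = self.bwd(torch.flip(x, dims=[1]))
        y_bwd = torch.flip(y_bwd, dims=[1])                      # re-align to forward time order
        y = 0.5 * (y_fwd + y_bwd)                                # merge the two directions
        g = torch.sigmoid(self.gate(torch.cat([y, x], dim=-1)))  # forget-gate-style blend
        return self.norm(g * y + (1.0 - g) * x)                  # combine new and historical features

# Toy usage: batch of 2 series, 24 patch tokens, model width 32.
enc = BidirectionalGatedEncoder(d_model=32)
print(enc(torch.randn(2, 24, 32)).shape)   # torch.Size([2, 24, 32])
```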

Model Evaluation and Results

In experiments across eight varied real-world datasets, Bi-Mamba4TS consistently outperformed other leading models in long-term multivariate time series forecasting. The model not only excelled in prediction accuracy but also demonstrated efficient use of computational resources. These experiments underscore the effectiveness of Bi-Mamba4TS in practical settings, making it a valuable tool for real-life applications that require accurate and efficient long-term forecasting.

Model Efficiency and Ablation Study

The efficiency analysis confirmed that Bi-Mamba4TS strikes a good balance between accuracy and computational demands. Ablation studies further validated the importance of bidirectional encoding and adaptive strategy selection in improving forecasting performance. The model also proved robust across different parameter settings, underscoring its practical utility.

Future Directions

The promising results invite further exploration into more complex and dynamic scenarios, such as adaptive forecasting in rapidly changing environments. Future work could also delve into refining the SRA decider to accommodate more nuanced dataset characteristics and exploring the integration of Bi-Mamba4TS with other forecasting frameworks to leverage complementary strengths.

Conclusion

Bi-Mamba4TS sets a new benchmark in long-term time series forecasting by effectively addressing the computational inefficiencies of traditional models and introducing an adaptive mechanism that aligns model strategy with data characteristics. Its superior performance, backed by rigorous experimental validation, makes it a potent tool for a wide range of applications in time series forecasting.

Authors
  1. Aobo Liang
  2. Xingguo Jiang
  3. Yan Sun
  4. Xiaohou Shi
  5. Ke Li