Bi-Mamba4TS: Bidirectional Forecasting
- Bi-Mamba4TS is a state-space deep learning approach that employs bidirectional processing and memory-enhanced blocks to enhance long-term multivariate time series forecasting.
- It integrates techniques such as instance normalization, patch-wise tokenization, and dual-branch Mamba+ encoders to capture both local features and long-range dependencies.
- Empirical evaluations on diverse real-world datasets show that Bi-Mamba4TS achieves lower errors compared to Transformer-based models while maintaining linear scaling with sequence length.
The term Bi-Mamba4TS encompasses a family of state-space–based deep learning models—primarily exemplified by Bi-Mamba+, also referred to in the literature as Bi-Mamba4TS—designed for long-term, multivariate time series forecasting. These models advance the core Mamba selective state space model through bidirectional processing, memory-enhanced architectural blocks, and adaptive mechanisms for handling inter-variable relationships. Their efficient scaling and empirical performance have positioned them as competitive alternatives to Transformer-based approaches for sequential predictive tasks.
1. Architectural Foundations
Bi-Mamba4TS models are built upon advanced state space model (SSM) frameworks, integrating recent developments from the @@@@1@@@@. The input pipeline typically begins with instance normalization (e.g., RevIN), which stabilizes input distributions for nonstationary multivariate time series. Subsequent patch-wise segmentation transforms the raw sequence into a set of tokens.
A distinguishing feature of Bi-Mamba4TS is the use of bidirectional encoder layers composed of parallel Mamba+ blocks. Each encoder layer processes patch tokens in both forward and backward temporal directions, with the outputs fused via residual addition and further refined through position-wise feed-forward networks. Model output is derived from a linear MLP projector acting on the stacked encoder representations.
At the core of each encoder, the Mamba+ block employs a dual-branch structure: a 1-D convolutional branch with nonlinearity (typically SiLU activation) for local feature extraction, and a gating branch culminating in a “forget gate” that modulates the integration of new and historical features. The internal state evolution follows the discretization of a continuous state-space system,
discretized (using zero-order hold with interval ) as: with and . Parameters are learnable, and the gating mechanism allows the model to balance the influence of rapidly evolving features and long-term dependencies in a data-adaptive manner (Liang et al., 24 Apr 2024).
2. Bidirectional Temporal Modeling
A central innovation in Bi-Mamba4TS is explicit bidirectional sequence processing. Each stack of encoder layers contains two Mamba+ branches: one operates in the original (forward) temporal order, while the other processes a time-reversed (“flipped”) copy of the sequence. Outputs from these parallel branches are combined—typically through addition—before subsequent residual and feed-forward refinements.
This architecture enables the model to capture dependencies detectable only when traversing the sequence from both directions, enhancing the learning of long-range and inter-temporal relationships. Such a mechanism addresses weaknesses in standard unidirectional sequence models, especially where relevant signals may appear distant from target forecasting points or are manifested through complex interleaving of past and future information (Liang et al., 24 Apr 2024).
3. Series-Relation-Aware Tokenization
Bi-Mamba4TS introduces a Series-Relation-Aware (SRA) decider to automate the selection of appropriate tokenization strategies for multivariate time series. Depending on the degree of observed inter-series (cross-channel) dependency, the model decides between channel-independent (processing each variable separately) and channel-mixing (joint processing) strategies.
The SRA decider quantifies relationships between series using Spearman correlation coefficients. For each pair ,
where is the number of time samples. If a sufficient proportion of pairwise coefficients exceeds a pre-defined threshold ( in experiments), channel-mixing is selected; otherwise, channel-independence is preferred. This approach is particularly suitable for datasets featuring variable numbers or heterogeneous dependency structures among time series, as seen in domains like weather, traffic, or electricity forecasting (Liang et al., 24 Apr 2024).
4. Empirical Performance and Computational Considerations
Comprehensive evaluations on eight real-world datasets (e.g., Weather, Traffic, Electricity, Solar, ETT variants) demonstrate that Bi-Mamba4TS achieves lower mean squared error (MSE) and mean absolute error (MAE) across multiple forecast horizons (e.g., 96, 192, 336, 720) compared to both classical and state-of-the-art deep models (including various Transformer variants, iTransformer, PatchTST, and Autoformer).
Ablation studies indicate significant performance deterioration when omitting any key architectural element—bidirectionality, the SRA decider, or residual connections—underscoring the necessity of the integrated approach. For example, removal of bidirectionality alone increases error by more than 3%. The design maintains linear scaling in both memory and compute with sequence length due to the SSM-based formulation and patching strategy, in contrast to the quadratic scaling typical of self-attention mechanisms (Liang et al., 24 Apr 2024).
5. Applications and Broader Impact
The Bi-Mamba4TS methodology is broadly applicable in domains demanding both local pattern recognition and long-range dependency modeling over multivariate signals:
- Energy and Load Forecasting: Prediction of electricity consumption or renewable resource availability, especially in regimes with complex inter-variable dependencies.
- Traffic and Weather Prediction: Accurate modeling of large-scale, high-dimensional spatiotemporal data where both temporal and variable-level influences are critical.
- Financial Time Series: Forecasting indices or asset prices requiring sensitivity to both recent market signals and extended historical trends.
The adaptive tokenization design enables the model to automatically adjust to datasets that may differ widely in variable count and dependency structure. The bidirectional SSM approach is also relevant for sequence modeling tasks outside traditional forecasting, including anomaly detection, missing value imputation, and potentially in domains such as natural language processing or trajectory analysis, where sequence length and bidirectional context are important constraints (Liang et al., 24 Apr 2024).
6. Comparative Analysis and Future Directions
Compared to dual twin SSM variants (e.g., DTMamba), which stack two parallel Mamba modules per block to capture low- and high-level features, Bi-Mamba4TS typically employs a single-layered bidirectional structure, focusing on explicit forward/backward temporal modeling over multi-scale abstraction. While DTMamba integrates channel independence at the preprocessing step and focuses on dual-branch feature extraction, Bi-Mamba4TS leverages adaptive gating and bidirectional sequence traversal as primary mechanisms (Wu et al., 11 May 2024).
Developments in related models—such as MambaTS, which introduces variable permutation and optimal scan ordering, and TSMamba, which uses dual Mamba encoders and transfer learning—suggest further directions for Bi-Mamba4TS evolution. This includes integrating dynamic variable interaction schemes, more sophisticated bidirectional fusion mechanisms, or extending the model to multi-task settings encompassing classification or imputation alongside forecasting (Cai et al., 26 May 2024, Ma et al., 5 Nov 2024).
A plausible implication is that the Bi-Mamba4TS principle of combining selective state-space modeling, efficient bidirectional computation, and adaptive inter-variable tokenization will inform the next generation of scalable, high-accuracy sequence models in both time series and broader modalities.