
Bidirectional Mamba Blocks

Updated 9 December 2025
  • Bidirectional Mamba Blocks are neural architectural primitives that extend state-space models to process sequential data in both forward and backward directions.
  • They combine dual SSM recurrences with dynamic gating and fusion to effectively capture long-range dependencies while maintaining strict linear computational complexity.
  • Empirical results across speech, vision, and time series applications demonstrate enhanced accuracy and hardware efficiency compared to traditional unidirectional or quadratic complexity models.

A Bidirectional Mamba Block is a neural architectural primitive that extends the hardware-optimized, selective state-space modeling mechanisms of the Mamba model to process sequential or tokenized data in both forward (causal) and backward (anti-causal) directions. By running two independent state-space model (SSM) recurrences—one left-to-right and one right-to-left—and then fusing their outputs, Bidirectional Mamba enables efficient global context integration analogous to bidirectional LSTMs or bidirectional Transformers, but with strict linear complexity in sequence length. This design improves long-range dependency modeling, context-aware inference, and computational efficiency across domains including speech, vision, time series, multimodal, and biomedical signal processing.

1. Core Principles and Mathematical Foundations

A Bidirectional Mamba Block consists of two parallel, parameter-tied or independent selective SSM streams:

  • Forward SSM: At time $t$, updates the hidden state $h_t^\rightarrow = A_t^\rightarrow h_{t-1}^\rightarrow + B_t^\rightarrow u_t$, outputting $y_t^\rightarrow = C_t^\rightarrow h_t^\rightarrow$.
  • Backward SSM: Processes the time-reversed sequence, $h_t^\leftarrow = A_t^\leftarrow h_{t+1}^\leftarrow + B_t^\leftarrow u_t$, $y_t^\leftarrow = C_t^\leftarrow h_t^\leftarrow$.
  • Input-dependent gating is typically realized by a small neural network: $u_t = \sigma(W_g x_t + b_g) \odot (W_u x_t + b_u)$; the matrices $A_t$, $B_t$, $C_t$ are usually light-weight MLP-parameterized or diagonal (a minimal sketch follows below).
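As a concrete illustration of the recurrences and gating above, the following is a minimal per-timestep sketch in PyTorch. It assumes a diagonal $A$, per-step $B_t$, $C_t$, and step sizes produced by linear projections; it uses a naive Python loop rather than the fused parallel-scan kernel that practical Mamba implementations rely on, and all module names are illustrative.

```python
# Minimal sketch of one selective SSM scan (naive per-step loop, not the fused
# parallel-scan kernel used in practice). Diagonal A, per-step B_t/C_t from
# linear projections, and the gating u_t from the bullet above are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveScan(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))  # diagonal A = -exp(A_log)
        self.proj_B = nn.Linear(d_model, d_state)
        self.proj_C = nn.Linear(d_model, d_state)
        self.proj_dt = nn.Linear(d_model, d_model)                # per-step discretization size
        self.gate = nn.Linear(d_model, d_model)                   # W_g, b_g
        self.proj_u = nn.Linear(d_model, d_model)                 # W_u, b_u

    def forward(self, x):                                         # x: (batch, T, d_model)
        bsz, T, D = x.shape
        u = torch.sigmoid(self.gate(x)) * self.proj_u(x)          # u_t = sigma(W_g x_t + b_g) * (W_u x_t + b_u)
        dt = F.softplus(self.proj_dt(x))                          # positive step sizes
        A = -torch.exp(self.A_log)                                # (D, N), negative for stability
        Bt, Ct = self.proj_B(x), self.proj_C(x)                   # each (batch, T, N)
        h = x.new_zeros(bsz, D, A.shape[1])                       # hidden state (batch, D, N)
        ys = []
        for t in range(T):
            dA = torch.exp(dt[:, t].unsqueeze(-1) * A)            # discretized transition A_t
            dB = dt[:, t].unsqueeze(-1) * Bt[:, t].unsqueeze(1)   # discretized input matrix B_t
            h = dA * h + dB * u[:, t].unsqueeze(-1)               # h_t = A_t h_{t-1} + B_t u_t
            ys.append((h * Ct[:, t].unsqueeze(1)).sum(-1))        # y_t = C_t h_t
        return torch.stack(ys, dim=1)                             # (batch, T, d_model)
```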

Fusion is performed either by concatenation and linear projection, simple summation, or learned gating, yielding a unified representation:

$$z_t = W_f \begin{bmatrix} y_t^\rightarrow \\ y_t^\leftarrow \end{bmatrix} + b_f, \quad \text{or} \quad z_t = \alpha_t \odot y_t^\rightarrow + (1-\alpha_t) \odot y_t^\leftarrow$$

Here, $W_f$ is a learnable $D \times 2D$ matrix, and $\alpha_t$ may be a dynamic gate.
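Both fusion variants can be sketched as follows; the module and parameter names (BiFusion, proj, to_alpha) are illustrative assumptions, not a reference API.

```python
# Sketch of the two fusion variants from the equation above; names are
# illustrative assumptions, not a reference implementation.
import torch
import torch.nn as nn

class BiFusion(nn.Module):
    def __init__(self, d_model: int, mode: str = "concat"):
        super().__init__()
        self.mode = mode
        self.proj = nn.Linear(2 * d_model, d_model)       # W_f: D x 2D concat merge
        self.to_alpha = nn.Linear(2 * d_model, d_model)   # produces the dynamic gate alpha_t

    def forward(self, y_fwd, y_bwd):                      # each (batch, T, d_model)
        cat = torch.cat([y_fwd, y_bwd], dim=-1)
        if self.mode == "concat":
            return self.proj(cat)                         # z_t = W_f [y_fwd; y_bwd] + b_f
        alpha = torch.sigmoid(self.to_alpha(cat))
        return alpha * y_fwd + (1.0 - alpha) * y_bwd      # z_t = alpha_t * y_fwd + (1-alpha_t) * y_bwd
```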

A residual connection and normalization (e.g., LayerNorm or RMSNorm) are always applied:

$$\mathrm{out}_t = \mathrm{LayerNorm}(x_t + z_t)$$

Bidirectional Mamba Blocks are commonly stacked $L$ times to form a deep model.
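Putting the pieces together, a minimal sketch of one full block and an $L$-layer stack might look like the following; it reuses the SelectiveScan sketch above, uses the concatenation fusion variant, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a full bidirectional block and an L-layer stack, reusing
# the SelectiveScan sketch above; concat fusion, residual + LayerNorm, and all
# hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class BidirectionalMambaBlock(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.fwd = SelectiveScan(d_model, d_state)        # left-to-right stream
        self.bwd = SelectiveScan(d_model, d_state)        # right-to-left stream (independent params)
        self.fuse = nn.Linear(2 * d_model, d_model)       # W_f concat merge
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                                 # x: (batch, T, d_model)
        y_fwd = self.fwd(x)
        y_bwd = self.bwd(x.flip(1)).flip(1)               # scan the reversed sequence, re-align in time
        z = self.fuse(torch.cat([y_fwd, y_bwd], dim=-1))
        return self.norm(x + z)                           # residual + normalization

# Stack L = 4 blocks into a deep model and run a dummy batch.
model = nn.Sequential(*[BidirectionalMambaBlock(d_model=64) for _ in range(4)])
out = model(torch.randn(2, 128, 64))                      # (batch=2, T=128, D=64)
```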

2. Algorithmic Variants and Architectural Design

Multiple Bidirectional Mamba block instantiations have been reported:

  • Dual-Column (DuaBiMamba): Contains two independent Mamba columns (forward/backward), fuses by concatenation followed by a fully connected layer, residual connection, and layer norm; deployed atop pre-trained acoustic models for spoof detection (Xiao et al., 15 Nov 2024).
  • Parallel Scan (BiMamba): Runs two SSMs on original and reversed input in parallel, concatenates or sums features elementwise, typical in time series, EEG, and diffusion models (Lavaud et al., 10 Dec 2024, Liang et al., 24 Apr 2024, Gao et al., 17 Oct 2024).
  • Task-Axis, Cross-Feature, and Spiral Scans: Adaptive scan orderings in vision/multitask/2D biomedical applications, e.g., spiral scan in medical imaging or cross-task scan in dense prediction for improved spatial/contextual coverage (Cao et al., 28 Aug 2025, Yuan et al., 12 May 2025).
  • Locally Bi-directional (LBMamba): Fuses a forward global scan with lightweight in-register local backward scans within each GPU thread, followed by alternating global direction reversal across blocks for full receptive field with minimal memory overhead (Zhang et al., 19 Jun 2025).
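To illustrate the alternating-direction idea used to recover a full receptive field from unidirectional scans, here is a hedged sketch that simply flips the sequence order between stacked layers; it illustrates the general pattern only, not the in-register LBMamba implementation, and again reuses the SelectiveScan sketch above.

```python
# Hedged sketch of the alternating-direction stacking pattern: each layer runs
# a single unidirectional scan, but odd layers see the flipped sequence so the
# stack as a whole covers both directions. Illustration only, not LBMamba.
import torch
import torch.nn as nn

class AlternatingScanStack(nn.Module):
    def __init__(self, d_model: int, depth: int, d_state: int = 16):
        super().__init__()
        self.scans = nn.ModuleList([SelectiveScan(d_model, d_state) for _ in range(depth)])
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(depth)])

    def forward(self, x):                                  # x: (batch, T, d_model)
        for i, (scan, norm) in enumerate(zip(self.scans, self.norms)):
            if i % 2 == 1:                                 # odd layers: reverse, scan, restore order
                x = norm(x + scan(x.flip(1)).flip(1))
            else:                                          # even layers: ordinary forward scan
                x = norm(x + scan(x))
        return x
```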

Bidirectional Mamba blocks often integrate additional innovations beyond the basic dual-scan design.

3. Computational Complexity and Efficiency

Bidirectional Mamba blocks maintain strict linear computational and activation memory complexity in sequence length $T$ (for time-indexed models) or $N$ (for spatially flattened images or patches), given a fixed state or channel size $D$:

  • Forward/Backward scan cost: $O(2T(D^2 + D))$ per block.
  • Fusion/Projection cost: $O(T \cdot 2D \cdot D)$ for the fully connected merge.
  • Residual and normalization add negligible overhead.
  • The parallel scan implementation yields $O(TD + D^2 \log T)$ on modern hardware; dual-column/bidirectional designs double compute cost relative to single-path designs, but remain far below the $O(T^2)$ cost of self-attention (Xiao et al., 15 Nov 2024, Zhu et al., 17 Jan 2024, Zhang et al., 19 Jun 2025).
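The scaling claims above can be checked with a back-of-the-envelope operation count; the formulas below mirror the bullets and are rough proxies with constants dropped, not measured FLOPs.

```python
# Back-of-the-envelope operation counts mirroring the bullets above
# (constants dropped; rough proxies, not measured FLOPs).
def bimamba_cost(T: int, D: int) -> int:
    scans = 2 * T * (D * D + D)        # forward + backward scan: O(2T(D^2 + D))
    fusion = T * (2 * D) * D           # concat -> D fully connected merge
    return scans + fusion

def self_attention_cost(T: int, D: int) -> int:
    return T * T * D                   # quadratic token-token interaction

for T in (1_000, 10_000, 100_000):
    D = 256
    print(f"T={T:>7}  bi-Mamba~{bimamba_cost(T, D):.2e}  attention~{self_attention_cost(T, D):.2e}")
```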

Empirical benchmarking demonstrates substantial improvement in throughput, memory footprint, and scalability compared to self-attention backbones, especially for long or high-resolution sequences (Zhu et al., 17 Jan 2024, Zhang et al., 19 Jun 2025, Mo et al., 24 May 2024).

4. Applications Across Modalities

Bidirectional Mamba blocks have been integrated and validated in a range of domains, including speech, vision, time series, multimodal, and biomedical signal processing.

5. Empirical Evidence and Design Trade-offs

Ablation and benchmarking data across these domains demonstrate the accuracy and efficiency gains summarized above.

6. Architectural Patterns and Theoretical Implications

Bidirectional Mamba blocks inherit the advantages of SSM kernels for long-range dependency, but supplement RNN-like history with explicit anti-causal modeling:

  • Global context without quadratic attention: Each token or spatial location “sees” both past and future, enabling non-causal inference essential for generative modeling, detection, and segmentation (Zhu et al., 17 Jan 2024, Mo et al., 24 May 2024).
  • Selective gating: Small MLPs choose which parts of the sequence or spatial structure to integrate at each step, enabling noise suppression, context mixing, and adaptation to heterogeneous data (Lavaud et al., 10 Dec 2024, Kheir et al., 20 May 2025).
  • Fusion with locality and multi-scale context: Designs such as spiral scan and position/task-wise fusions allow hierarchical and global context mixing at any desired spatial or temporal granularity (Cao et al., 28 Aug 2025, Yuan et al., 12 May 2025).

7. Limitations, Open Problems, and Prospective Directions

Despite strong empirical evidence for performance and efficiency, several design constraints persist:

  • Double compute cost: Full bidirectional blocks incur $2\times$ the SSM compute per layer; local bidirectional variants (e.g., LBMamba) offer mitigation, but at possible cost to the global receptive field unless augmented with alternating data order or hybrid architectures (Zhang et al., 19 Jun 2025).
  • Expressiveness relative to full attention: Despite superior speed and memory, bidirectional SSMs lack pairwise content-based selection inherent to self-attention, which may still be desirable for certain modeling tasks (Zhu et al., 17 Jan 2024, Zhang et al., 21 May 2024).
  • Parameter sharing and data efficiency: When applied to low-data or high-variance regimes (e.g., speech spoofing), careful integration of pre-training, regularization, and gate-sharing is required (Xiao et al., 15 Nov 2024).
  • Application to complex or non-sequential modalities: Extending bidirectional Mamba to multi-modal, graph-structured, or hierarchical data remains an ongoing area of research.

Bidirectional Mamba Blocks represent a significant advance in scalable, context-rich sequence and tensor modeling. They combine the operational simplicity and linear scaling of SSMs with the full-context capability of bidirectional inference, yielding state-of-the-art results and enabling new application domains with practical hardware awareness and algorithmic flexibility (Xiao et al., 15 Nov 2024, Lavaud et al., 10 Dec 2024, Zhu et al., 17 Jan 2024, Zhang et al., 19 Jun 2025, Cao et al., 28 Aug 2025, Mo et al., 24 May 2024, Zhou et al., 3 Nov 2024).
