Bi-Mamba: Bidirectional Neural Architecture

Updated 19 February 2026
  • Bi-Mamba is a bidirectional state-space neural architecture that fuses forward and reverse Mamba modules to provide every timestep with past and future context.
  • It integrates specialized segmentation, K-regression, and α-regression blocks to directly infer diffusion parameters and detect change points in noisy trajectories.
  • Empirical results show Bi-Mamba’s superior accuracy and faster convergence compared to traditional unidirectional models and bidirectional RNN baselines.

Bi-Mamba refers to a class of neural architectures that generalize the Mamba selective State Space Model (SSM) framework to bidirectional processing. The core principle is to run two Mamba state-space modules in parallel: one processes the sequence in the forward direction, the other a reversed copy. Their outputs are concatenated or fused, furnishing every position with both past and future context. This departs sharply from the intrinsically causal, unidirectional design of the original Mamba: Bi-Mamba provides global receptive fields at linear-time complexity, and has shown strong empirical advantages in regression, segmentation, and dynamical parameter estimation for short, noisy sequences (Lavaud et al., 2024).

1. The Anomalous Diffusion Inference Problem

Bi-Mamba was initially introduced in the context of anomalous diffusion characterization. For a 2D trajectory r(t) = (x(t), y(t)), with t = 1, …, T, the task is to infer the effective diffusion coefficient K and anomalous exponent α such that

\mathrm{MSD}(t) = \langle |r(t) - r(0)|^2 \rangle = 4 K t^{\alpha}

where α = 1 represents normal diffusion, α < 1 corresponds to subdiffusion, and α > 1 to superdiffusion. For short, noisy trajectories, traditional mean-squared displacement (MSD) regression is unreliable. Bi-Mamba addresses this by providing an end-to-end regressor that directly outputs predictions (K̂, α̂) from single-trajectory data, along with auxiliary tasks such as diffusion-state segmentation and change-point detection (Lavaud et al., 2024).
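For context on the classical baseline that Bi-Mamba is compared against, the MSD relation above can be fit directly by log-log least squares. The sketch below (pure Python, illustrative only, not the paper's code) computes a time-averaged MSD and estimates K and α from a synthetic normal-diffusion track:

```python
import math
import random

def msd(traj):
    """Time-averaged mean-squared displacement of a 2D trajectory.

    traj: list of (x, y) points. Returns MSD values for lags 1..T-1.
    """
    T = len(traj)
    out = []
    for lag in range(1, T):
        disps = [(traj[i + lag][0] - traj[i][0]) ** 2 +
                 (traj[i + lag][1] - traj[i][1]) ** 2
                 for i in range(T - lag)]
        out.append(sum(disps) / len(disps))
    return out

def fit_K_alpha(msd_vals, dt=1.0):
    """Least-squares fit of log MSD = log(4K) + alpha * log(t)."""
    xs = [math.log((k + 1) * dt) for k in range(len(msd_vals))]
    ys = [math.log(m) for m in msd_vals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
             sum((x - mx) ** 2 for x in xs))
    K = math.exp(my - alpha * mx) / 4.0   # intercept = log(4K)
    return K, alpha

# Synthetic normal diffusion (alpha = 1): per-axis step variance 2K
# gives MSD(t) = 4Kt in 2D.
random.seed(0)
K_true = 0.5
traj = [(0.0, 0.0)]
for _ in range(500):
    traj.append((traj[-1][0] + random.gauss(0, math.sqrt(2 * K_true)),
                 traj[-1][1] + random.gauss(0, math.sqrt(2 * K_true))))

K_hat, alpha_hat = fit_K_alpha(msd(traj)[:20])  # fit only short lags
```

On short or noisy tracks this estimator becomes unstable, which is precisely the regime the learned regressor targets.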

2. Selective State-Space and Bidirectional Recurrence

Each Mamba block at time t maintains a latent state h_t ∈ ℝ^H updated by

z_t = \sigma(W_z h_{t-1} + U_z u_t + b_z), \quad \tilde{h}_t = \tanh(W_h h_{t-1} + U_h u_t + b_h)

h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t

o_t = V h_t + c

where u_t is a per-time-step feature vector, W, U, b, V, and c are learned, and ⊙ is the elementwise product. The bidirectional scan duplicates this block: one copy processes u_{1:T} (forward), the other u_{T:1} (backward). Their hidden states, \overrightarrow{h}_t and \overleftarrow{h}_t, are concatenated

h_t^{\text{bi}} = \left[ \overrightarrow{h}_t \;\|\; \overleftarrow{h}_t \right] \in \mathbb{R}^{2H}

providing every timestep with context from both earlier and later data (Lavaud et al., 2024).
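A minimal sketch of the bidirectional scan, implementing the gated update equations above with toy dimensions and random weights (illustrative, not the trained architecture):

```python
import math
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def step(h, u, p):
    """Gated update: h_t = z ⊙ h_{t-1} + (1 - z) ⊙ tanh-candidate."""
    Wz, Uz, bz, Wh, Uh, bh = p
    z = [1 / (1 + math.exp(-(a + b + c)))
         for a, b, c in zip(matvec(Wz, h), matvec(Uz, u), bz)]
    cand = [math.tanh(a + b + c)
            for a, b, c in zip(matvec(Wh, h), matvec(Uh, u), bh)]
    return [zi * hi + (1 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

def scan(seq, p, H):
    h, states = [0.0] * H, []
    for u in seq:
        h = step(h, u, p)
        states.append(h)
    return states

def bidirectional(seq, p, H):
    fwd = scan(seq, p, H)
    bwd = scan(list(reversed(seq)), p, H)
    bwd.reverse()                             # re-align with forward time
    return [f + b for f, b in zip(fwd, bwd)]  # h_t^bi in R^{2H}

# Toy run: input dim 2, hidden size H = 3, random weights.
random.seed(1)
H, D = 3, 2
rnd = lambda r, c: [[random.uniform(-0.5, 0.5) for _ in range(c)]
                    for _ in range(r)]
p = (rnd(H, H), rnd(H, D), [0.0] * H, rnd(H, H), rnd(H, D), [0.0] * H)
seq = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4]]
h_bi = bidirectional(seq, p, H)   # one 2H-vector per timestep
```

Each h_bi[t] thus carries forward context (u_1..u_t) in its first half and backward context (u_t..u_T) in its second.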

3. Neural Architecture and Processing Pipeline

Bi-Mamba is organized in a pipeline of specialized blocks:

  • Segmentation block: Bi-Mamba with per-step features (Δx, Δy, one-dimensional MSD, turning angle, radial distance, etc.), two parallel Mamba layers (H = 128), and a feed-forward net producing C-class softmax outputs for segmentation/classification.
  • K-regression and α-regression blocks: each is a unidirectional Mamba with H = 128 on augmented features (input features plus one-hot segment labels), whose final hidden vectors are pooled and passed to a small MLP per output parameter.
  • Pooling strategy: for global regression, the final hidden state or a global average is taken before an output MLP (Lavaud et al., 2024).

Table: Bi-Mamba Core Building Blocks

| Stage | Input Features | Hidden Size | Output Head |
| --- | --- | --- | --- |
| Segmentation | Kinematic + geometry (d ≈ 10) | H = 128 | Per-timestep, C-class softmax |
| K-regression | Features + 1-hot seg. | H = 128 | Pooled, 64 → 1 MLP |
| α-regression | Features + 1-hot seg. | H = 128 | Pooled, 64 → 1 MLP |
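The kinematic and geometric input features named above (Δx, Δy, turning angle, radial distance) can be computed per timestep roughly as follows; the paper's exact feature definitions may differ, so this is an illustrative sketch:

```python
import math

def per_step_features(traj):
    """Per-timestep kinematic/geometric features for a 2D trajectory.

    For each interior timestep t returns [dx, dy, step length,
    turning angle, radial distance from the start point].
    Feature choice follows the text; exact definitions are assumed.
    """
    x0, y0 = traj[0]
    feats = []
    for t in range(1, len(traj) - 1):
        dx1 = traj[t][0] - traj[t - 1][0]
        dy1 = traj[t][1] - traj[t - 1][1]
        dx2 = traj[t + 1][0] - traj[t][0]
        dy2 = traj[t + 1][1] - traj[t][1]
        step = math.hypot(dx1, dy1)
        # Turning angle between consecutive displacement vectors,
        # wrapped into (-pi, pi].
        angle = math.atan2(dy2, dx2) - math.atan2(dy1, dx1)
        angle = (angle + math.pi) % (2 * math.pi) - math.pi
        radial = math.hypot(traj[t][0] - x0, traj[t][1] - y0)
        feats.append([dx1, dy1, step, angle, radial])
    return feats

# A right-angle turn yields a turning angle of pi/2.
traj = [(0, 0), (1, 0), (1, 1), (0, 1)]
f = per_step_features(traj)
```

These per-step vectors form the u_t fed to the segmentation block.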

4. Bidirectional Context and Task Heads

Bidirectionality is operationalized by two symmetric Mamba blocks scanning in forward and reverse time. In segmentation, each timestep's forward and backward hidden states are concatenated and processed by a feed-forward net to produce a C-way softmax. For regression, the bidirectional hidden states are globally pooled (last state or mean), then fed through a small MLP to predict K or α (Lavaud et al., 2024).

This symmetric fusion allows the model to capture context from both the past and the future, which improves classification of ambiguous trajectory segments that depend on non-local motion cues, and provides more accurate global parameter estimation than causal-only models.
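The two pooling strategies mentioned (last state vs. global mean) reduce the per-timestep hidden states to a single summary vector before the output MLP; a minimal sketch:

```python
def mean_pool(states):
    """Global average over timesteps: one vector per trajectory."""
    T = len(states)
    return [sum(s[i] for s in states) / T for i in range(len(states[0]))]

def last_state_pool(states):
    """Final hidden state as the sequence summary."""
    return states[-1]

# Toy per-timestep hidden states (T = 3, dimension 2).
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = mean_pool(states)        # [3.0, 4.0]
last = last_state_pool(states) # [5.0, 6.0]
```

Mean pooling spreads gradient across all timesteps, while last-state pooling relies on the recurrence having accumulated the whole track, which is one reason bidirectional states help here.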

5. Multi-Task Objective and Optimization

The model is trained with a composite loss:

\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{seg}} + \mathcal{L}_K + \mathcal{L}_\alpha + \lambda \|\theta\|_2^2

where

  • \mathcal{L}_{\text{seg}}: weighted cross-entropy over per-timestep segmentation,
  • \mathcal{L}_K: mean-squared log error between predicted and true K,
  • \mathcal{L}_\alpha: mean absolute error for α,
  • λ: weight decay for regularization.

The optimizer is Adam, with a learning rate of order 10^{-4} (Lavaud et al., 2024). Targets are supplied directly from the labeled dataset.
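A schematic of the composite objective above. Shapes, the log1p variant of MSLE, and the class-weighting scheme are illustrative assumptions, not the paper's exact implementation:

```python
import math

def composite_loss(seg_probs, seg_labels, K_pred, K_true,
                   a_pred, a_true, theta, lam=1e-4, class_w=None):
    """L_total = weighted CE (segmentation) + MSLE (K) + MAE (alpha)
    + L2 weight decay, for a single trajectory."""
    # Weighted cross-entropy over per-timestep class probabilities.
    ce = 0.0
    for probs, y in zip(seg_probs, seg_labels):
        w = class_w[y] if class_w else 1.0
        ce -= w * math.log(max(probs[y], 1e-12))
    ce /= len(seg_labels)
    # Mean-squared log error for K (log1p variant assumed here).
    msle = (math.log1p(K_pred) - math.log1p(K_true)) ** 2
    # Mean absolute error for alpha.
    mae = abs(a_pred - a_true)
    # L2 penalty on parameters theta.
    l2 = lam * sum(t * t for t in theta)
    return ce + msle + mae + l2

loss = composite_loss(
    seg_probs=[[0.9, 0.1], [0.2, 0.8]], seg_labels=[0, 1],
    K_pred=0.5, K_true=0.4, a_pred=1.1, a_true=1.0,
    theta=[0.3, -0.2])
```

Summing the task losses with equal weight, as in the formula, implicitly assumes the three terms are on comparable scales.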

6. Inference Methodology

Inference on a trajectory r_{1:T} follows a fixed pipeline:

  1. Compute per-timestep features u_t for t = 1, …, T.
  2. Run forward and backward Bi-Mamba segmentation to obtain class probabilities.
  3. Augment per-step features with segmentation outputs.
  4. Run the K- and α-regression Mamba blocks.
  5. Pool the hidden vectors and apply the output heads to produce (K̂, α̂).

This procedure yields a deterministic, approximately MAP estimate

(\hat K, \hat\alpha) \approx \arg\max_{K, \alpha} \, p_\theta(K, \alpha \mid r_{1:T})

with no iterative fitting, suitable for real-time or high-throughput analysis (Lavaud et al., 2024).
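The five steps above can be sketched as one feed-forward pass. The callables here are placeholder stand-ins for the trained blocks (hypothetical names, not the paper's API):

```python
def infer(traj, features, segment, regress_K, regress_alpha, pool):
    """Schematic of the fixed inference pipeline (steps 1-5)."""
    u = features(traj)                     # 1. per-timestep features
    seg = segment(u)                       # 2. per-step class probabilities
    aug = [f + s for f, s in zip(u, seg)]  # 3. append segmentation outputs
    hK = regress_K(aug)                    # 4. run regression blocks
    ha = regress_alpha(aug)
    return pool(hK), pool(ha)              # 5. pooled -> (K_hat, alpha_hat)

# Dummy stand-ins so the pipeline runs end to end.
feats = lambda tr: [[p[0], p[1]] for p in tr]
seg   = lambda u: [[1.0, 0.0] for _ in u]   # everything "class 0"
regK  = lambda u: [[0.5] for _ in u]
rega  = lambda u: [[1.0] for _ in u]
mean  = lambda hs: sum(h[0] for h in hs) / len(hs)

K_hat, a_hat = infer([(0, 0), (1, 1)], feats, seg, regK, rega, mean)
```

Because every stage is a single forward pass, the whole pipeline runs in O(T) with no iterative fitting.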

7. Empirical Results and Comparative Performance

On the AnDi-2 single-trajectory subchallenge (no missing data, T ≤ 200), Bi-Mamba achieved:

  • α-inference MAE: 0.27 (7th place)
  • K-inference MSLE: 0.05 (9th place)
  • Diffusion-type classification (e.g., trapped α < 0.2, normal α ≈ 1, directed α > 1.8) F1 score: 0.91 (3rd place)
  • Change-point detection RMSE: 2.7 frames (10th place)

Relative to a bidirectional RNN baseline, Bi-Mamba demonstrates lower validation loss across all tasks, smaller variance over epochs, and faster convergence (Lavaud et al., 2024). The design is particularly well suited to single-trajectory inference, outperforming classical estimators and strong RNN baselines, especially for short, noisy tracks where bidirectional context and input-dependent gating are essential.


Bi-Mamba thus exemplifies a state-space sequence model that operationalizes bidirectional temporal context within a highly efficient, hardware-friendly architecture. Its design is domain-agnostic, but its utility is particularly prominent for short sequence regression and segmentation in stochastic systems. The architecture provides a scalable, multi-head foundation that can be adapted or extended to other time series, spatiotemporal, or dynamical settings with similar requirements for efficient joint modeling of local and global, past and future information.
