Bi-Mamba: Bidirectional Neural Architecture
- Bi-Mamba is a bidirectional state-space neural architecture that fuses forward and reverse Mamba modules to provide every timestep with past and future context.
- It integrates specialized segmentation, K-regression, and α-regression blocks to directly infer diffusion parameters and detect change points in noisy trajectories.
- Empirical results show Bi-Mamba’s superior accuracy and faster convergence compared to traditional unidirectional models and bidirectional RNN baselines.
Bi-Mamba refers to a class of neural architectures that generalize the Mamba selective state-space model (SSM) framework to bidirectional processing. The Bi-Mamba principle is to run two Mamba state-space modules in parallel: one processes a sequence in the forward direction, and the other processes a reversed copy. Their outputs are concatenated or fused, furnishing every position with both past and future context. This differs sharply from the intrinsically causal, unidirectional processing of the original Mamba: Bi-Mamba provides global receptive fields at linear-time complexity, and has shown strong empirical advantages in regression, segmentation, and dynamical parameter estimation for short, noisy sequences (Lavaud et al., 2024).
1. The Anomalous Diffusion Inference Problem
Bi-Mamba was initially introduced in the context of anomalous diffusion characterization. For a 2D trajectory (x_t, y_t), with t = 1, …, T, the task is to infer the effective diffusion coefficient K and anomalous exponent α such that

MSD(τ) = K τ^α,

where α = 1 represents normal diffusion, α < 1 corresponds to subdiffusion, and α > 1 to superdiffusion. For short, noisy trajectories, traditional mean-squared displacement (MSD) regression is unreliable. Bi-Mamba addresses this by providing an end-to-end regressor that directly outputs predictions (K̂, α̂) from single-trajectory data, along with auxiliary tasks such as diffusion-state segmentation and change-point detection (Lavaud et al., 2024).
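To make the baseline concrete, here is a minimal sketch of the classical MSD regression that Bi-Mamba replaces: simulate a 2D normal-diffusion trajectory, compute its time-averaged MSD, and fit log MSD(τ) = log K + α log τ. All function names and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trajectory(T=500, K=0.5, dt=1.0):
    # Normal diffusion (alpha = 1): independent Gaussian steps with
    # variance 2*K*dt per dimension.
    steps = rng.normal(scale=np.sqrt(2 * K * dt), size=(T, 2))
    return np.cumsum(steps, axis=0)

def msd_fit(traj, max_lag=50):
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                    for lag in lags])
    # Linear fit in log-log space: slope = alpha, intercept = log K
    # (the dimensional prefactor is absorbed into K here).
    alpha, logK = np.polyfit(np.log(lags), np.log(msd), 1)
    return np.exp(logK), alpha

traj = simulate_trajectory()
K_hat, alpha_hat = msd_fit(traj)  # alpha_hat typically close to 1 here
```

For long, clean trajectories this fit is adequate; for short, noisy ones the log-log regression becomes unstable, which is the regime that motivates the learned estimator.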
2. Selective State-Space and Bidirectional Recurrence
Each Mamba block at time t maintains a latent state h_t updated by

h_t = Ā_t ⊙ h_{t−1} + B̄_t x_t,    y_t = C_t h_t,

where x_t is the per-time-step feature vector; Δ_t, A, B_t, and C_t are learned (with Δ_t, B_t, and C_t input-dependent); Ā_t = exp(Δ_t A) and B̄_t = Δ_t B_t are the discretized parameters; and ⊙ is the elementwise product. The bidirectional scan duplicates this block: one copy processes (x_1, …, x_T) (forward), the other processes (x_T, …, x_1) (backward). Their hidden states, h_t^(f) and h_t^(b), are concatenated,

h_t = [h_t^(f) ; h_t^(b)],

providing every timestep with context from both earlier and later data (Lavaud et al., 2024).
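The recurrence above can be sketched in plain numpy. This is a hedged, simplified sketch, not the paper's implementation: dimensions are illustrative, the input is reduced to a learned scalar projection per step, and A is a stable diagonal as in standard Mamba parameterizations.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, n = 16, 4, 8   # sequence length, feature dim, state dim (illustrative)

def make_params():
    return dict(
        A=-np.abs(rng.normal(size=n)),          # stable (negative) diagonal A
        W_delta=0.1 * rng.normal(size=(d, n)),  # input-dependent step size
        W_B=rng.normal(size=(d, n)),
        W_C=rng.normal(size=(d, n)),
        w_in=rng.normal(size=d),                # scalar input projection
    )

def selective_scan(x, p):
    h = np.zeros(n)
    ys = np.empty(len(x))
    for t, xt in enumerate(x):
        delta = np.log1p(np.exp(xt @ p["W_delta"]))  # softplus: Delta_t > 0
        A_bar = np.exp(delta * p["A"])               # discretized decay
        B_bar = delta * (xt @ p["W_B"])
        h = A_bar * h + B_bar * (xt @ p["w_in"])     # elementwise recurrence
        ys[t] = (xt @ p["W_C"]) @ h
    return ys

def bimamba(x, p_fwd, p_bwd):
    y_fwd = selective_scan(x, p_fwd)
    y_bwd = selective_scan(x[::-1], p_bwd)[::-1]  # scan reversed copy, realign
    return np.stack([y_fwd, y_bwd], axis=-1)      # fuse past + future context

x = rng.normal(size=(T, d))
y = bimamba(x, make_params(), make_params())      # shape (T, 2)
```

The key point is that each output position mixes a forward state (summarizing the past) with a backward state (summarizing the future), while each scan remains linear in T.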
3. Neural Architecture and Processing Pipeline
Bi-Mamba is organized in a pipeline of specialized blocks:
- Segmentation block: Bi-Mamba over per-step kinematic and geometric features (one-dimensional MSD, turning angle, radial distance, etc.), with two parallel Mamba layers (forward and backward) and a feed-forward net producing C-class softmax outputs for segmentation/classification.
- K-regression and α-regression blocks: each is a unidirectional Mamba operating on augmented features (input features concatenated with one-hot segment labels); its final hidden vectors are pooled and passed to a small MLP for each output parameter.
- Pooling strategy: For global regression, the final hidden state or a global average is used before an output MLP (Lavaud et al., 2024).
Table: Bi-Mamba Core Building Blocks
| Stage | Input Features | Output Head |
|---|---|---|
| Segmentation | Kinematic + geometric (≈10) | Per-timestep C-class softmax |
| K-regression | Features + one-hot segment labels | Pooled state → MLP |
| α-regression | Features + one-hot segment labels | Pooled state → MLP |
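The heads in the table can be sketched on top of already-fused bidirectional features. This is an illustrative numpy sketch with untrained random weights; the hidden size, the number of segmentation classes C, and the pooling choice are placeholder assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d_bi, C = 100, 32, 4   # timesteps, fused hidden size, segment classes

H = rng.normal(size=(T, d_bi))   # stand-in for fused Bi-Mamba hidden states

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Segmentation head: per-timestep C-class softmax.
W_seg = rng.normal(size=(d_bi, C))
seg_probs = softmax(H @ W_seg)                    # (T, C)
seg_onehot = np.eye(C)[seg_probs.argmax(axis=1)]  # hard labels, one-hot

# Regression heads: augment features with one-hot segment labels,
# pool over time, then a small MLP per output parameter (K and alpha).
X_aug = np.concatenate([H, seg_onehot], axis=1)   # (T, d_bi + C)
pooled = X_aug.mean(axis=0)                       # global average pooling

def mlp_head(v, d_hidden=16):
    # One-hidden-layer ReLU MLP mapping the pooled vector to a scalar.
    W1 = rng.normal(size=(v.size, d_hidden)) * 0.1
    W2 = rng.normal(size=d_hidden) * 0.1
    return float(np.maximum(v @ W1, 0) @ W2)

K_hat, alpha_hat = mlp_head(pooled), mlp_head(pooled)
```

Feeding the segmentation output into the regression blocks lets the parameter heads condition on the inferred diffusion state of each timestep.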
4. Bidirectional Context and Task Heads
Bidirectionality is operationalized by two symmetric Mamba blocks scanning in forward and reverse time. In segmentation, the forward and backward hidden states at each timestep are concatenated and processed by a feed-forward net to produce a C-way softmax. For regression, the bidirectional hidden states are globally pooled (last state or mean), then fed through a small MLP to predict K or α (Lavaud et al., 2024).
This symmetric fusion allows the model to capture context from both the past and the future, which improves classification of ambiguous trajectory segments that depend on non-local motion cues, and provides more accurate global parameter estimation than causal-only models.
5. Multi-Task Objective and Optimization
The model is trained with a composite loss

L = L_seg + L_K + L_α + L_wd,

where
- L_seg: weighted cross-entropy over per-timestep segmentation,
- L_K: mean-squared log error between predicted and true K,
- L_α: mean absolute error for α,
- L_wd: weight decay for regularization.
The optimizer is Adam with a fixed learning rate (Lavaud et al., 2024). Targets are supplied directly from the labeled dataset.
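The loss decomposition above can be written out directly. This sketch follows the listed terms; the loss weights, the log1p form of the MSLE, and the uniform class weights are assumptions for illustration.

```python
import numpy as np

def segmentation_ce(probs, labels, class_weights):
    # Weighted cross-entropy over per-timestep segmentation labels.
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(class_weights[labels] * -np.log(picked + 1e-12)))

def msle(K_pred, K_true):
    # Mean-squared log error (standard log1p form, assumed here).
    return float(np.mean((np.log1p(K_pred) - np.log1p(K_true)) ** 2))

def mae(a_pred, a_true):
    return float(np.mean(np.abs(a_pred - a_true)))

def total_loss(probs, labels, K_pred, K_true, a_pred, a_true, params,
               lam_seg=1.0, lam_K=1.0, lam_a=1.0, lam_wd=1e-4):
    wd = sum(float(np.sum(p ** 2)) for p in params)   # weight-decay term
    cw = np.ones(probs.shape[1])                      # uniform class weights
    return (lam_seg * segmentation_ce(probs, labels, cw)
            + lam_K * msle(K_pred, K_true)
            + lam_a * mae(a_pred, a_true)
            + lam_wd * wd)

# Tiny worked example with dummy predictions:
probs = np.full((5, 3), 1.0 / 3)
labels = np.array([0, 1, 2, 0, 1])
L = total_loss(probs, labels, np.array([1.2]), np.array([1.0]),
               np.array([0.8]), np.array([1.0]), [np.ones((2, 2))])
```

Summing the task losses lets a single optimizer update the shared trunk and all three heads jointly.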
6. Inference Methodology
Inference on a trajectory follows a fixed pipeline:
- Compute per-timestep features for t = 1, …, T.
- Run forward and backward Bi-Mamba segmentation to obtain class probabilities.
- Augment per-step features with segmentation outputs.
- Run the K- and α-regression Mamba blocks.
- Pool the hidden vectors and apply the output heads to produce (K̂, α̂).
This procedure yields a deterministic, approximately MAP estimate

(K̂, α̂) = f_θ(x_{1:T}),

with no iterative fitting, suitable for real-time or high-throughput analysis (Lavaud et al., 2024).
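The steps above can be chained into a single deterministic function. This is a structural sketch with stand-in models in place of the trained blocks; the feature choices (displacements, step length) and the two-class segmenter are illustrative assumptions.

```python
import numpy as np

def per_step_features(traj):
    # Simple kinematic features: per-step displacements and step length.
    disp = np.diff(traj, axis=0, prepend=traj[:1])
    step_len = np.linalg.norm(disp, axis=1, keepdims=True)
    return np.concatenate([disp, step_len], axis=1)       # (T, 3)

def infer(traj, segment_model, k_model, alpha_model):
    feats = per_step_features(traj)                       # 1. features
    seg_probs = segment_model(feats)                      # 2. segmentation
    onehot = np.eye(seg_probs.shape[1])[seg_probs.argmax(axis=1)]
    augmented = np.concatenate([feats, onehot], axis=1)   # 3. augment
    pooled = augmented.mean(axis=0)                       # 4. pool
    return k_model(pooled), alpha_model(pooled)           # 5. output heads

# Dummy stand-ins for the trained blocks (uniform segmenter, toy heads):
seg = lambda f: np.full((f.shape[0], 2), 0.5)
head = lambda v: float(np.abs(v).mean())
traj = np.cumsum(np.random.default_rng(3).normal(size=(50, 2)), axis=0)
K_hat, a_hat = infer(traj, seg, head, head)
```

Because every stage is a fixed forward pass, the whole estimate is a single function of the trajectory, with no per-track optimization loop.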
7. Empirical Results and Comparative Performance
On the AnDi-2 single-trajectory subchallenge (no missing data), Bi-Mamba achieved:
- α-inference MAE: 0.27 (7th place)
- K-inference MSLE: 0.05 (9th place)
- Diffusion-type classification (e.g., trapped, normal, directed) F1 score: 0.91 (3rd place)
- Change-point detection RMSE: 2.7 frames (10th place)
Relative to a bidirectional RNN baseline, Bi-Mamba demonstrates lower validation loss across all tasks, smaller variance over epochs, and faster convergence (Lavaud et al., 2024). The design is particularly well-suited to single-trajectory inference, outperforming classical estimators and strong RNN baselines, especially for short, noisy tracks where bidirectional context and input-dependent gating are essential.
Bi-Mamba thus exemplifies a state-space sequence model that operationalizes bidirectional temporal context within a highly efficient, hardware-friendly architecture. Its design is domain-agnostic, but its utility is particularly prominent for short sequence regression and segmentation in stochastic systems. The architecture provides a scalable, multi-head foundation that can be adapted or extended to other time series, spatiotemporal, or dynamical settings with similar requirements for efficient joint modeling of local and global, past and future information.