Bi-Mamba: Bidirectional Neural Architecture
- Bi-Mamba is a bidirectional state-space neural architecture that fuses forward and reverse Mamba modules to provide every timestep with past and future context.
- It integrates specialized segmentation, K-regression, and α-regression blocks to directly infer diffusion parameters and detect change points in noisy trajectories.
- Empirical results show Bi-Mamba’s superior accuracy and faster convergence compared to traditional unidirectional models and bidirectional RNN baselines.
Bi-Mamba refers to a class of neural architectures that generalize the Mamba selective state-space model (SSM) framework to bidirectional processing. The Bi-Mamba principle is to run two Mamba state-space modules in parallel: one processes a sequence in the forward direction, and the other processes a reversed copy. Their outputs are concatenated or fused, furnishing every position with both past and future context. This differs sharply from the intrinsically causal, unidirectional processing of the original Mamba: Bi-Mamba provides global receptive fields at linear-time complexity, and has shown strong empirical advantages in regression, segmentation, and dynamical parameter estimation for short, noisy sequences (Lavaud et al., 2024).
1. The Anomalous Diffusion Inference Problem
Bi-Mamba was initially introduced in the context of anomalous diffusion characterization. For a 2D trajectory (x_t, y_t), with t = 1, …, T, the task is to infer the effective diffusion coefficient K and anomalous exponent α such that

MSD(τ) = K τ^α,

where α = 1 represents normal diffusion, α < 1 corresponds to subdiffusion, and α > 1 to superdiffusion. For short, noisy trajectories, traditional mean-squared displacement (MSD) regression is unreliable. Bi-Mamba addresses this by providing an end-to-end regressor that directly outputs predictions (K̂, α̂) from single-trajectory data, along with auxiliary tasks such as diffusion-state segmentation and change-point detection (Lavaud et al., 2024).
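To make the baseline concrete, here is a minimal sketch of the classical MSD regression that Bi-Mamba replaces: simulate a 2D normal-diffusion trajectory, compute its time-averaged MSD, and fit log MSD(τ) = log K + α log τ. All function names and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trajectory(T=500, K=0.5, dt=1.0):
    # Normal diffusion (alpha = 1): independent Gaussian steps with
    # variance 2*K*dt per dimension.
    steps = rng.normal(scale=np.sqrt(2 * K * dt), size=(T, 2))
    return np.cumsum(steps, axis=0)

def msd_fit(traj, max_lag=50):
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                    for lag in lags])
    # Linear fit in log-log space: slope = alpha, intercept = log K
    # (the dimensional prefactor is absorbed into K here).
    alpha, logK = np.polyfit(np.log(lags), np.log(msd), 1)
    return np.exp(logK), alpha

traj = simulate_trajectory()
K_hat, alpha_hat = msd_fit(traj)  # alpha_hat typically close to 1 here
```

For long, clean trajectories this fit is adequate; for short, noisy ones the log-log regression becomes unstable, which is the regime that motivates the learned estimator.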
2. Selective State-Space and Bidirectional Recurrence
Each Mamba block at time t maintains a latent state h_t updated by

h_t = Ā_t ⊙ h_{t−1} + B̄_t x_t,    y_t = C_t h_t,

where x_t is the per-time-step feature vector; Δ_t, A, B_t, and C_t are learned (with Δ_t, B_t, and C_t input-dependent); Ā_t = exp(Δ_t A) and B̄_t = Δ_t B_t are the discretized parameters; and ⊙ is the elementwise product. The bidirectional scan duplicates this block: one copy processes (x_1, …, x_T) (forward), the other processes (x_T, …, x_1) (backward). Their hidden states, h_t^(f) and h_t^(b), are concatenated,

h_t = [h_t^(f) ; h_t^(b)],

providing every timestep with context from both earlier and later data (Lavaud et al., 2024).
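The recurrence above can be sketched in plain numpy. This is a hedged, simplified sketch, not the paper's implementation: dimensions are illustrative, the input is reduced to a learned scalar projection per step, and A is a stable diagonal as in standard Mamba parameterizations.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, n = 16, 4, 8   # sequence length, feature dim, state dim (illustrative)

def make_params():
    return dict(
        A=-np.abs(rng.normal(size=n)),          # stable (negative) diagonal A
        W_delta=0.1 * rng.normal(size=(d, n)),  # input-dependent step size
        W_B=rng.normal(size=(d, n)),
        W_C=rng.normal(size=(d, n)),
        w_in=rng.normal(size=d),                # scalar input projection
    )

def selective_scan(x, p):
    h = np.zeros(n)
    ys = np.empty(len(x))
    for t, xt in enumerate(x):
        delta = np.log1p(np.exp(xt @ p["W_delta"]))  # softplus: Delta_t > 0
        A_bar = np.exp(delta * p["A"])               # discretized decay
        B_bar = delta * (xt @ p["W_B"])
        h = A_bar * h + B_bar * (xt @ p["w_in"])     # elementwise recurrence
        ys[t] = (xt @ p["W_C"]) @ h
    return ys

def bimamba(x, p_fwd, p_bwd):
    y_fwd = selective_scan(x, p_fwd)
    y_bwd = selective_scan(x[::-1], p_bwd)[::-1]  # scan reversed copy, realign
    return np.stack([y_fwd, y_bwd], axis=-1)      # fuse past + future context

x = rng.normal(size=(T, d))
y = bimamba(x, make_params(), make_params())      # shape (T, 2)
```

The key point is that each output position mixes a forward state (summarizing the past) with a backward state (summarizing the future), while each scan remains linear in T.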
3. Neural Architecture and Processing Pipeline
Bi-Mamba is organized in a pipeline of specialized blocks:
- Segmentation block: Bi-Mamba over per-step kinematic and geometric features (one-dimensional MSD, turning angle, radial distance, etc.), with two parallel Mamba layers (forward and backward) and a feed-forward net producing C-class softmax outputs for segmentation/classification.
- K-regression and α-regression blocks: each is a unidirectional Mamba operating on augmented features (input features concatenated with one-hot segment labels); its final hidden vectors are pooled and passed to a small MLP for each output parameter.
- Pooling strategy: For global regression, the final hidden state or a global average is used before an output MLP (Lavaud et al., 2024).
Table: Bi-Mamba Core Building Blocks
| Stage | Input Features | Output Head |
|---|---|---|
| Segmentation | Kinematic + geometric (≈10) | Per-timestep C-class softmax |
| K-regression | Features + one-hot segment labels | Pooled state → MLP |
| α-regression | Features + one-hot segment labels | Pooled state → MLP |
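The heads in the table can be sketched on top of already-fused bidirectional features. This is an illustrative numpy sketch with untrained random weights; the hidden size, the number of segmentation classes C, and the pooling choice are placeholder assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d_bi, C = 100, 32, 4   # timesteps, fused hidden size, segment classes

H = rng.normal(size=(T, d_bi))   # stand-in for fused Bi-Mamba hidden states

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Segmentation head: per-timestep C-class softmax.
W_seg = rng.normal(size=(d_bi, C))
seg_probs = softmax(H @ W_seg)                    # (T, C)
seg_onehot = np.eye(C)[seg_probs.argmax(axis=1)]  # hard labels, one-hot

# Regression heads: augment features with one-hot segment labels,
# pool over time, then a small MLP per output parameter (K and alpha).
X_aug = np.concatenate([H, seg_onehot], axis=1)   # (T, d_bi + C)
pooled = X_aug.mean(axis=0)                       # global average pooling

def mlp_head(v, d_hidden=16):
    # One-hidden-layer ReLU MLP mapping the pooled vector to a scalar.
    W1 = rng.normal(size=(v.size, d_hidden)) * 0.1
    W2 = rng.normal(size=d_hidden) * 0.1
    return float(np.maximum(v @ W1, 0) @ W2)

K_hat, alpha_hat = mlp_head(pooled), mlp_head(pooled)
```

Feeding the segmentation output into the regression blocks lets the parameter heads condition on the inferred diffusion state of each timestep.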
4. Bidirectional Context and Task Heads
Bidirectionality is operationalized by two symmetric Mamba blocks scanning in forward and reverse time. In segmentation, the forward and backward hidden states at each timestep are concatenated and processed by a feed-forward net to produce a C-way softmax. For regression, the bidirectional hidden states are globally pooled (last state or mean), then fed through a small MLP to predict K or α (Lavaud et al., 2024).
This symmetric fusion allows the model to capture context from both the past and the future, which improves classification of ambiguous trajectory segments that depend on non-local motion cues, and provides more accurate global parameter estimation than causal-only models.
5. Multi-Task Objective and Optimization
The model is trained with a composite loss

L = L_seg + L_K + L_α + L_wd,

where
- L_seg: weighted cross-entropy over per-timestep segmentation,
- L_K: mean-squared log error between predicted and true K,
- L_α: mean absolute error for α,
- L_wd: weight decay for regularization.
The optimizer is Adam with a fixed learning rate (Lavaud et al., 2024). Targets are supplied directly from the labeled dataset.
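The loss decomposition above can be written out directly. This sketch follows the listed terms; the loss weights, the log1p form of the MSLE, and the uniform class weights are assumptions for illustration.

```python
import numpy as np

def segmentation_ce(probs, labels, class_weights):
    # Weighted cross-entropy over per-timestep segmentation labels.
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(class_weights[labels] * -np.log(picked + 1e-12)))

def msle(K_pred, K_true):
    # Mean-squared log error (standard log1p form, assumed here).
    return float(np.mean((np.log1p(K_pred) - np.log1p(K_true)) ** 2))

def mae(a_pred, a_true):
    return float(np.mean(np.abs(a_pred - a_true)))

def total_loss(probs, labels, K_pred, K_true, a_pred, a_true, params,
               lam_seg=1.0, lam_K=1.0, lam_a=1.0, lam_wd=1e-4):
    wd = sum(float(np.sum(p ** 2)) for p in params)   # weight-decay term
    cw = np.ones(probs.shape[1])                      # uniform class weights
    return (lam_seg * segmentation_ce(probs, labels, cw)
            + lam_K * msle(K_pred, K_true)
            + lam_a * mae(a_pred, a_true)
            + lam_wd * wd)

# Tiny worked example with dummy predictions:
probs = np.full((5, 3), 1.0 / 3)
labels = np.array([0, 1, 2, 0, 1])
L = total_loss(probs, labels, np.array([1.2]), np.array([1.0]),
               np.array([0.8]), np.array([1.0]), [np.ones((2, 2))])
```

Summing the task losses lets a single optimizer update the shared trunk and all three heads jointly.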
6. Inference Methodology
Inference on a trajectory follows a fixed pipeline:
- Compute per-timestep features for t = 1, …, T.
- Run forward and backward Bi-Mamba segmentation to obtain class probabilities.
- Augment per-step features with segmentation outputs.
- Run the K- and α-regression Mamba blocks.
- Pool the hidden vectors and apply the output heads to produce (K̂, α̂).
This procedure yields a deterministic, approximately MAP estimate

(K̂, α̂) = f_θ(x_{1:T}),

with no iterative fitting, suitable for real-time or high-throughput analysis (Lavaud et al., 2024).
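The steps above can be chained into a single deterministic function. This is a structural sketch with stand-in models in place of the trained blocks; the feature choices (displacements, step length) and the two-class segmenter are illustrative assumptions.

```python
import numpy as np

def per_step_features(traj):
    # Simple kinematic features: per-step displacements and step length.
    disp = np.diff(traj, axis=0, prepend=traj[:1])
    step_len = np.linalg.norm(disp, axis=1, keepdims=True)
    return np.concatenate([disp, step_len], axis=1)       # (T, 3)

def infer(traj, segment_model, k_model, alpha_model):
    feats = per_step_features(traj)                       # 1. features
    seg_probs = segment_model(feats)                      # 2. segmentation
    onehot = np.eye(seg_probs.shape[1])[seg_probs.argmax(axis=1)]
    augmented = np.concatenate([feats, onehot], axis=1)   # 3. augment
    pooled = augmented.mean(axis=0)                       # 4. pool
    return k_model(pooled), alpha_model(pooled)           # 5. output heads

# Dummy stand-ins for the trained blocks (uniform segmenter, toy heads):
seg = lambda f: np.full((f.shape[0], 2), 0.5)
head = lambda v: float(np.abs(v).mean())
traj = np.cumsum(np.random.default_rng(3).normal(size=(50, 2)), axis=0)
K_hat, a_hat = infer(traj, seg, head, head)
```

Because every stage is a fixed forward pass, the whole estimate is a single function of the trajectory, with no per-track optimization loop.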
7. Empirical Results and Comparative Performance
On the AnDi-2 single-trajectory subchallenge (no missing data), Bi-Mamba achieved:
- α-inference MAE: 0.27 (7th place)
- K-inference MSLE: 0.05 (9th place)
- Diffusion-type classification (e.g., trapped, normal, directed) F1 score: 0.91 (3rd place)
- Change-point detection RMSE: 2.7 frames (10th place)
Relative to a bidirectional RNN baseline, Bi-Mamba demonstrates lower validation loss across all tasks, smaller variance over epochs, and faster convergence (Lavaud et al., 2024). The design is particularly well-suited to single-trajectory inference, outperforming classical estimators and strong RNN baselines, especially for short, noisy tracks where bidirectional context and input-dependent gating are essential.
Bi-Mamba thus exemplifies a state-space sequence model that operationalizes bidirectional temporal context within a highly efficient, hardware-friendly architecture. Its design is domain-agnostic, but its utility is particularly prominent for short sequence regression and segmentation in stochastic systems. The architecture provides a scalable, multi-head foundation that can be adapted or extended to other time series, spatiotemporal, or dynamical settings with similar requirements for efficient joint modeling of local and global, past and future information.