
LBMamba: Locally Bi-directional Mamba

Updated 4 December 2025
  • LBMamba is a neural sequence modeling block that integrates local forward and backward SSMs, enabling efficient bi-directional context within fixed windows.
  • It employs fixed-length window partitioning with simultaneous dual SSM passes and lightweight fusion, achieving near-linear computational scaling.
  • Empirical evaluations show LBMamba improves performance in computer vision, genomics, and time series tasks with minimal extra overhead.

Locally Bi-directional Mamba (LBMamba) refers to a class of neural sequence modeling blocks that integrate local bidirectional State Space Model (SSM) computations to enable efficient context aggregation within restricted windows of the input, combining the parallel-scan efficiency of Mamba with the representational strength of bi-directionality. Developed to address both the context-blindness of unidirectional recurrences and the prohibitive memory/computation costs of global bidirectional scans, LBMamba has been instantiated in varied domains, including computer vision, genomics, signal processing, and time series modeling. Characteristic features include (1) local window partitioning, (2) simultaneous forward/backward SSM passes per window, (3) lightweight fusion of directional states, and (4) strict linear or near-linear computational scaling with respect to input length or spatial size. Empirical investigations report consistent gains over unidirectional Mamba or attention-based architectures at minimal extra cost.

1. Motivation and Conceptual Foundations

Unidirectional SSM-based sequence models, notably Mamba (Zhang et al., 19 Jun 2025), can only access information from previous tokens, limiting their effectiveness for domains where both past and future context are crucial (e.g., vision, genomics, dense prediction, imputation). Traditional remedies—adding a global backward scan or fusing results from opposite directions—restore full receptive fields but nearly double memory bandwidth, kernel runtime, and inter-thread communication. LBMamba was introduced to enable bi-directional modeling confined to local windows, such that each token aggregates context from both directions within a limited spatial or temporal field, but without the bandwidth or memory penalties associated with global bidirectional sweeps (Zhang et al., 19 Jun 2025, Solís-García et al., 8 Oct 2024, Zhang et al., 3 Sep 2025, Cao et al., 28 Aug 2025, Schiff et al., 5 Mar 2024).

The approach generalizes across architectures: in computer vision, LBMamba blocks can be stacked with alternating scan directions to propagate local bidirectional information globally over depth; in time series and genomics, sliding windows or segmentations partition sequences for local bi-directional aggregation.

2. Core Architectural Elements and Mathematical Formulation

In LBMamba, a sequence is first partitioned into fixed-length local blocks (e.g., non-overlapping segments in time series, windows in vision, spatial patches in dense prediction). For each block, two directionally opposed SSMs, one forward and one backward, are computed, typically with shared parameters. Consider an input $\mathbf{x} = (x_1, \ldots, x_T)$ split into windows of size $M$.

  • Forward SSM (for window $w$): sequential scan

$$h^{\mathrm{f}}_t = \bar{A}^{\mathrm{f}}_t\, h^{\mathrm{f}}_{t-1} + \bar{B}^{\mathrm{f}}_t\, x_t$$

  • Local Backward SSM (inside window $w$):

$$h^{\mathrm{b}}_t = \begin{cases} \bar{B}^{\mathrm{f}}_t\, x_t, & t \bmod M = 0 \\ \bar{A}^{\mathrm{f}}_t\, h^{\mathrm{b}}_{t+1} + \bar{B}^{\mathrm{f}}_t\, x_t, & \text{otherwise} \end{cases}$$

  • Fusion and Output Projection:

$$h_t = h^{\mathrm{f}}_t + \big(h^{\mathrm{b}}_t - \bar{B}^{\mathrm{f}}_t\, x_t\big)$$

where the $\bar{B}^{\mathrm{f}}_t\, x_t$ term is subtracted so that the current token's contribution is not counted twice, followed by the output readout

$$y_t = C^{\mathrm{f}}_t\, h_t + D^{\mathrm{f}} x_t$$

with optional elementwise gating and projection, before applying the residual connection and normalization. A reference sketch of this computation follows.
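
The recurrences above can be written as a short reference loop. The following is a minimal, deliberately unoptimized sketch under simplifying assumptions: the SSM state dimension is collapsed into the channel dimension, the discretized parameters are supplied as precomputed per-token tensors, and the local backward scan restarts at the last position of each window. It illustrates the equations only; it is not the fused per-register kernel.

```python
import torch


def lbmamba_reference(x, A_bar, B_bar, C, D, M):
    """Reference sketch of an LBMamba local bi-directional scan.

    Illustrative shapes: x, A_bar, B_bar, C are (T, N); D is (N,).
    The SSM state dimension is folded into N for brevity; M is the
    local window length.
    """
    T, N = x.shape
    h_f = torch.zeros(T, N)
    h_b = torch.zeros(T, N)

    # Global forward scan: h^f_t = A_t h^f_{t-1} + B_t x_t
    prev = torch.zeros(N)
    for t in range(T):
        prev = A_bar[t] * prev + B_bar[t] * x[t]
        h_f[t] = prev

    # Local backward scan, restarted at the last position of every window
    for t in reversed(range(T)):
        if (t + 1) % M == 0 or t == T - 1:
            h_b[t] = B_bar[t] * x[t]
        else:
            h_b[t] = A_bar[t] * h_b[t + 1] + B_bar[t] * x[t]

    # Fuse; subtracting B_t x_t avoids counting the current token twice
    h = h_f + (h_b - B_bar * x)

    # Per-token output readout: y_t = C_t h_t + D x_t
    return C * h + D * x


# Toy usage: T=12 tokens, N=8 channels, windows of M=4
T, N, M = 12, 8, 4
x = torch.randn(T, N)
A_bar = torch.rand(T, N)          # decay factors in (0, 1)
B_bar = torch.randn(T, N)
C = torch.randn(T, N)
D = torch.randn(N)
y = lbmamba_reference(x, A_bar, B_bar, C, D, M)   # (T, N)
```

In the fused kernel both scans are carried out in per-thread registers, as noted below, so this explicit loop is purely illustrative.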

In practical GPU kernels, both directional computations are performed entirely in per-thread registers and do not require extra high-bandwidth memory transfers or synchronization, preserving the bandwidth advantage of unidirectional Mamba (Zhang et al., 19 Jun 2025).

Variants for specific modalities implement the above with windowed convolutions (vision), segment tokenization (ECG/time series), or scan patterns combining directions and serialization/reshaping across both spatial and task axes (multi-task dense prediction) (Cao et al., 28 Aug 2025, Zhang et al., 3 Sep 2025).

3. Computational Complexity and Efficiency

LBMamba delivers strict locality and bi-directional context with minimal overhead:

  • For window/block size $M$, the extra arithmetic for the local backward scan amounts to $\sim C/M$ (with $C$ the cost of the forward scan); a back-of-the-envelope sketch follows this list. With $M=4$, this yields approximately 27% more FLOPs but only a ∼2% wall-clock slowdown on specialized hardware (Zhang et al., 19 Jun 2025). Importantly, there is no added high-bandwidth memory traffic or register pressure.
  • For sliding-window variants (e.g., LBMamba in genomics (Schiff et al., 5 Mar 2024)), total complexity is $O(TW \log W)$, which is effectively $O(T)$ for constant window size $W$.
  • When stacking multiple LBMamba blocks with alternating scan direction (LBVim), a global receptive field is recovered at depth $U = O(T/M)$ (Zhang et al., 19 Jun 2025).
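
For orientation, a back-of-the-envelope comparison of scan arithmetic under the $\sim C/M$ approximation above; this ignores fusion, projection, and memory-bandwidth effects and is not a measured benchmark.

```python
# Relative scan arithmetic, normalised to one unidirectional forward scan (cost C).
def relative_scan_cost(kind: str, M: int = 4) -> float:
    if kind == "unidirectional":
        return 1.0                  # single forward pass
    if kind == "global bidirectional":
        return 2.0                  # a full extra backward pass
    if kind == "lbmamba":
        return 1.0 + 1.0 / M        # local backward scan adds ~C/M
    raise ValueError(kind)


for kind in ("unidirectional", "global bidirectional", "lbmamba"):
    print(f"{kind:22s} ~{relative_scan_cost(kind, M=4):.2f} C")
```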

A summary of complexity profiles is provided below.

| Model | Asymptotic Complexity | Memory/Bandwidth |
|---|---|---|
| Transformers | $O(T^2)$ | High |
| Unidirectional Mamba | $O(T)$ or $O(T \log T)$ | Low |
| Global bidirectional | $O(2T)$ | 2× unidirectional Mamba |
| LBMamba | $O(T + T/M)$ | Low |

LBMamba thus provides a favorable accuracy-efficiency frontier compared to both unidirectional Mamba and global bidirectional Mamba (Zhang et al., 19 Jun 2025, Solís-García et al., 8 Oct 2024).

4. Domain-Specific Instantiations

Computer Vision: LBVim and LBVim-WSI

In LBVim, LBMamba blocks are stacked with the scan direction alternated after each layer and applied to 1D-patchified image token sequences (Zhang et al., 19 Jun 2025). This design enables efficient global context propagation (within two layers every token receives information from the entire image), yielding Pareto improvements over convolutional or transformer-based backbones on classification, segmentation, and detection benchmarks (a sketch of the alternating-direction stacking follows the result list below):

  • ImageNet-1K: LBVim-300 achieves 77.7% top-1 @ 906 img/s versus Vim-Ti 76.1% @ 889 img/s.
  • ADE20K Segmentation: LBVim-300 achieves 43.7 mIoU @ 26 fps vs. 41.0 mIoU for Vim-Ti.
  • In pathology MIL (MambaMIL vs. LBMambaMIL), relative performance improvements up to +3.06% AUC, +3.39% F1, and +1.67% accuracy are reported on WSI benchmarks.
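
A minimal sketch of the alternating-direction stacking described above, assuming each block maps a (batch, tokens, dim) tensor to a tensor of the same shape; the class name is illustrative and this is not the released LBVim code.

```python
import torch
import torch.nn as nn


class AlternatingScanStack(nn.Module):
    """Stack LBMamba-style blocks, reversing the token order on every other
    layer so that local bi-directional context propagates in both global
    directions over depth (illustrative sketch)."""

    def __init__(self, blocks: nn.ModuleList):
        super().__init__()
        self.blocks = blocks

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # (B, T, D)
        for i, block in enumerate(self.blocks):
            if i % 2 == 1:
                tokens = torch.flip(tokens, dims=[1])   # reverse scan direction
            tokens = block(tokens)
            if i % 2 == 1:
                tokens = torch.flip(tokens, dims=[1])   # restore token order
        return tokens
```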

Multivariate Time Series and Diffusion Modeling

In TIMBA, LBMamba blocks are used for temporal encoding in imputation and denoising diffusion models (Solís-García et al., 8 Oct 2024). Each window of length $L$ runs forward and backward S6 state-space recurrences, fuses them via addition and gating, and projects back into the model dimension with a residual connection. Experimental ablations demonstrate that switching from unidirectional to LBMamba blocks consistently reduces MSE by 4–12% across block- and point-missing scenarios on METR-LA and PEMS-BAY.
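
A hedged sketch of that block-level pattern, with placeholder ssm_fwd/ssm_bwd modules standing in for the S6 recurrences and illustrative projection names; it is not the TIMBA implementation.

```python
import torch
import torch.nn as nn


class GatedLocalBiFusion(nn.Module):
    """Add forward and backward window states, gate the sum, project back to
    the model dimension, and apply a residual connection (sketch)."""

    def __init__(self, d_model: int, d_inner: int,
                 ssm_fwd: nn.Module, ssm_bwd: nn.Module):
        super().__init__()
        self.ssm_fwd, self.ssm_bwd = ssm_fwd, ssm_bwd
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, d_inner)
        self.gate_proj = nn.Linear(d_model, d_inner)
        self.out_proj = nn.Linear(d_inner, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, L, d_model)
        z = self.in_proj(self.norm(x))
        # Backward pass realised by flipping the window, scanning, flipping back
        fwd = self.ssm_fwd(z)
        bwd = torch.flip(self.ssm_bwd(torch.flip(z, dims=[1])), dims=[1])
        gated = (fwd + bwd) * torch.sigmoid(self.gate_proj(self.norm(x)))
        return x + self.out_proj(gated)                   # residual connection
```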

Bioinformatics and Genomics

LBMamba-DNA instantiates sliding-window local bi-directional blocks for long-range DNA sequence modeling, delivering $O(T)$ complexity and matching or exceeding performance of global BiMamba on genomic benchmarks while reducing inference cost by ∼30%. Gains are particularly evident in long-context (VEP, $T = 131$k) scenarios (Schiff et al., 5 Mar 2024).

Multi-task Dense Prediction

The BIM decoder applies local bi-directional scan ("BI-Scan") with interleaved task-first and position-first scanning, channel splitting, and multi-scale feature fusion. This achieves near pairwise-level cross-task interaction at $O(T\,H\,W)$ cost, with linear parameter and memory scaling. Empirically, LBMamba yields consistent boosts in segmentation mIoU, depth, and parsing accuracy over unidirectional Mamba baselines (Cao et al., 28 Aug 2025).
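
To make the two serialization orders concrete, below is a small illustrative flattening of a (tasks, height, width, channels) feature volume; the function and order names are descriptive assumptions, not identifiers from the BIM paper.

```python
import torch


def serialize(feats: torch.Tensor, order: str) -> torch.Tensor:
    """Flatten multi-task dense features (T, H, W, C) into a token sequence.

    'position_first': tokens of one task are contiguous, tasks follow each other.
    'task_first':     tokens of all tasks at one spatial position are adjacent.
    """
    T, H, W, C = feats.shape
    if order == "position_first":
        return feats.reshape(T * H * W, C)
    if order == "task_first":
        return feats.permute(1, 2, 0, 3).reshape(H * W * T, C)
    raise ValueError(order)


# Example: 3 tasks over a 4x4 grid with 8 channels -> 48 tokens either way
tokens = serialize(torch.randn(3, 4, 4, 8), "task_first")
print(tokens.shape)  # torch.Size([48, 8])
```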

ECG Analysis

S²M²ECG employs LBMamba blocks on per-lead segmented ECG time series, applying forward and backward SSMs in each block, fusing via addition and residual, and aggregating across leads with FFN and SENet modules. This design achieves high diagnostic performance with only ∼0.7M parameters and strict $O(Sd)$ cost per block (Zhang et al., 3 Sep 2025).

5. Implementation and Practical Considerations

  • LBMamba blocks are operationally backward-compatible with existing Mamba kernels, requiring only minor modification to introduce local backward scans within each thread/register context (Zhang et al., 19 Jun 2025).
  • The locality parameter ($M$, window/segment size) is selected according to sequence length and device register constraints.
  • Class-token summarization is not recommended; global average pooling or single-query attention is preferred for aggregation (Zhang et al., 19 Jun 2025).
  • Parameter initialization ensures SSM discretization stability (e.g., via learnable $\Delta_0$ biases such that $\Delta = \log(1+\exp(\cdot)) > 0$) (Zhang et al., 3 Sep 2025); a sketch follows this list.
  • Gating mechanisms (e.g., SiLU, sigmoid) modulate the influence of fused directional outputs, suppressing irrelevant activations (Solís-García et al., 8 Oct 2024).
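
A minimal sketch of the softplus-based $\Delta$ parameterization noted in the list above; the module name and initial bias value are illustrative, not taken from the cited papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeltaStep(nn.Module):
    """Keep the SSM discretization step strictly positive via softplus:
    Delta = log(1 + exp(logits + bias)) > 0."""

    def __init__(self, d_inner: int, init_bias: float = -4.0):
        super().__init__()
        # init_bias is an illustrative value, not one reported in the papers
        self.delta_bias = nn.Parameter(torch.full((d_inner,), init_bias))

    def forward(self, delta_logits: torch.Tensor) -> torch.Tensor:
        return F.softplus(delta_logits + self.delta_bias)
```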

LBMamba blocks are easily extensible to non-vision modalities and multi-branch settings (e.g., multi-channel physiological time series), requiring structure-aware input tokenization and minor adaptation of the fusion steps.

6. Empirical Performance and Comparative Analysis

Across tasks and domains, the introduction of local bidirectionality in Mamba blocks yields measurable and often substantial improvements over both unidirectional and global-attention baselines at marginal compute or parameter cost:

  • LBVim and LBMambaMIL deliver +0.8–1.6% top-1/segmentation gains at equal or increased throughput (Zhang et al., 19 Jun 2025).
  • TIMBA with LBMamba blocks realizes 4–12% MSE improvement for time series imputation (Solís-García et al., 8 Oct 2024).
  • LBMamba-DNA reports +0.5–1% mean accuracy, +0.01 Matthews CC, and +0.03 AUROC on long-range genomics tasks, at 30% reduced inference cost (Schiff et al., 5 Mar 2024).
  • BIM (LBMamba) yields +1.58 mIoU (semseg, NYUD-v2), +4.83 mIoU total gain (PASCAL), and linear cost scaling in number of tasks (Cao et al., 28 Aug 2025).
  • S²M²ECG with LBMamba blocks maintains leading accuracy with <1M parameters and low inference latency (Zhang et al., 3 Sep 2025).

A common observation is that much of the context modeling power of global attention or bidirectional RNN/SSM can be recovered by stacking LBMamba blocks or carefully selecting scan/fusion patterns.

7. Relationship to Prior and Parallel Methods

LBMamba is distinct from earlier local/global Mamba variants (e.g., LocalMamba, LocalVMamba, as surveyed in (Xu et al., 29 Apr 2024)) in that it formally unifies in-register local backward recurrence with the forward scan, avoiding the redundant memory and synchronization overhead of explicit global bidirectional passes. No standalone definition or dedicated results for "LBMamba" exist in (Xu et al., 29 Apr 2024); only related local bidirectional architectures are reviewed under different terminology.

By comparison, Transformers and attention-based methods achieve context aggregation via $O(T^2)$ or $O(N^2)$ mechanisms, while global bidirectional SSMs double the unidirectional compute paths. Local variants of each, including LBMamba, focus on the accuracy–efficiency trade-off by constraining context range but increasing network depth or compositionality to compensate, yielding Pareto-optimal models in workload-constrained environments.

