ExtBiMamba: Flexible Bidirectional SSMs

Updated 12 May 2026
  • ExtBiMamba is a family of advanced bidirectional state-space models that integrates forward and backward passes with cross-task and multi-scale interactions.
  • It employs continuous-time SSM kernels for linear-time sequence modeling and adapts to diverse tasks such as language modeling, speaker diarization, and dense vision prediction.
  • The design features flexible, hardware-friendly quantization supporting 1-bit to multi-bit precision, significantly reducing energy consumption and boosting inference throughput.

ExtBiMamba is a family of flexible state-space models that generalize the Mamba and Bi-Mamba architectures with bidirectional, multi-task, and multi-bit extensions. The core design leverages continuous-time state-space model (SSM) kernels for linear-time sequence modeling and augments them with bidirectional information flow, cross-task/multi-scale interactions, and flexible quantization. These properties enable ExtBiMamba to scale efficiently to long sequences or high-dimensional inputs, with direct applications across language modeling, speaker diarization, and multi-task dense prediction in vision and speech domains (Tang et al., 2024, Liao et al., 27 Jan 2026, Cao et al., 28 Aug 2025).

1. Foundational Principles and Core Architecture

ExtBiMamba originates from the selective SSM block of Mamba. For a sequence of inputs $\{x_t \in \mathbb{R}^d\}$, the SSM block computes hidden states $h_t$ and outputs $y_t$ via: $h_t = \bar A h_{t-1} + \bar B x_t,\quad y_t = C h_t + D x_t$ where $\bar A = \exp(\Delta A)$ and $\bar B = (\exp(\Delta A) - I)(\Delta A)^{-1} \Delta B$ are discretized SSM parameters, and $C$, $D$ are learned output-mixing matrices. This structure allows global convolutional context with only linear time and memory cost in sequence length, scaling as $O(T \cdot N^2)$.
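The recurrence and discretization above can be illustrated with a minimal pure-Python sketch. It assumes a diagonal state matrix, scalar inputs, and a fixed step size `delta` (in selective SSMs the step size is input-dependent); the function names `discretize` and `ssm_scan` are hypothetical and not taken from any ExtBiMamba codebase.

```python
import math

def discretize(a, b, delta):
    """Zero-order-hold discretization for one diagonal state entry.

    Returns (a_bar, b_bar) with a_bar = exp(delta * a) and
    b_bar = (exp(delta * a) - 1) / (delta * a) * (delta * b).
    """
    da = delta * a
    a_bar = math.exp(da)
    # As da -> 0 the ratio (exp(da) - 1) / da tends to 1; guard the division.
    ratio = (a_bar - 1.0) / da if abs(da) > 1e-12 else 1.0
    return a_bar, ratio * delta * b

def ssm_scan(xs, a_diag, b_vec, c_vec, d, delta):
    """Run h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t + D x_t over scalar inputs."""
    params = [discretize(a, b, delta) for a, b in zip(a_diag, b_vec)]
    h = [0.0] * len(a_diag)  # assumed zero initial state
    ys = []
    for x in xs:
        h = [a_bar * h_i + b_bar * x for (a_bar, b_bar), h_i in zip(params, h)]
        ys.append(sum(c * h_i for c, h_i in zip(c_vec, h)) + d * x)
    return ys
```

Each step touches the state once, so the loop runs in time proportional to the sequence length.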

To extend unidirectional SSMs, ExtBiMamba incorporates a second, backward SSM pass, mirroring the recurrence from sequence end-to-beginning, with its own independent parameters. Both forward and backward outputs are fused at each timestep using a learnable gating mechanism. This approach confers the ability to capture both past and future context, addressing the unidirectionality constraints of original Mamba and yielding richer sequence representations.

For multi-task settings, ExtBiMamba adapts its block structure to process per-task input streams and fuse their representations via advanced bidirectional scanning and cross-task feature refinement. In quantized variants, it also supports a spectrum of precision regimes, from aggressive 1-bit to multi-bit hybrid compositions (Tang et al., 2024, Cao et al., 28 Aug 2025).

2. Bidirectional and Cross-Task Recurrence Mechanisms

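The bidirectional extension is formulated as follows for each timestep $t$:

  • Forward pass (past $\to$ present):

$h_t^{f} = \bar A^{f} h_{t-1}^{f} + \bar B^{f} x_t$

  • Backward pass (future $\to$ present):

$h_t^{b} = \bar A^{b} h_{t+1}^{b} + \bar B^{b} x_t$

  • Fusion using a sigmoid gate:

$g_t = \sigma\left(W_g [h_t^{f}; h_t^{b}] + b_g\right),\quad h_t = g_t \odot h_t^{f} + (1 - g_t) \odot h_t^{b}$

where $\bar A^{f}, \bar B^{f}, \bar A^{b}, \bar B^{b}$ are trainable SSM parameters for each direction, and $W_g, b_g$ parameterize the fusion MLP.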
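The forward pass, backward pass, and gated fusion described above can be sketched in a few lines of Python. This is a scalar toy version under stated assumptions: each direction has its own already-discretized pair `(a_bar, b_bar)`, and the gate is a single sigmoid over a linear combination of the two directional states that produces a convex mix. The exact gate form and all names here are illustrative, not the published implementation.

```python
import math

def scan(xs, a_bar, b_bar, reverse=False):
    """Scalar linear recurrence h_t = a_bar * h_prev + b_bar * x_t.

    With reverse=True the recurrence runs from the last step to the first,
    so h_t summarizes x_t, x_{t+1}, ... instead of x_t, x_{t-1}, ...
    """
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    hs = [0.0] * len(xs)
    h = 0.0
    for t in order:
        h = a_bar * h + b_bar * xs[t]
        hs[t] = h
    return hs

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bidirectional_fuse(xs, fwd, bwd, w_f, w_b, bias):
    """fwd and bwd are independent (a_bar, b_bar) pairs, one per direction.

    Assumed gate: g_t = sigmoid(w_f * h_fwd + w_b * h_bwd + bias),
    output: g_t * h_fwd + (1 - g_t) * h_bwd.
    """
    h_f = scan(xs, *fwd)
    h_b = scan(xs, *bwd, reverse=True)
    out = []
    for f, b in zip(h_f, h_b):
        g = sigmoid(w_f * f + w_b * b + bias)
        out.append(g * f + (1.0 - g) * b)
    return out
```

With zero gate weights and bias the gate is 0.5 everywhere, so the output reduces to the average of the two directional states.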

For multi-task applications, bidirectional interaction is generalized. In BIM (an ExtBiMamba implementation), task-specific features undergo both task-first and position-first bidirectional BI-Scan passes. Each pass is modeled as a sequence processed by SSM blocks, coordinating dependencies across tasks or spatial positions efficiently at $O(K \cdot L)$ cost, where $K$ is the number of tasks and $L$ is the number of locations (Cao et al., 28 Aug 2025).
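To make the two serializations concrete, the sketch below flattens a `features[task][position]` grid in both orders and returns each sequence with its reversal for the backward pass. It assumes that "task-first" means the task index varies fastest (all tasks at one position are adjacent) and "position-first" means the position index varies fastest. That reading, and the function names, are assumptions rather than the definition used in BIM.

```python
def serialize(features, task_first=True):
    """Flatten features[task][position] into one token sequence.

    task_first=True  -> task index varies fastest (assumed meaning).
    task_first=False -> position index varies fastest.
    """
    num_tasks, num_pos = len(features), len(features[0])
    if task_first:
        return [features[k][p] for p in range(num_pos) for k in range(num_tasks)]
    return [features[k][p] for k in range(num_tasks) for p in range(num_pos)]

def bidirectional_orders(features, task_first=True):
    """Return the forward sequence and its reversal for the backward scan."""
    seq = serialize(features, task_first)
    return seq, seq[::-1]
```

Either order yields one sequence whose length is the number of tasks times the number of positions, so a linear-time scan over it costs time proportional to that product.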

3. Flexible Quantization and Hybrid Precision

A central design axis in ExtBiMamba is its support for "flexible-bit" quantization, motivated by the Bi-Mamba 1-bit SSM framework. All linear layers are binarized as: $\tilde W = \alpha \odot \mathrm{sign}(W) + \beta$ where the learnable $\alpha, \beta$ vectors preserve magnitude information. A straight-through estimator gradient is used to facilitate backpropagation through the sign operation.
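A minimal sketch of sign binarization with learnable scale and shift, plus the straight-through estimator (STE), is given below. It assumes per-column scale and shift vectors named `alpha` and `beta` (hypothetical names and layout) and uses the plain identity STE, without the clipping some variants apply.

```python
def sign(w):
    # Map zero to +1 so every weight takes exactly one of two values.
    return 1.0 if w >= 0 else -1.0

def binarize_linear(weight, alpha, beta):
    """weight: full-precision matrix as a list of rows.

    alpha, beta: per-column scale and shift (assumed layout).
    Returns the matrix with entries alpha[j] * sign(w_ij) + beta[j].
    """
    return [[alpha[j] * sign(w) + beta[j] for j, w in enumerate(row)]
            for row in weight]

def ste_grad(grad_binarized, alpha):
    """Straight-through estimator: treat d sign(w)/dw as 1, so the gradient
    reaching w_ij is alpha[j] times the upstream gradient."""
    return [[alpha[j] * g for j, g in enumerate(row)] for row in grad_binarized]
```

The forward pass stores only one bit per weight plus the two small vectors; the full-precision `weight` is needed only during training.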

ExtBiMamba extends this further via:

  • Multi-bit hybridization: The majority of weights remain 1-bit, while select blocks use 2–3 bits to boost dynamic range where sensitivity is highest, e.g., in output projections or gating modules. This configuration balances model expressivity with compute and memory efficiency.
  • Layerwise bit allocation: Early layers, responsible for extracting diverse features, may use higher precision, while deeper layers revert to 1-bit. This relies on results showing only modest perplexity increases under such partial binarization.
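The storage effect of a layerwise bit allocation can be estimated with simple arithmetic. The sketch below uses made-up layer sizes and bit widths purely for illustration, and ignores the overhead of scale and shift vectors.

```python
def average_bits(layer_sizes, layer_bits):
    """Parameter-weighted mean bit width across layers."""
    total = sum(layer_sizes)
    return sum(n * b for n, b in zip(layer_sizes, layer_bits)) / total

def weight_megabytes(layer_sizes, layer_bits):
    """Weight storage in megabytes (10^6 bytes) for the given allocation."""
    bits = sum(n * b for n, b in zip(layer_sizes, layer_bits))
    return bits / 8 / 1e6

# Hypothetical example: four layers of one million weights each, with the
# early layers at 3 and 2 bits and the deeper layers at 1 bit.
sizes = [1_000_000] * 4
hybrid = weight_megabytes(sizes, [3, 2, 1, 1])   # 0.875 MB
fp16 = weight_megabytes(sizes, [16] * 4)         # 8.0 MB
```

In this toy allocation the average width is 1.75 bits per weight, roughly a ninefold reduction in weight storage relative to 16-bit.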

Energy, memory, and throughput are substantially improved with this quantization: 1-bit matrix multiplies consume far less energy than 16-bit floating point, and inference throughput on long contexts improves over Transformer baselines (Tang et al., 2024).

4. Advanced Scanning and Multi-Scale Interaction Modules

In multi-task vision contexts, ExtBiMamba introduces two scan-based modules:

  • BI-Scan (Bidirectional Interaction Scan) alternates between task-first and position-first serializations across all branches, applying SSMs in both orderings and directions. The process includes:
    • Serializing features by predetermined patterns
    • Running Mamba-based SSMs linearly over concatenated sequences (length equal to the number of tasks times the number of positions)
    • Aggregating forward and backward outputs
    • Fusing via gating masks for integration into each task-specific branch
  • MS-Scan (Multi-Scale Scan) extracts features at several spatial granularities. Input channels are split, windowed into varying patch sizes, each processed by four-way 2D SSM scans, then recombined—supporting adaptive multi-scale context integration.
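The windowing and four-way scanning in MS-Scan can be pictured with index arithmetic alone. The sketch below assumes the four directions are row-major, column-major, and their reversals (a common choice for 2D SSM scans, not confirmed by the source), and that windows are non-overlapping with a size that divides the grid. Function names are hypothetical.

```python
def four_way_orders(height, width):
    """Four traversal orders over a height x width grid of (row, col) cells:
    row-major, reversed row-major, column-major, reversed column-major."""
    row_major = [(r, c) for r in range(height) for c in range(width)]
    col_major = [(r, c) for c in range(width) for r in range(height)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]

def window_corners(height, width, size):
    """Top-left corners of non-overlapping size x size windows.
    Assumes size divides both height and width."""
    return [(r, c) for r in range(0, height, size) for c in range(0, width, size)]

def multi_scale_plan(height, width, sizes):
    """For each window size, list the windows and the scan orders used inside one."""
    return {s: (window_corners(height, width, s), four_way_orders(s, s)) for s in sizes}
```

Each channel group would be assigned one window size, scanned along all four orders inside every window, and the results recombined across groups.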

Integration of BI-Scan and MS-Scan within the Mamba Feature Refinement (MFR) block delivers both intra-task scene structure modeling and fine-grained, scalable cross-task interactions. This design yields only linear complexity in the number of tasks and spatial positions, compared to quadratic costs in naive cross-attention.

5. Empirical Performance and Applications

ExtBiMamba models have demonstrated consistently strong empirical results:

  • Language modeling: Bi-Mamba at 780M scale is evaluated on perplexity against an FP16 reference and vastly outperforms traditional post-training binarization baselines (e.g., GPTQ-3bit, Bi-LLM) (Tang et al., 2024).
  • Speaker diarization: In ConBiMamba, ExtBiMamba achieves state-of-the-art performance on datasets such as AISHELL-4 (DER 9.8% vs. 10.5% for Mamba-diarization) and VoxConverse (8.6% vs. 9.3%) (Liao et al., 27 Jan 2026).
  • Multi-task dense prediction (vision): On NYUD-V2, BIM improves over its baseline on semantic segmentation mIoU, depth RMSE, and boundary odsF (Cao et al., 28 Aug 2025). On PASCAL-Context, comparable improvements accrue across all primary and auxiliary tasks.

The low memory and compute cost, combined with flexible quantization, make ExtBiMamba suitable for both resource-constrained edge inference and efficient large-scale training.

6. Computational Efficiency and Hardware Considerations

ExtBiMamba's critical efficiency properties derive from:

  • Linear time and memory scaling: Both in sequence length (SSMs) and number of per-task branches (BI-Scan), in contrast to the quadratic scaling in sequence length of Transformers and the quadratic scaling in the number of tasks of pairwise attention, respectively.
  • Dense bit-packing: 1–3 bit weights enable dense storage in on-chip SRAM and fast bit-serial processing; recurrence updates in SSMs can be fused directly with binarized GEMMs.
  • Specialized hardware mapping: The architecture is explicitly amenable to bit-serial ASIC acceleration, where bitwise-XOR and popcount operations replace high-precision multiplies, and scale-and-shift (via the learnable scale and shift vectors) is applied post hoc by lightweight ALUs.
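The XOR-and-popcount replacement for multiplies rests on a simple identity for vectors with entries in {+1, -1}: matching positions contribute +1 and mismatching positions contribute -1, so the dot product equals n minus twice the number of mismatches. The sketch below assumes both weights and activations are already binarized and packed into integers; the `alpha` and `beta` names for the post hoc scale and shift are illustrative.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an int; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word

def popcount(word):
    return bin(word).count("1")

def binary_dot(x_word, w_word, n):
    """Dot product of two length-n +1/-1 vectors from their packed forms:
    dot = n - 2 * popcount(x XOR w)."""
    return n - 2 * popcount(x_word ^ w_word)

def scaled_output(x_word, w_word, n, alpha, beta):
    """Output for an effective weight alpha * sign(w) + beta:
    sum_i x_i * (alpha * s_i + beta) = alpha * dot(x, s) + beta * sum(x)."""
    sum_x = 2 * popcount(x_word) - n  # sum of the +1/-1 activations
    return alpha * binary_dot(x_word, w_word, n) + beta * sum_x
```

Only the final multiply-add by `alpha` and `beta` needs higher precision; everything before it is integer bit manipulation.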

A consequential implication is that inference remains fast and scalable even for very long contexts or high-resolution spatial grids, unlocking real-time or low-latency applications.

7. Limitations and Prospective Extensions

Current ExtBiMamba models rely on predetermined scanning patterns and fixed backward passes, which may limit the capture of arbitrary spatial or structural couplings. Forward and backward SSM computation doubles certain resource costs; future work may explore fused or adaptive bidirectional kernels. The increased hyperparameter space in multi-bit and multi-scan settings introduces additional tuning challenges.

Proposed extensions include:

  • Learnable scan patterns: Reducing manual architectural choices by data-adaptive serialization (Cao et al., 28 Aug 2025).
  • Mixture-of-experts or attention-based task selection: Dynamic compute allocation per task or spatial region.
  • Deeper hardware–algorithm co-design: To further optimize bit-serial state updates and maximize inference throughput on emerging ASICs.

A plausible implication is continued accuracy gains with modest increases in energy and memory by incrementally reintroducing higher-bit modules along critical paths, as supported by empirical trade-offs observed in partially quantized models (Tang et al., 2024).

Summary Table of ExtBiMamba Features

Aspect | Core Mechanism | Complexity
Sequence modeling | Bidirectional SSM + gating | $O(T \cdot N^2)$
Cross-task fusion | BI-Scan (task/position-first, bidir) | Linear in tasks × positions
Multi-scale context | MS-Scan (windowed SSM at various scales) | Linear in positions
Quantization | Fully 1-bit; hybrid 2–3 bit in key blocks | 1–3 (bits)
Hardware suitability Bit-serial compute, SRAM packing N/A

ExtBiMamba thus presents a versatile, computation- and memory-efficient sequence model, capable of state-of-the-art performance in tasks spanning language, speech, and vision, underpinned by holistic modeling of context, bit-flexibility, and hardware awareness (Tang et al., 2024, Liao et al., 27 Jan 2026, Cao et al., 28 Aug 2025).
