
Dynamic Mamba Module: Adaptive SSM Blocks

Updated 10 November 2025
  • The DM-module is a novel adaptive state-space component that dynamically reshapes computation with deformable scanning and dynamic tokenization.
  • It integrates parallel multi-grained SSM branches and early exit classifiers to balance long-term context with efficient inference.
  • Empirical results highlight improved accuracy and reduced computational cost across visual, reinforcement learning, and language modeling tasks.

The Dynamic Mamba Module (DM-module) refers to a class of building blocks in artificial neural networks that generalize the linear-time selective state-space modeling of the Mamba architecture to enable various forms of dynamic processing. DM-modules have been instantiated across diverse domains, including visual foundation models, offline and online reinforcement learning, multi-view stereo, and LLMs. Their defining characteristic is to replace, augment, or restructure standard Mamba blocks—typically by adding deformable scanning, multi-grained sequence branches, dynamic scan orders, or auxiliary classifier heads—so that computation and feature extraction are adaptively steered by local or global input structure.

1. Architectural Foundations and Key Mechanisms

DM-modules are typically constructed as extensions of the canonical Mamba state-space model block, which computes hidden states recurrently with learned matrices $A, B$ and produces outputs $y_t = C h_t$, where $h_t = \bar{A} h_{t-1} + \bar{B} x_t$ for the discretized SSM. These blocks achieve linear computational complexity in sequence length, in contrast to the quadratic cost of Transformer attention (Liu et al., 8 Apr 2025, Jiang et al., 3 Nov 2025, Lv et al., 8 Jun 2024, Huang et al., 31 May 2024, Nogales et al., 29 Apr 2025).
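
For concreteness, a minimal NumPy sketch of this recurrence is given below, with fixed $\bar{A}$, $\bar{B}$, $C$; Mamba's selective, input-dependent parameterization is omitted:

import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    """Discretized SSM recurrence: h_t = A_bar @ h_{t-1} + B_bar @ x_t, y_t = C @ h_t."""
    T = x.shape[0]
    h = np.zeros(A_bar.shape[0])          # hidden state of dimension d_state
    ys = np.zeros((T, C.shape[0]))
    for t in range(T):
        h = A_bar @ h + B_bar @ x[t]      # O(1) state update per token -> linear in T
        ys[t] = C @ h                     # per-token readout
    return ys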

Core mechanisms of DM-modules include:

  • Deformable Scanning and Dynamic Tokenization: In DefMamba (Liu et al., 8 Apr 2025), feature maps $x \in \mathbb{R}^{H\times W\times C}$ are warped using OffsetNet to produce spatial offsets $\Delta p$ and token-index offsets $\Delta t$. Bilinear interpolation yields positions $\hat{p} = p + \hat{\Delta p}$, while the scan order is rearranged as $t_d = t_r + \Delta t$; tokens are sorted by $t_d$ to form a dynamic sequence before the SSM is applied.
  • Early Exit with Classifier Heads: DYNAMAX (Nogales et al., 29 Apr 2025) introduces DM-modules as standard Mamba blocks augmented with a two-logit classifier that decides, at each module and for each token, whether to terminate further computation based on the predicted confidence $p_{\mathrm{exit}} \geq \theta$ (a minimal sketch appears after this list).
  • Multi-Grained Branches: In Decision Mamba (Lv et al., 8 Jun 2024), each DM-module comprises a coarse-grained (inter-step) SSM modeling long temporal relationships and a fine-grained (intra-step) SSM capturing short-range dependencies among state, RTG, and action triplets. Parallel branches are fused via gating, producing richer cross-scale representations.
  • Reference-Centered Dynamic Scanning: In MVSMamba (Jiang et al., 3 Nov 2025), the DM-module fuses reference and source feature maps in four compass directions, extracts 1D sequences via dynamic Morton-like scan orders, and processes each via a dedicated Mamba block. This yields omnidirectional, cross-view global aggregation in linear time.
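
As a concrete illustration of the early-exit mechanism above, the following sketch wraps a Mamba block with a two-logit exit head; mamba_block, exit_head, and the default threshold are illustrative assumptions, not the DYNAMAX implementation:

import numpy as np

def dm_block_with_exit(x, mamba_block, exit_head, theta=0.9):
    # x: (T, d) token representations entering this DM-module.
    y = mamba_block(x)                            # standard Mamba block output, (T, d)
    logits = exit_head(y)                         # two logits per token: [continue, exit]
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p_exit = e[:, 1] / e.sum(axis=-1)             # softmax probability of "exit"
    keep = p_exit < theta                         # tokens that proceed to deeper modules
    return y, keep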

2. Mathematical Formulation and Implementation

A representative DM-block (DefMamba) follows:

Let $x \in \mathbb{R}^{H\times W\times C}$, $o = \mathrm{OffsetNet}(x) \in \mathbb{R}^{H\times W\times 3}$, and $(\Delta p, \Delta t) = \mathrm{Split}(\tanh(o);\, 2, 1)$:

  • Normalize $\Delta p$ to unit token scale.
  • The reference grid is $p_{i,j} = \left(\tfrac{2j}{W}-1,\ \tfrac{2i}{H}-1\right)$, and the deformed positions are $\hat{p}_{i,j} = p_{i,j} + \Delta p_{i,j}$.
  • Sample features: $\widetilde{x}_{i,j} = \phi(x, \hat{p}_{i,j}) + \phi(R, \hat{p}_{i,j})$.
  • Rearrange the token sequence by the deformable index $t_d = t_r + \Delta t$; sort on $t_d$ to form the permutation $\pi$.
  • Run the SSM on the sequence $u = [\widetilde{x}_{\pi(1)}, \dots, \widetilde{x}_{\pi(N)}]$, i.e. $h_t = \bar{A} h_{t-1} + \bar{B} u_t$.

Typical pseudocode for DM-block (Liu et al., 8 Apr 2025):

import numpy as np

def DM_Block(x):
    # x: (H, W, C) feature map. W, H, R, OffsetNet, SSM_Select, FFN, LN and the
    # sampling helpers are assumed to be defined in the surrounding scope.
    o = OffsetNet(x)                                  # (H, W, 3) raw offsets
    hat_o = np.tanh(o)                                # squash offsets to [-1, 1]
    Delta_p, Delta_t = np.split(hat_o, [2], axis=-1)  # spatial vs. token-index offsets
    Delta_p = Delta_p / np.array([W, H])              # normalize to unit token scale
    p = create_reference_grid(H, W)                   # base grid p_{i,j} in [-1, 1]^2
    hat_p = p + Delta_p                               # deformed sampling positions
    x_sampled = bilinear_sample(x, hat_p) + bilinear_sample(R, hat_p)
    t_r = flatten_indices(H, W)                       # raster-scan token indices
    t_d = t_r + Delta_t.reshape(-1)                   # deformable token indices
    permutation = t_d.argsort()                       # dynamic scan order
    u = x_sampled.reshape(H * W, -1)[permutation]     # (N, C) reordered token sequence
    y_ss, _ = SSM_Select(u)                           # selective SSM over the sequence
    y = np.empty_like(y_ss)
    y[permutation] = y_ss                             # undo the dynamic scan order
    x_prime = x + y.reshape(H, W, -1)                 # residual on the original layout
    return x_prime + FFN(LN(x_prime))                 # standard post-norm FFN block

Other DM-module variants adopt similar formal constructs: multiple parallel SSM branches, auxiliary classifier heads, or context-dependent sub-goal vectors.
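
For instance, the parallel multi-grained branch pattern of Decision Mamba can be sketched as follows; the two branch callables and the gating matrix W_g are illustrative assumptions rather than the published implementation:

import numpy as np

def multi_grained_fusion(x, coarse_ssm, fine_ssm, W_g):
    # x: (T, d) trajectory tokens; coarse_ssm models inter-step (long-range) structure,
    # fine_ssm models intra-step (state, RTG, action) structure; W_g has shape (2d, d).
    y_coarse = coarse_ssm(x)                                    # (T, d)
    y_fine = fine_ssm(x)                                        # (T, d)
    g = 1.0 / (1.0 + np.exp(-np.concatenate([y_coarse, y_fine], axis=-1) @ W_g))
    return g * y_coarse + (1.0 - g) * y_fine                    # gated, token-wise fusion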

3. Domain-Specific Implementations

The DM-module paradigm spans several domains, with each instantiation uniquely tailoring the module structure:

  • Visual Modeling (DefMamba, MVSMamba): DM-blocks in DefMamba (Liu et al., 8 Apr 2025) and MVSMamba (Jiang et al., 3 Nov 2025) dynamically control feature aggregation, scan order, and spatial bias to better capture context and object boundaries than previous fixed-scan Mamba variants. MVSMamba leverages dynamic scan offsets and Morton-like orders to process multi-view stereo inputs efficiently.
  • Reinforcement Learning (Decision Mamba, DM-H): DM-modules replace Transformer backbones in RL agents (Lv et al., 8 Jun 2024, Huang et al., 31 May 2024), decomposing trajectory modeling into multi-grained SSM branches. DM-H introduces a hybrid architecture in which Mamba generates temporally extended sub-goals that prompt a Transformer to synthesize high-quality predictions (a minimal sketch appears after this list); valuable-state selection keeps the learned sub-goals aligned with optimal ones.
  • LLMs and Inference Efficiency (DYNAMAX): DM-modules are placed as checkpoints in Mamba-based LLM decoders (Nogales et al., 29 Apr 2025), providing token-wise "exit-vs-continue" classification. This enables dynamic truncation of computation, yielding significant FLOP savings (15–35%) with negligible drops in NLP metrics and outperforming conventional static layer pruning.
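
The DM-H sub-goal prompting loop referenced above can be sketched roughly as follows; the interval k, the encoder/policy callables, and the recent-window construction are assumptions for illustration only:

def dm_h_step(trajectory, mamba_encoder, transformer_policy, k=8):
    # Mamba summarizes the (arbitrarily long) history into a sub-goal vector every k
    # steps; a recurrent implementation would carry hidden state instead of re-encoding.
    T = len(trajectory)
    sub_goals = [mamba_encoder(trajectory[: t + 1]) for t in range(0, T, k)]
    recent = trajectory[-k:]                        # short window for the Transformer
    prompt = sub_goals[-1]                          # temporally extended context token
    return transformer_policy(prompt, recent)       # action prediction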

4. Performance Analysis and Empirical Results

Systematic benchmarking demonstrates that DM-modules yield competitive or state-of-the-art results with superior efficiency:

| Domain | Metric | Baseline | DM-Module Variant | Improvement |
|---|---|---|---|---|
| Vision (ImageNet) | Top-1 accuracy (Tiny, 8M params) | 76.9% | 78.6% (DM-block) | +1.7% |
| Vision (COCO/ADE) | Detection/segmentation (mAP/mIoU) | VMamba-T | DefMamba-S | +0.3–0.6 |
| Stereo (DTU) | Depth error (mm), runtime, params | MVSFormer++* | MVSMamba* | Lower error, –52% runtime, –67% params |
| NLP (TriviaQA) | Exact Match (%) | 46.2 (full) | 45.8 (DM, 20% FLOPs saved) | Comparable |
| RL (HalfCheetah) | Return | 94.1 (BC-10%) | 96.2±0.3 (DM-H) | +2.1 |

Ablation studies further indicate that key components (deformable points/tokens, offset bias, channel attention) contribute incrementally to performance (Liu et al., 8 Apr 2025). In DYNAMAX (Nogales et al., 29 Apr 2025), dynamic exit via DM-modules uniformly exceeds static pruning for cost-effective inference.

5. Advantages, Limitations, and Deployment Practices

Advantages of DM-modules include:

  • Linear-time processing for arbitrarily long sequences, enabling scalability otherwise infeasible in Transformer-based settings.
  • Flexible feature warping (DefMamba, MVSMamba) and dynamic scan-order adaptation for content-aware extraction.
  • Effective computational allocation via early-exit classifiers (DYNAMAX).
  • Improved OOD and noise robustness in RL via multi-grained SSM and progressive regularization (PSER) (Lv et al., 8 Jun 2024).
  • Orthogonality to quantization, pruning, and parameter-efficient finetuning (LoRA).

However, certain limitations arise:

  • Deformable branches alone can destabilize training; stabilization via parallel vanilla SSM branches is generally required (Liu et al., 8 Apr 2025).
  • Not all domains benefit equally—for example, replicating dynamic scanning at every FPN scale in MVS does not yield additive gains (Jiang et al., 3 Nov 2025).
  • Gradients through the sorting operations in dynamic scan-order branches are approximated rather than exact (one common relaxation is sketched after this list).
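
The relevant papers do not pin down a particular estimator; one common relaxation (a SoftSort-style construction, shown here as an assumption rather than the authors' method) replaces the hard permutation with a temperature-controlled row-stochastic matrix:

import numpy as np

def soft_permutation(scores, tau=0.1):
    # Row i weights tokens by how close their score is to the i-th smallest score;
    # as tau -> 0 this approaches the hard sorting permutation while staying differentiable.
    s_sorted = np.sort(scores)                                      # (N,)
    logits = -np.abs(s_sorted[:, None] - scores[None, :]) / tau     # (N, N)
    logits -= logits.max(axis=1, keepdims=True)                     # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

# Soft reordering of an (N, C) token matrix by deformable indices t_d:
# u = soft_permutation(t_d) @ tokens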

Deployment practices recommend:

  • Placing DM-modules in the latter half of the network to preserve long-range context (DYNAMAX).
  • Leveraging Mamba blocks themselves as classifier heads for early exit.
  • Combining DM-modules with quantization/LoRA for maximized efficiency.
  • Using held-out sets to calibrate exit thresholds and sub-goal intervals for optimal cost-accuracy trade-offs (a calibration sketch follows this list).
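
A hedged sketch of such a threshold calibration, assuming per-example exit confidences and correctness flags collected on a held-out set (the accuracy target and search grid are illustrative):

import numpy as np

def calibrate_exit_threshold(p_exit, correct, target_acc=0.99):
    # p_exit: (N,) predicted exit confidences; correct: (N,) booleans on the held-out set.
    # Return the smallest threshold whose early-exiting subset still meets target_acc,
    # i.e. the most aggressive (cheapest) setting that preserves quality.
    for theta in np.linspace(0.5, 1.0, 51):
        exited = p_exit >= theta
        if exited.any() and correct[exited].mean() >= target_acc:
            return theta
    return 1.0                                   # fall back to "never exit early"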

6. Research Significance and Future Directions

The DM-module concept exemplifies the shift towards adaptive and context-aware computation within the state-space model regime. By generalizing linear-time Mamba blocks with mechanisms for deformable sampling, multi-grained sequential reasoning, and data-dependent computational allocation, DM-modules offer a scalable alternative to Transformer-based architectures, especially for vision, RL, multi-view geometry, and LLMs, where inference efficiency and context depth are crucial.

Continued research is exploring:

  • Further domain adaptation of DM-modules, including integration with other efficiency-driven techniques (quantization, low-rank adapters).
  • More principled methods for dynamic scan-order gradient estimation.
  • Cross-domain fusion: hybrid visual-linguistic DM-modules or task-conditioned dynamic computation.

A plausible implication is increasing adoption of DM-modules in settings where computational resources and latency are constrained, notably on-device and embedded AI, as demonstrated by MVSMamba and DYNAMAX (Jiang et al., 3 Nov 2025, Nogales et al., 29 Apr 2025). The DM-module framework thus marks a convergence between architectural adaptability and resource efficiency in modern deep learning.
