Deformable Mamba: Adaptive SSM Models

Updated 7 December 2025
  • Deformable Mamba (DF-Mamba) is a class of neural architectures that enhance state-space models with adaptive deformation for superior content and geometric feature aggregation.
  • It employs mechanisms like content-adaptive aggregation, learnable token ordering, and offset-guided resampling to overcome fixed-grid limitations in vision, medical imaging, and remote sensing.
  • Extensive experiments demonstrate that DF-Mamba architectures improve performance in tasks such as MRI super-resolution, 3D understanding, and point cloud analysis while maintaining computational efficiency.

Deformable Mamba (DF-Mamba) refers to a class of neural architectures that augment Mamba-based state space models (SSMs) with explicit mechanisms for content- or geometry-adaptive feature aggregation, adaptive scan orders, or structured deformation. DF-Mamba architectures have been developed for diverse vision, medical imaging, remote sensing, and 3D understanding tasks. They share the common motivation of overcoming limitations inherent in fixed grid/topology neural models—namely, poor adaptation to content, shape, or spatially variant semantic structure—by introducing deformable scanning, adaptive sequencing, or offset-guided resampling directly within the SSM-based modeling pipeline.

1. Fundamental Principles and Variants

Deformable Mamba integrates two key ideas: (i) the use of SSMs, particularly Mamba, which delivers efficient long-sequence modeling in linear time and memory; and (ii) a deformable component, such as content-adaptive sampling, learnable token orderings, or offset-based resampling, which enables geometry- or content-aware information flow. This formulation generalizes and subsumes a range of prior concepts from deformable convolutions and adaptive serialization, while applying them within the scope of state-space-based sequence models.

Several variants exist, differentiated by domain and specific mechanism:

  • Modulated Deform Block: Content-adaptive local aggregation via learned spatial offsets and modulation scalars (e.g., for medical super-resolution) (Ji et al., 8 Jul 2024).
  • Deformable Scanning: Token index offsets and spatial shifts, enabling dynamic (learnable) scan paths through data (e.g., images or point clouds) (Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025).
  • Sparse/Adaptive Sequencing: Attention- or similarity-based token selection yielding sparse, deformable sequences for efficient SSM computation (notably for hyperspectral and temporal data) (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
  • Grouped/Topology-Guided Deformation: Enforcing structural priors (e.g., centerlines, anatomy) via group-wise or topology-aware SSM branches (Wang et al., 14 Aug 2024).

2. Technical Formulation Across Modalities

2.1. SSM and Mamba Core

Mamba blocks implement a selective scan mechanism, often codified as an input-driven discretization of the continuous SSM: $h_t = A h_{t-1} + B x_t,\quad y_t = C h_t$, with $A, B, C$ parametrized by learned or input-dependent projections. The discretization, via the zero-order hold (ZOH), yields an efficient, global, linear-time recurrence or convolution over the sequence (Liu et al., 8 Apr 2025).
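The recurrence can be sketched directly. Below is a minimal NumPy toy with a dense, fixed $A$; actual Mamba implementations use input-dependent (selective) diagonal parametrizations and a hardware-aware parallel scan, so the function name and shapes here are illustrative assumptions only:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Discrete SSM recurrence: h_t = A h_{t-1} + B x_t,  y_t = C h_t.

    x: (T,) scalar input sequence; A: (N, N) state matrix;
    B: (N,) input projection; C: (N,) output projection.
    Returns y: (T,) outputs, computed by a sequential linear-time scan."""
    N = A.shape[0]
    h = np.zeros(N)
    y = np.empty(len(x))
    for t, xt in enumerate(x):
        h = A @ h + B * xt   # state update
        y[t] = C @ h         # readout
    return y

# Impulse input decays geometrically through a 0.5-contraction state.
y = ssm_scan(np.array([1.0, 0.0, 0.0]), 0.5 * np.eye(2), np.ones(2), np.ones(2))
```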

2.2. Deformable Mechanisms

a) Content-Adaptive Aggregation (Convolutional Deformation)

A spatial feature at location $p$ is computed as

$$Y(p) = \sum_{k=1}^{K} w_k\, X\bigl(p + p_k + \Delta p_k\bigr) \cdot \Delta m_k$$

where the $p_k$ are canonical kernel offsets, $\Delta p_k$ and $\Delta m_k$ are spatially adaptive offsets and modulation scalars learned via lightweight convnets, and the $w_k$ are filter weights. Bilinear interpolation is applied for non-integer offsets (Ji et al., 8 Jul 2024, Li et al., 1 Jul 2025, Hu et al., 25 Nov 2024).
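A minimal NumPy sketch of this aggregation for a single output location and a 3×3 kernel follows. The offsets and modulations are passed in as arrays, standing in for the lightweight offset-predicting convnet of the papers; the function names are hypothetical:

```python
import numpy as np

def bilinear(X, y, x):
    """Bilinearly sample X (H, W) at continuous location (y, x), zero-padded."""
    H, W = X.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * X[yy, xx]
    return out

def deform_aggregate(X, p, offsets, mods, w):
    """Y(p) = sum_k w_k * X(p + p_k + Δp_k) * Δm_k over a 3x3 kernel.

    offsets: (9, 2) learned Δp_k; mods: (9,) modulation Δm_k; w: (9,) weights."""
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]  # canonical p_k
    y = 0.0
    for k, (pi, pj) in enumerate(grid):
        y += w[k] * bilinear(X, p[0] + pi + offsets[k, 0],
                             p[1] + pj + offsets[k, 1]) * mods[k]
    return y
```

With zero offsets, unit modulation, and uniform weights this reduces to a plain 3×3 average, which makes the deformable generalization easy to sanity-check.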

b) Deformable Scanning / Token Reordering

Scan paths are adaptively shifted by learning both spatial offsets and token index offsets, $[\Delta p,\, \Delta t] = \tanh(\mathrm{OffsetNet}(F_\mathrm{agg}))$, producing deformed 1D sequences by sorting the perturbed indices $t_{\mathrm{raw}} + \Delta t$ (Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025). In point cloud settings, differentiable continuous reordering is achieved with Gaussian weightings over index shifts (Liu et al., 3 Dec 2025).
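The index-offset reordering step reduces to an argsort over perturbed positions. A minimal NumPy sketch, where the offset network is abstracted to a precomputed `delta_t` array (in practice the sort is non-differentiable and the papers rely on approximations, e.g. Gaussian weighting over index shifts):

```python
import numpy as np

def deform_reorder(tokens, delta_t):
    """Reorder a token sequence by perturbed indices, as in deformable scanning.

    tokens: (T, C) features; delta_t: (T,) learned index offsets (e.g. the
    tanh output of an offset network). New scan order = argsort(t_raw + Δt)."""
    t_raw = np.arange(tokens.shape[0], dtype=float)
    order = np.argsort(t_raw + delta_t, kind="stable")
    return tokens[order], order
```

For example, a large negative offset on token 2 pulls it to the front of the scan while the rest keep their raster order.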

c) Sparse/Attention-Based Deformable Sequencing

Tokens are selected by learned attention or cosine-similarity scores with respect to anchors, and only a subset is forwarded to the SSM blocks: $\overline{Z}_j = [Z_j[i]]_{i \in I_s}$, where $I_s$ is the index set of the most relevant tokens. This selection is applied spatially, spectrally, or temporally as needed (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
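The cosine-similarity variant of this selection can be sketched in a few lines of NumPy; the function name and fixed top-$k$ budget are illustrative assumptions, and the retained indices are restored to their original scan order before entering the SSM:

```python
import numpy as np

def select_tokens(Z, anchor, keep):
    """Keep the `keep` tokens most cosine-similar to an anchor feature.

    Z: (T, C) token features; anchor: (C,). Returns (Z_bar, I_s) where I_s
    holds the retained indices sorted back into original sequence order."""
    sims = Z @ anchor / (np.linalg.norm(Z, axis=1) * np.linalg.norm(anchor) + 1e-8)
    I_s = np.sort(np.argsort(-sims)[:keep])  # top-k, restored to scan order
    return Z[I_s], I_s
```

The sparse subsequence $\overline{Z}$ is what the SSM then scans, which is where the computational savings reported for hyperspectral data come from.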

3. Architectures Leveraging DF-Mamba

The deformable Mamba paradigm underpins a diverse set of architectures:

| Variant | Deformation Mode | Application Domain |
|---|---|---|
| Deform-Mamba Net | Modulated Deform + SSM | MRI super-resolution |
| DefMamba | Deformable scanning SSM | General vision (classification, det/seg) |
| DM3D | Offset-guided Gaussian scan | Point cloud understanding |
| Sparse Deformable Mamba | Sparse deformable sequencing | HSI, MODIS classification |
| TGDM | Topology-guided deformable SSM | Anatomy segmentation (costal cartilage) |
| MambaReg | Disentangled sparse + deformable | Unsupervised multimodal registration |
| UAVD-Mamba | Deformable token fusion | Multimodal UAV detection |

Notably, the architectures consistently alternate standard SSM/Mamba blocks with deformable mechanisms and employ multi-branch or multi-scale strategies for robust feature representation across highly structured or irregular domains (Ji et al., 8 Jul 2024, Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025, Xu et al., 13 Apr 2025, Wang et al., 14 Aug 2024, Li et al., 1 Jul 2025).

4. Experimental Benchmarks and Ablation Findings

DF-Mamba models have delivered state-of-the-art or near-SOTA performance in each application domain tested:

  • MRI Super-Resolution: Outperforms SRCNN, VDSR, FMISR, T²Net, HAT on IXI and fastMRI; ablation shows necessity of deformable block, multi-scale context, and contrastive loss (Ji et al., 8 Jul 2024).
  • Point Cloud Analysis: On ModelNet40, DF-Mamba achieves 93.76% (no pretrain), surpassing PointMamba (92.9%) and PCM (93.4%); TPFF, deformable SSM, GKR, and GDR each contribute significant accuracy improvements (Liu et al., 3 Dec 2025).
  • Visual Recognition: On ImageNet-1K, DefMamba and its variants surpass ViT and SwinT at both tiny (8M) and base (51M) model scales; ablations confirm 1.0% top-1 gain from combined spatial and token offsets (Liu et al., 8 Apr 2025).
  • Wide FoV Segmentation: The Deformable Mamba decoder increases mIoU by 2.5 points on Stanford2D3D and uses 72% fewer FLOPs compared to UperNet, indicating substantial benefits for distortion-prone domains (Hu et al., 25 Nov 2024).
  • HSI and MODIS: Sparse deformable token sequencing drastically reduces computation (e.g., 59% FLOP reduction for SDMamba) while improving classification accuracy on Indian Pines and MODIS; small-class and boundary preservation are observed (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
  • Medical Registration: MambaReg/TGDM demonstrate significant gains in non-rigid multimodal alignment and anatomy segmentation, with improvements in Dice coefficient and robustness to anatomy-specific variation (Wen et al., 3 Nov 2024, Wang et al., 14 Aug 2024).
  • 3D Hand Pose Estimation: DF-Mamba tribrid backbone yields measurable improvements (0.3–1 mm in MPJPE, +2.5% AUC) with throughput matching or exceeding ResNet-50 (Zhou et al., 2 Dec 2025).

Ablation studies across works consistently show that removal of any deformable or adaptive component results in a measurable drop in performance, underscoring the functional necessity of the adaptive mechanisms (e.g., –3% to –4% for removing deformable branches in DM3D (Liu et al., 3 Dec 2025), –1.1 mm MPJPE without deformable scan in hand pose (Zhou et al., 2 Dec 2025), and similar patterns in TGDM, DefMamba, and SDMamba).

5. Applications, Limitations, and Future Directions

Applications of Deformable Mamba span medical image super-resolution and registration, anatomy segmentation, general visual recognition and dense prediction, wide-FoV segmentation, hyperspectral and MODIS remote-sensing classification, point cloud understanding, 3D hand pose estimation, and multimodal UAV object detection.

Limitations identified by original works include:

  • Computational overhead from deformable index and offset computation, particularly in very large data regimes due to KNN, sorting, or attention-based selection.
  • Decreased robustness under strong domain shift (e.g., out-of-distribution segmentation in TGDM (Wang et al., 14 Aug 2024)).
  • The requirement for tuning sparsity/adaptivity parameters per dataset (λ in SDMamba, SDTM).
  • Algorithmic complexity for reproducibility, especially where non-differentiable operations (e.g., sorting) require approximations (Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025).

Prospective directions suggested by the original works target these limitations, including cheaper offset and index computation, improved robustness under domain shift, and less dataset-specific tuning of sparsity parameters.

6. Theoretical and Empirical Implications

The DF-Mamba paradigm unifies adaptive neighborhood aggregation, attention/sequence modeling, and domain-driven priors within a single flexible family of efficient, scalable models. This architecture bridges the gap between the spatial flexibility of deformable convolutions and the long-range contextual power of modern state-space models, while offering reduced computational requirements compared to Transformers with full self-attention. The empirical evidence across domains indicates that adaptivity in both spatial and sequence space enables retention of fine structure and salient semantically-aligned features, with strong performance benefits in both dense prediction and structured regression contexts (Ji et al., 8 Jul 2024, Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025, Wang et al., 14 Aug 2024).

7. Summary Table of Representative Results

| Architecture | Domain | Key Deformable Mechanism | Metric | SOTA/Improvement |
|---|---|---|---|---|
| Deform-Mamba Net (Ji et al., 8 Jul 2024) | MRI SR | Modulated Deform Block + MVC | PSNR/SSIM | +1.4 dB, +0.1 SSIM vs T²Net |
| DM3D/DF-Mamba (Liu et al., 3 Dec 2025) | Point cloud | Offset-Gaussian scan, TPFF | Acc./mIoU | +0.86% vs PCM (ModelNet40) |
| DefMamba (Liu et al., 8 Apr 2025) | Vision (ImageNet/COCO) | Deformable scanning (Δp, Δt) | Top-1 Acc. | +1.0% vs PlainMamba-L1 |
| SDMamba (Xu et al., 13 Apr 2025) | HSI classification | Sparse deformable seq. (attn.) | OA (%) | +0.26% (IP) vs HyperMamba |
| TGDM (Wang et al., 14 Aug 2024) | Med. segmentation | Topology priors, grouped SSM | DSC/NSD | +2.5 DSC vs nnMamba |
| DF-Mamba (Zhou et al., 2 Dec 2025) | Hand pose estimation | DSSM aggregation, tribrid backbone | MPJPE/AUC | –1.56 mm, +2.51% AUC |
| UAVD-Mamba (Li et al., 1 Jul 2025) | UAV detection | Deformable tokens, dual-modal fusion | mAP | +3.6% vs OAFA baseline |

The consistent pattern is that combining SSM-based recurrence with content- or topology-adaptive deformation yields quantifiable, robust improvement in dense, structured, and geometry-sensitive tasks, typically with efficient compute and memory profiles.
