Deformable Mamba: Adaptive SSM Models
- Deformable Mamba (DF-Mamba) is a class of neural architectures that enhances state-space models with adaptive deformation for content- and geometry-aware feature aggregation.
- It employs mechanisms like content-adaptive aggregation, learnable token ordering, and offset-guided resampling to overcome fixed-grid limitations in vision, medical imaging, and remote sensing.
- Extensive experiments demonstrate that DF-Mamba architectures improve performance in tasks such as MRI super-resolution, 3D understanding, and point cloud analysis while maintaining computational efficiency.
Deformable Mamba (DF-Mamba) refers to a class of neural architectures that augment Mamba-based state space models (SSMs) with explicit mechanisms for content- or geometry-adaptive feature aggregation, adaptive scan orders, or structured deformation. DF-Mamba architectures have been developed for diverse vision, medical imaging, remote sensing, and 3D understanding tasks. They share the common motivation of overcoming limitations inherent in fixed grid/topology neural models—namely, poor adaptation to content, shape, or spatially variant semantic structure—by introducing deformable scanning, adaptive sequencing, or offset-guided resampling directly within the SSM-based modeling pipeline.
1. Fundamental Principles and Variants
Deformable Mamba integrates two key ideas: (i) the use of SSMs, particularly Mamba, which delivers efficient long-sequence modeling in linear time and memory; and (ii) a deformable component, such as content-adaptive sampling, learnable token orderings, or offset-based resampling, which enables geometry- or content-aware information flow. This formulation generalizes and subsumes a range of prior concepts from deformable convolutions and adaptive serialization, while applying them within the scope of state-space-based sequence models.
Several variants exist, differentiated by domain and specific mechanism:
- Modulated Deform Block: Content-adaptive local aggregation via learned spatial offsets and modulation scalars (e.g., for medical super-resolution) (Ji et al., 8 Jul 2024).
- Deformable Scanning: Token index offsets and spatial shifts, enabling dynamic (learnable) scan paths through data (e.g., images or point clouds) (Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025).
- Sparse/Adaptive Sequencing: Attention- or similarity-based token selection yielding sparse, deformable sequences for efficient SSM computation (notably for hyperspectral and temporal data) (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
- Grouped/Topology-Guided Deformation: Enforcing structural priors (e.g., centerlines, anatomy) via group-wise or topology-aware SSM branches (Wang et al., 14 Aug 2024).
2. Technical Formulation Across Modalities
2.1. SSM and Mamba Core
Mamba blocks implement a selective scan mechanism, often codified as an input-driven discretization of the continuous SSM

$$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t),$$

with $(\Delta, B, C)$ parametrized by learned or input-dependent projections. The discretization—via the zero-order hold (ZOH),

$$\bar{A} = \exp(\Delta A), \qquad \bar{B} = (\Delta A)^{-1}\left(\exp(\Delta A) - I\right)\Delta B,$$

—yields an efficient, global, linear recurrence $h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t$ (equivalently a long convolution) over the sequence (Liu et al., 8 Apr 2025).
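The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration of ZOH discretization and the resulting linear scan for a diagonal state matrix with a scalar input channel; function names and the toy parameters are illustrative, not drawn from any cited codebase:

```python
import numpy as np

def zoh_discretize(A, B, delta):
    """ZOH discretization of a diagonal SSM: Abar = exp(dA), Bbar = (dA)^-1 (exp(dA) - I) dB."""
    A_bar = np.exp(delta * A)
    B_bar = (A_bar - 1.0) / A * B  # closed form for diagonal (elementwise) A
    return A_bar, B_bar

def selective_scan(x, A, B, C, delta):
    """Run h_t = Abar h_{t-1} + Bbar x_t, y_t = C h_t; delta is per-token (input-dependent)."""
    h = np.zeros(A.shape[0])
    ys = []
    for t, x_t in enumerate(x):
        A_bar, B_bar = zoh_discretize(A, B, delta[t])
        h = A_bar * h + B_bar * x_t
        ys.append(C @ h)
    return np.array(ys)

# Toy run: stable negative-diagonal dynamics, 8-token sequence.
rng = np.random.default_rng(0)
A = -np.abs(rng.normal(size=4)) - 0.1
B, C = rng.normal(size=4), rng.normal(size=4)
x = rng.normal(size=8)
y = selective_scan(x, A, B, C, delta=np.full(8, 0.1))
```

In Mamba proper, $\Delta$, $B$, and $C$ are produced per token by learned projections of the input, which is what makes the scan "selective"; the sketch fixes them for clarity.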
2.2. Deformable Mechanisms
a) Content-Adaptive Aggregation (Convolutional Deformation)
A spatial feature at location $p$ is computed as

$$y(p) = \sum_{k=1}^{K} w_k \, m_k \, x\!\left(p + p_k + \Delta p_k\right),$$

where $p_k$ are canonical kernel offsets, $\Delta p_k$ and $m_k$ are spatially adaptive offsets and modulation scalars learned via light convnets, and $w_k$ are filter weights. Bilinear interpolation is applied for non-integer sampling locations (Ji et al., 8 Jul 2024, Li et al., 1 Jul 2025, Hu et al., 25 Nov 2024).
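A minimal single-channel sketch of this modulated deformable aggregation, with explicit bilinear sampling for fractional locations (a didactic reference, not an efficient implementation such as `torchvision.ops.deform_conv2d`):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a (H, W) map at fractional (y, x); out-of-bounds taps read as 0."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy1, wx1 = y - y0, x - x0
    val = 0.0
    for yi, wy in ((y0, 1 - wy1), (y0 + 1, wy1)):
        for xi, wx in ((x0, 1 - wx1), (x0 + 1, wx1)):
            if 0 <= yi < H and 0 <= xi < W:
                val += wy * wx * feat[yi, xi]
    return val

def deform_aggregate(feat, p, weights, offsets, modulation):
    """y(p) = sum_k w_k * m_k * x(p + p_k + dp_k) over a 3x3 kernel grid."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for k, (dy, dx) in enumerate(grid):
        oy, ox = offsets[k]
        out += weights[k] * modulation[k] * bilinear_sample(
            feat, p[0] + dy + oy, p[1] + dx + ox)
    return out
```

With all offsets zero and unit modulation this reduces to a standard 3×3 convolution tap; in the architectures above, `offsets` and `modulation` come from a light convnet on the input feature map.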
b) Deformable Scanning / Token Reordering
Scan paths are adaptively shifted by learning both spatial offsets $\Delta p_i$ and token index offsets $\Delta t_i$, producing deformed 1D sequences via the sorted perturbed indices

$$\sigma = \operatorname{argsort}_i\!\left(t_i + \Delta t_i\right)$$

(Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025). In point cloud settings, differentiable continuous reordering is achieved with Gaussian weightings over index shifts (Liu et al., 3 Dec 2025).
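Both the hard reordering and a Gaussian soft surrogate can be sketched as follows; this is an assumed simplification of the papers' mechanisms (the exact parametrization of the offsets differs per work):

```python
import numpy as np

def deformable_scan_order(index_offsets):
    """Hard reordering: sort tokens by the perturbed index t + dt."""
    t = np.arange(len(index_offsets), dtype=float)
    return np.argsort(t + index_offsets)

def gaussian_soft_reorder(tokens, index_offsets, sigma=0.5):
    """Differentiable surrogate: output slot j gathers every token i with a
    Gaussian weight centered on its shifted index t_i + dt_i."""
    L = len(tokens)
    shifted = np.arange(L, dtype=float) + index_offsets      # (L,)
    slots = np.arange(L, dtype=float)[:, None]               # (L, 1)
    w = np.exp(-0.5 * ((slots - shifted[None, :]) / sigma) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ tokens
```

The hard `argsort` is non-differentiable, which is exactly why the Gaussian weighting is used in the point-cloud setting: gradients flow through `w` back into the learned index offsets.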
c) Sparse/Attention-Based Deformable Sequencing
Tokens are selected by learned attention or cosine-similarity scores with respect to anchors, with only a subset forwarded to the SSM blocks:

$$\tilde{X} = \{x_i : i \in \mathcal{I}\}, \qquad \mathcal{I} = \operatorname{TopK}_i \, \mathrm{sim}(x_i, a),$$

where $\mathcal{I}$ are the indices of the most relevant tokens. This selection is applied spatially, spectrally, or temporally as needed (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
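A sketch of the cosine-similarity variant of this selection, keeping the surviving tokens in their original scan order so the SSM simply processes a shorter sequence (the anchor construction and the value of $k$ are assumptions; the cited works derive both from the data):

```python
import numpy as np

def sparse_select(tokens, anchor, k):
    """Keep the k tokens most cosine-similar to an anchor; return indices
    (in original scan order) and the shortened token sequence."""
    tn = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    an = anchor / np.linalg.norm(anchor)
    sim = tn @ an                                  # cosine similarity per token
    idx = np.argpartition(-sim, k - 1)[:k]         # top-k, unordered
    keep = np.sort(idx)                            # restore scan order
    return keep, tokens[keep]
```

Because the SSM then runs over $k \ll L$ tokens, the sequence-modeling cost drops roughly in proportion to the sparsity, which is the source of the FLOP reductions reported for the hyperspectral models below.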
3. Architectures Leveraging DF-Mamba
The deformable Mamba paradigm underpins a diverse set of architectures:
| Variant | Deformation Mode | Application Domain |
|---|---|---|
| Deform-Mamba Net | Modulated Deform + SSM | MRI super-resolution |
| DefMamba | Deformable Scanning SSM | General vision (classification, det/seg) |
| DM3D | Offset-guided Gaussian scan | Point cloud understanding |
| Sparse Deformable Mamba | Sparse deform. sequence | HSI, MODIS classification |
| TGDM | Topology-guided deformable SSM | Anatomy segmentation (costal cartilage) |
| MambaReg | Disentangled sparse+deform | Unsupervised multimodal registration |
| UAVD-Mamba | Deformable token fusion | Multimodal UAV detection |
Notably, the architectures consistently alternate standard SSM/Mamba blocks with deformable mechanisms and employ multi-branch or multi-scale strategies for robust feature representation across highly structured or irregular domains (Ji et al., 8 Jul 2024, Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025, Xu et al., 13 Apr 2025, Wang et al., 14 Aug 2024, Li et al., 1 Jul 2025).
4. Experimental Benchmarks and Ablation Findings
DF-Mamba models have delivered state-of-the-art or near-SOTA performance in each application domain tested:
- MRI Super-Resolution: Outperforms SRCNN, VDSR, FMISR, T²Net, HAT on IXI and fastMRI; ablation shows necessity of deformable block, multi-scale context, and contrastive loss (Ji et al., 8 Jul 2024).
- Point Cloud Analysis: On ModelNet40, DF-Mamba achieves 93.76% (no pretrain), surpassing PointMamba (92.9%) and PCM (93.4%); TPFF, deformable SSM, GKR, and GDR each contribute significant accuracy improvements (Liu et al., 3 Dec 2025).
- Visual Recognition: On ImageNet-1K, DefMamba and its variants surpass ViT and SwinT at both tiny (8M) and base (51M) model scales; ablations confirm 1.0% top-1 gain from combined spatial and token offsets (Liu et al., 8 Apr 2025).
- Wide FoV Segmentation: The Deformable Mamba decoder increases mIoU by 2.5 points on Stanford2D3D and uses 72% fewer FLOPs compared to UperNet, indicating substantial benefits for distortion-prone domains (Hu et al., 25 Nov 2024).
- HSI and MODIS: Sparse deformable token sequencing drastically reduces computation (e.g., 59% FLOP reduction for SDMamba) while improving classification accuracy on Indian Pines and MODIS; small-class and boundary preservation are observed (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
- Medical Registration: MambaReg/TGDM demonstrate significant gains in non-rigid multimodal alignment and anatomy segmentation, with improvements in Dice coefficient and robustness to anatomy-specific variation (Wen et al., 3 Nov 2024, Wang et al., 14 Aug 2024).
- 3D Hand Pose Estimation: DF-Mamba tribrid backbone yields measurable improvements (0.3–1 mm in MPJPE, +2.5% AUC) with throughput matching or exceeding ResNet-50 (Zhou et al., 2 Dec 2025).
Ablation studies across works consistently show that removal of any deformable or adaptive component results in a measurable drop in performance, underscoring the functional necessity of the adaptive mechanisms (e.g., –3% to –4% for removing deformable branches in DM3D (Liu et al., 3 Dec 2025), –1.1 mm MPJPE without deformable scan in hand pose (Zhou et al., 2 Dec 2025), and similar patterns in TGDM, DefMamba, and SDMamba).
5. Applications, Limitations, and Future Directions
Applications of Deformable Mamba span:
- Super-resolution and segmentation in medical imaging with complex, multiscale, or topology-driven anatomical features (Ji et al., 8 Jul 2024, Wang et al., 14 Aug 2024).
- Multimodal image registration, leveraging disentangled feature learning for difficult cross-contrast or cross-sensor tasks (Wen et al., 3 Nov 2024, Guo et al., 25 Jan 2024).
- Remote sensing and Earth observation, including land cover dynamics from MODIS and classification from hyperspectral data (Dewis et al., 29 Jul 2025, Xu et al., 13 Apr 2025).
- 3D object understanding, segmentation, part recognition, and few-shot learning in point clouds (Liu et al., 3 Dec 2025).
- Multimedia detection tasks where robust geometric adaptation and fusion are needed (e.g., UAV detection with multimodal inputs) (Li et al., 1 Jul 2025).
- Sequence labeling and framewise prediction under occlusion and structured context dependencies (e.g., hand pose estimation) (Zhou et al., 2 Dec 2025).
Limitations identified by original works include:
- Computational overhead from deformable index and offset computation, particularly in very large data regimes due to KNN, sorting, or attention-based selection.
- Decreased robustness under strong domain shift (e.g., out-of-distribution segmentation in TGDM (Wang et al., 14 Aug 2024)).
- The requirement for tuning sparsity/adaptivity parameters per dataset (λ in SDMamba, SDTM).
- Algorithmic complexity for reproducibility, especially where non-differentiable operations (e.g., sorting) require approximations (Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025).
Prospective directions are suggested:
- Dynamic or multiscale deformation, with point- or layer-adaptive sparsity (Xu et al., 13 Apr 2025, Liu et al., 3 Dec 2025).
- More principled mechanisms for differentiable discrete ordering (e.g., Gumbel-softmax) (Xu et al., 13 Apr 2025).
- Stronger topology and connectivity regularization based on topological priors or persistent homology (Wang et al., 14 Aug 2024).
- Efficient approximate algorithms for KNN and sequence reordering in large-scale 3D or point-based domains (Liu et al., 3 Dec 2025).
- Cross-domain adaptation and pretraining to address domain shift (Wang et al., 14 Aug 2024).
6. Theoretical and Empirical Implications
The DF-Mamba paradigm unifies adaptive neighborhood aggregation, attention/sequence modeling, and domain-driven priors within a single flexible family of efficient, scalable models. This architecture bridges the gap between the spatial flexibility of deformable convolutions and the long-range contextual power of modern state-space models, while offering reduced computational requirements compared to Transformers with full self-attention. The empirical evidence across domains indicates that adaptivity in both spatial and sequence space enables retention of fine structure and salient semantically-aligned features, with strong performance benefits in both dense prediction and structured regression contexts (Ji et al., 8 Jul 2024, Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025, Wang et al., 14 Aug 2024).
7. Summary Table of Representative Results
| Architecture | Domain | Key Deformable Mechanism | Metric | SOTA/Improvement |
|---|---|---|---|---|
| Deform-Mamba Net (Ji et al., 8 Jul 2024) | MRI SR | Modulated Deform Block + MVC | PSNR/SSIM | +1.4 dB, +0.1 SSIM vs T²Net |
| DM3D/DF-Mamba (Liu et al., 3 Dec 2025) | Point cloud | Offset-Gaussian scan, TPFF | Acc./mIoU | +0.86% vs PCM (ModelNet40) |
| DefMamba (Liu et al., 8 Apr 2025) | Vision (ImageNet/COCO) | Deform. scanning (Δp, Δt) | Top-1 Acc. | +1.0% vs PlainMamba-L1 |
| SDMamba (Xu et al., 13 Apr 2025) | HSI classification | Sparse Deform. Seq. (attn.) | OA (%) | +0.26% (IP) vs HyperMamba |
| TGDM (Wang et al., 14 Aug 2024) | Med. segmentation | Topology priors, grouped SSM | DSC/NSD | +2.5 DSC vs nnMamba |
| DF-Mamba (Zhou et al., 2 Dec 2025) | Hand pose estimation | DSSM aggregation, tribrid backbone | MPJPE/AUC | –1.56 mm, +2.51% AUC |
| UAVD-Mamba (Li et al., 1 Jul 2025) | UAV detection | Deformable tokens, dual-modal fusion | mAP | +3.6% vs OAFA baseline |
The consistent pattern is that combining SSM-based recurrence with content- or topology-adaptive deformation yields quantifiable, robust improvement in dense, structured, and geometry-sensitive tasks, typically with efficient compute and memory profiles.