Deformable Mamba: Adaptive SSM Models

Updated 7 December 2025
  • Deformable Mamba (DF-Mamba) is a class of neural architectures that enhance state-space models with adaptive deformation for superior content and geometric feature aggregation.
  • It employs mechanisms like content-adaptive aggregation, learnable token ordering, and offset-guided resampling to overcome fixed-grid limitations in vision, medical imaging, and remote sensing.
  • Extensive experiments demonstrate that DF-Mamba architectures improve performance in tasks such as MRI super-resolution, 3D understanding, and point cloud analysis while maintaining computational efficiency.

Deformable Mamba (DF-Mamba) refers to a class of neural architectures that augment Mamba-based state space models (SSMs) with explicit mechanisms for content- or geometry-adaptive feature aggregation, adaptive scan orders, or structured deformation. DF-Mamba architectures have been developed for diverse vision, medical imaging, remote sensing, and 3D understanding tasks. They share the common motivation of overcoming limitations inherent in fixed grid/topology neural models—namely, poor adaptation to content, shape, or spatially variant semantic structure—by introducing deformable scanning, adaptive sequencing, or offset-guided resampling directly within the SSM-based modeling pipeline.

1. Fundamental Principles and Variants

Deformable Mamba integrates two key ideas: (i) the use of SSMs, particularly Mamba, which delivers efficient long-sequence modeling in linear time and memory; and (ii) a deformable component, such as content-adaptive sampling, learnable token orderings, or offset-based resampling, which enables geometry- or content-aware information flow. This formulation generalizes and subsumes a range of prior concepts from deformable convolutions and adaptive serialization, while applying them within the scope of state-space-based sequence models.

Several variants exist, differentiated by domain and specific mechanism:

  • Modulated Deform Block: Content-adaptive local aggregation via learned spatial offsets and modulation scalars (e.g., for medical super-resolution) (Ji et al., 8 Jul 2024).
  • Deformable Scanning: Token index offsets and spatial shifts, enabling dynamic (learnable) scan paths through data (e.g., images or point clouds) (Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025).
  • Sparse/Adaptive Sequencing: Attention- or similarity-based token selection yielding sparse, deformable sequences for efficient SSM computation (notably for hyperspectral and temporal data) (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
  • Grouped/Topology-Guided Deformation: Enforcing structural priors (e.g., centerlines, anatomy) via group-wise or topology-aware SSM branches (Wang et al., 14 Aug 2024).

2. Technical Formulation Across Modalities

2.1. SSM and Mamba Core

Mamba blocks implement a selective scan mechanism, often codified as an input-driven discretization of the continuous SSM: $h_t = A h_{t-1} + B x_t,\quad y_t = C h_t$, with $A, B, C$ parametrized by learned or input-dependent projections. The discretization, via the zero-order hold (ZOH), yields an efficient, global, linear-time recurrence or convolution over the sequence (Liu et al., 8 Apr 2025).
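The recurrence can be sketched directly. Below is a minimal NumPy toy with a dense, fixed $A$; actual Mamba implementations use input-dependent (selective) diagonal parametrizations and a hardware-aware parallel scan, so the function name and shapes here are illustrative assumptions only:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Discrete SSM recurrence: h_t = A h_{t-1} + B x_t,  y_t = C h_t.

    x: (T,) scalar input sequence; A: (N, N) state matrix;
    B: (N,) input projection; C: (N,) output projection.
    Returns y: (T,) outputs, computed by a sequential linear-time scan."""
    N = A.shape[0]
    h = np.zeros(N)
    y = np.empty(len(x))
    for t, xt in enumerate(x):
        h = A @ h + B * xt   # state update
        y[t] = C @ h         # readout
    return y

# Impulse input decays geometrically through a 0.5-contraction state.
y = ssm_scan(np.array([1.0, 0.0, 0.0]), 0.5 * np.eye(2), np.ones(2), np.ones(2))
```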

2.2. Deformable Mechanisms

a) Content-Adaptive Aggregation (Convolutional Deformation)

A spatial feature at location $p$ is computed as

$$Y(p) = \sum_{k=1}^{K} w_k\, X\bigl(p + p_k + \Delta p_k\bigr) \cdot \Delta m_k$$

where the $p_k$ are canonical kernel offsets, $\Delta p_k$ and $\Delta m_k$ are spatially adaptive offsets and modulation scalars learned via lightweight convnets, and the $w_k$ are filter weights. Bilinear interpolation is applied for non-integer offsets (Ji et al., 8 Jul 2024, Li et al., 1 Jul 2025, Hu et al., 25 Nov 2024).
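A minimal NumPy sketch of this aggregation for a single output location and a 3×3 kernel follows. The offsets and modulations are passed in as arrays, standing in for the lightweight offset-predicting convnet of the papers; the function names are hypothetical:

```python
import numpy as np

def bilinear(X, y, x):
    """Bilinearly sample X (H, W) at continuous location (y, x), zero-padded."""
    H, W = X.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * X[yy, xx]
    return out

def deform_aggregate(X, p, offsets, mods, w):
    """Y(p) = sum_k w_k * X(p + p_k + Δp_k) * Δm_k over a 3x3 kernel.

    offsets: (9, 2) learned Δp_k; mods: (9,) modulation Δm_k; w: (9,) weights."""
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]  # canonical p_k
    y = 0.0
    for k, (pi, pj) in enumerate(grid):
        y += w[k] * bilinear(X, p[0] + pi + offsets[k, 0],
                             p[1] + pj + offsets[k, 1]) * mods[k]
    return y
```

With zero offsets, unit modulation, and uniform weights this reduces to a plain 3×3 average, which makes the deformable generalization easy to sanity-check.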

b) Deformable Scanning / Token Reordering

Scan paths are adaptively shifted by learning both spatial offsets and token index offsets, $[\Delta p,\, \Delta t] = \tanh(\mathrm{OffsetNet}(F_\mathrm{agg}))$, producing deformed 1D sequences by sorting the perturbed indices $t_{\mathrm{raw}} + \Delta t$ (Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025). In point cloud settings, differentiable continuous reordering is achieved with Gaussian weightings over index shifts (Liu et al., 3 Dec 2025).
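The index-offset reordering step reduces to an argsort over perturbed positions. A minimal NumPy sketch, where the offset network is abstracted to a precomputed `delta_t` array (in practice the sort is non-differentiable and the papers rely on approximations, e.g. Gaussian weighting over index shifts):

```python
import numpy as np

def deform_reorder(tokens, delta_t):
    """Reorder a token sequence by perturbed indices, as in deformable scanning.

    tokens: (T, C) features; delta_t: (T,) learned index offsets (e.g. the
    tanh output of an offset network). New scan order = argsort(t_raw + Δt)."""
    t_raw = np.arange(tokens.shape[0], dtype=float)
    order = np.argsort(t_raw + delta_t, kind="stable")
    return tokens[order], order
```

For example, a large negative offset on token 2 pulls it to the front of the scan while the rest keep their raster order.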

c) Sparse/Attention-Based Deformable Sequencing

Tokens are selected by learned attention or cosine-similarity scores with respect to anchors, and only a subset is forwarded to the SSM blocks: $\overline{Z}_j = [Z_j[i]]_{i \in I_s}$, where $I_s$ is the index set of the most relevant tokens. This selection is applied spatially, spectrally, or temporally as needed (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
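The cosine-similarity variant of this selection can be sketched in a few lines of NumPy; the function name and fixed top-$k$ budget are illustrative assumptions, and the retained indices are restored to their original scan order before entering the SSM:

```python
import numpy as np

def select_tokens(Z, anchor, keep):
    """Keep the `keep` tokens most cosine-similar to an anchor feature.

    Z: (T, C) token features; anchor: (C,). Returns (Z_bar, I_s) where I_s
    holds the retained indices sorted back into original sequence order."""
    sims = Z @ anchor / (np.linalg.norm(Z, axis=1) * np.linalg.norm(anchor) + 1e-8)
    I_s = np.sort(np.argsort(-sims)[:keep])  # top-k, restored to scan order
    return Z[I_s], I_s
```

The sparse subsequence $\overline{Z}$ is what the SSM then scans, which is where the computational savings reported for hyperspectral data come from.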

3. Architectures Leveraging DF-Mamba

The deformable Mamba paradigm underpins a diverse set of architectures:

| Variant | Deformation Mode | Application Domain |
|---|---|---|
| Deform-Mamba Net | Modulated Deform + SSM | MRI super-resolution |
| DefMamba | Deformable scanning SSM | General vision (classification, det/seg) |
| DM3D | Offset-guided Gaussian scan | Point cloud understanding |
| Sparse Deformable Mamba | Sparse deformable sequencing | HSI, MODIS classification |
| TGDM | Topology-guided deformable SSM | Anatomy segmentation (costal cartilage) |
| MambaReg | Disentangled sparse + deformable | Unsupervised multimodal registration |
| UAVD-Mamba | Deformable token fusion | Multimodal UAV detection |

Notably, the architectures consistently alternate standard SSM/Mamba blocks with deformable mechanisms and employ multi-branch or multi-scale strategies for robust feature representation across highly structured or irregular domains (Ji et al., 8 Jul 2024, Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025, Xu et al., 13 Apr 2025, Wang et al., 14 Aug 2024, Li et al., 1 Jul 2025).

4. Experimental Benchmarks and Ablation Findings

DF-Mamba models have delivered state-of-the-art or near-SOTA performance in each application domain tested:

  • MRI Super-Resolution: Outperforms SRCNN, VDSR, FMISR, T²Net, HAT on IXI and fastMRI; ablation shows necessity of deformable block, multi-scale context, and contrastive loss (Ji et al., 8 Jul 2024).
  • Point Cloud Analysis: On ModelNet40, DF-Mamba achieves 93.76% (no pretrain), surpassing PointMamba (92.9%) and PCM (93.4%); TPFF, deformable SSM, GKR, and GDR each contribute significant accuracy improvements (Liu et al., 3 Dec 2025).
  • Visual Recognition: On ImageNet-1K, DefMamba and its variants surpass ViT and SwinT at both tiny (8M) and base (51M) model scales; ablations confirm 1.0% top-1 gain from combined spatial and token offsets (Liu et al., 8 Apr 2025).
  • Wide FoV Segmentation: The Deformable Mamba decoder increases mIoU by 2.5 points on Stanford2D3D and uses 72% fewer FLOPs compared to UperNet, indicating substantial benefits for distortion-prone domains (Hu et al., 25 Nov 2024).
  • HSI and MODIS: Sparse deformable token sequencing drastically reduces computation (e.g., 59% FLOP reduction for SDMamba) while improving classification accuracy on Indian Pines and MODIS; small-class and boundary preservation are observed (Xu et al., 13 Apr 2025, Dewis et al., 29 Jul 2025).
  • Medical Registration: MambaReg/TGDM demonstrate significant gains in non-rigid multimodal alignment and anatomy segmentation, with improvements in Dice coefficient and robustness to anatomy-specific variation (Wen et al., 3 Nov 2024, Wang et al., 14 Aug 2024).
  • 3D Hand Pose Estimation: DF-Mamba tribrid backbone yields measurable improvements (0.3–1 mm in MPJPE, +2.5% AUC) with throughput matching or exceeding ResNet-50 (Zhou et al., 2 Dec 2025).

Ablation studies across works consistently show that removal of any deformable or adaptive component results in a measurable drop in performance, underscoring the functional necessity of the adaptive mechanisms (e.g., –3% to –4% for removing deformable branches in DM3D (Liu et al., 3 Dec 2025), –1.1 mm MPJPE without deformable scan in hand pose (Zhou et al., 2 Dec 2025), and similar patterns in TGDM, DefMamba, and SDMamba).

5. Applications, Limitations, and Future Directions

Applications of Deformable Mamba span medical image super-resolution and registration, anatomy segmentation, general visual recognition and dense prediction, wide-FoV segmentation, hyperspectral and MODIS remote-sensing classification, point cloud understanding, 3D hand pose estimation, and multimodal UAV object detection.

Limitations identified by original works include:

  • Computational overhead from deformable index and offset computation, particularly in very large data regimes due to KNN, sorting, or attention-based selection.
  • Decreased robustness under strong domain shift (e.g., out-of-distribution segmentation in TGDM (Wang et al., 14 Aug 2024)).
  • The requirement for tuning sparsity/adaptivity parameters per dataset (λ in SDMamba, SDTM).
  • Algorithmic complexity for reproducibility, especially where non-differentiable operations (e.g., sorting) require approximations (Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025).

Prospective directions suggested by the original works target these limitations, including cheaper offset and index computation, improved robustness under domain shift, and less dataset-specific tuning of sparsity parameters.

6. Theoretical and Empirical Implications

The DF-Mamba paradigm unifies adaptive neighborhood aggregation, attention/sequence modeling, and domain-driven priors within a single flexible family of efficient, scalable models. This architecture bridges the gap between the spatial flexibility of deformable convolutions and the long-range contextual power of modern state-space models, while offering reduced computational requirements compared to Transformers with full self-attention. The empirical evidence across domains indicates that adaptivity in both spatial and sequence space enables retention of fine structure and salient semantically-aligned features, with strong performance benefits in both dense prediction and structured regression contexts (Ji et al., 8 Jul 2024, Liu et al., 8 Apr 2025, Liu et al., 3 Dec 2025, Wang et al., 14 Aug 2024).

7. Summary Table of Representative Results

| Architecture | Domain | Key Deformable Mechanism | Metric | SOTA/Improvement |
|---|---|---|---|---|
| Deform-Mamba Net (Ji et al., 8 Jul 2024) | MRI SR | Modulated Deform Block + MVC | PSNR/SSIM | +1.4 dB, +0.1 SSIM vs T²Net |
| DM3D/DF-Mamba (Liu et al., 3 Dec 2025) | Point cloud | Offset-Gaussian scan, TPFF | Acc./mIoU | +0.86% vs PCM (ModelNet40) |
| DefMamba (Liu et al., 8 Apr 2025) | Vision (ImageNet/COCO) | Deformable scanning (Δp, Δt) | Top-1 Acc. | +1.0% vs PlainMamba-L1 |
| SDMamba (Xu et al., 13 Apr 2025) | HSI classification | Sparse deformable seq. (attn.) | OA (%) | +0.26% (IP) vs HyperMamba |
| TGDM (Wang et al., 14 Aug 2024) | Med. segmentation | Topology priors, grouped SSM | DSC/NSD | +2.5 DSC vs nnMamba |
| DF-Mamba (Zhou et al., 2 Dec 2025) | Hand pose estimation | DSSM aggregation, tribrid backbone | MPJPE/AUC | –1.56 mm, +2.51% AUC |
| UAVD-Mamba (Li et al., 1 Jul 2025) | UAV detection | Deformable tokens, dual-modal fusion | mAP | +3.6% vs OAFA baseline |

The consistent pattern is that combining SSM-based recurrence with content- or topology-adaptive deformation yields quantifiable, robust improvement in dense, structured, and geometry-sensitive tasks, typically with efficient compute and memory profiles.
