Mamba-ND: N-Dimensional Selective SSMs

Updated 21 February 2026
  • Mamba-ND is a generalization of selective state-space models that extends 1D scans to arbitrary N-dimensional arrays, delivering a global receptive field with linear complexity.
  • It employs content-aware, gated state-space scans along principal axes to dynamically integrate long-range context across diverse high-dimensional domains.
  • Empirical results demonstrate state-of-the-art accuracy and efficiency in applications such as computer vision, medical imaging, scientific forecasting, and computational biology.

Mamba-ND is a generalization of the selective state-space modeling (sSSM) paradigm underpinning the original Mamba architecture, extending its core principles from one-dimensional sequential data to arbitrary N-dimensional arrays. By replacing Transformer-style self-attention with content-aware, gated state-space scans along principal axes, Mamba-ND achieves linear complexity in input size, global receptive field, and state-of-the-art accuracy across diverse high-dimensional domains including computer vision, medical imaging, scientific forecasting, and computational biology (Li et al., 2024). Theoretical and empirical investigations show Mamba-ND's versatility and scalability relative to both canonical SSMs and quadratic-complexity Transformer architectures.

1. Core Principles and Model Formulation

Mamba-ND is grounded in continuous-time linear state-space models of the form

$$\dot h(t) = A h(t) + B x(t), \qquad y(t) = C h(t) + D x(t),$$

where $x(t)$ (input), $h(t)$ (hidden state), and $y(t)$ (output) are evolved via learnable matrices $(A, B, C, D)$. Discretization (zero-order hold) yields

$$\bar A = e^{\Delta A}, \qquad \bar B = (\Delta A)^{-1} (e^{\Delta A} - I)\, \Delta B, \qquad h_k = \bar A h_{k-1} + \bar B x_k, \qquad y_k = C h_k + D x_k.$$

The key innovation in Mamba and its ND extension is selectivity: $\Delta$, $B$, $C$, and $D$ are input-conditioned via small neural networks, allowing dynamic gating, selective memory, and input-dependent recurrence (Li et al., 2024, Medina et al., 3 Mar 2025). Parallel scan algorithms reduce sequential cost to $O(L)$ for sequence length $L$. In the ND case ($N$-dimensional grids), the array is repeatedly flattened along different axes and directions, with each 1D scan imparting content-sensitive long-range context for that dimension.
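The discretized recurrence above can be made concrete in a minimal NumPy sketch. It assumes a diagonal $A$ (so $e^{\Delta A}$ is elementwise, as in Mamba) and uses one-weight linear maps as hypothetical stand-ins for the small conditioning networks; it is an illustration of the mechanism, not the paper's parameterization.

```python
import numpy as np

def selective_scan(x, a, w_delta, w_B, w_C, d_skip=0.0):
    """Sequential selective-SSM scan over a scalar sequence x (length L).

    a        : (n,) diagonal of the continuous-time state matrix A (negative reals).
    w_delta  : scalar weight of a toy network producing the step size Delta_k.
    w_B, w_C : (n,) weights producing input-dependent B_k and C_k.
    These one-layer linear "networks" are hypothetical stand-ins for
    Mamba's learned projections.
    """
    h = np.zeros_like(a)
    y = np.empty_like(x)
    for k, xk in enumerate(x):
        delta = np.log1p(np.exp(w_delta * xk))   # softplus keeps Delta_k > 0
        A_bar = np.exp(delta * a)                # ZOH: e^{Delta A}, elementwise for diagonal A
        B_k = w_B * xk                           # input-conditioned B
        B_bar = (A_bar - 1.0) / a * B_k          # (Delta A)^{-1}(e^{Delta A} - I) Delta B, diagonal case
        h = A_bar * h + B_bar * xk               # h_k = A_bar h_{k-1} + B_bar x_k
        y[k] = (w_C * xk) @ h + d_skip * xk      # y_k = C_k h_k + D x_k
    return y
```

With $a < 0$, $\bar A \in (0, 1)$, so the state decays at an input-dependent rate: large $\Delta_k$ resets memory toward the current token, small $\Delta_k$ preserves it, which is the gating behavior the text describes.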

2. Mamba-ND Architecture and Axis-Alternation

The canonical Mamba-ND block interleaves 1D selective SSM layers along each axis (and direction) of the input array. For a data tensor $X \in \mathbb{R}^{D_1 \times \cdots \times D_N}$, layers alternate scan axes in row-major and reverse order (e.g., H+, H−, W+, W− for 2D; T+, T−, H+, H−, W+, W− for 3D), followed by reshaping back to multi-dimensional form.

Pseudocode for a Mamba-ND block (N=3) (Li et al., 2024):

input: X ∈ ℝ^{T×H×W}, state h ∈ ℝ^n
for axis, dir in [(H,+), (H,−), (W,+), (W,−), (T,+), (T,−)]:
    X_seq ← flatten(X, ordering=(axis, dir))
    Y_seq, h ← Mamba1D_Layer(X_seq, h)  # sSSM scan
    X ← reshape(Y_seq, shape=(T,H,W))
return X, h
The SSM parameters at each step are content-gated; per-block cost is $O(D \cdot L \cdot n^2)$, where $D = 2N$ is the number of scan directions, $L$ is the total number of elements, and $n$ is the hidden state width. Unlike multi-dimensional convolutions, which are spatially local, this interleaving provides a global receptive field with linear scaling per axis.
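Stripped of learned parameters, the flatten/scan/reshape bookkeeping of the block can be sketched in NumPy. A plain exponential-moving-average recurrence stands in for the 1D selective-SSM layer (a real layer would make the decay and projections input-dependent); the point here is the axis-alternating interleaving, not the gating.

```python
import numpy as np

def scan_1d(seq, alpha=0.9):
    """Stand-in for a 1D selective-SSM layer: fixed-decay recurrence
    h_k = alpha * h_{k-1} + (1 - alpha) * x_k."""
    out = np.empty_like(seq)
    h = 0.0
    for k, v in enumerate(seq):
        h = alpha * h + (1 - alpha) * v
        out[k] = h
    return out

def mamba_nd_block(X, alpha=0.9):
    """Axis-alternating scans over an N-D array X: for each axis and each
    direction, flatten with that axis fastest-varying, run the 1D scan,
    and reshape back. A minimal sketch of Mamba-ND's interleaving."""
    for axis in range(X.ndim):
        for flip in (False, True):
            Xa = np.moveaxis(X, axis, -1)        # scan axis last => fastest-varying
            if flip:
                Xa = Xa[..., ::-1]               # reverse-direction scan
            seq = Xa.reshape(-1)                 # row-major flatten to one sequence
            Ya = scan_1d(seq, alpha).reshape(Xa.shape)
            if flip:
                Ya = Ya[..., ::-1]               # undo the direction flip
            X = np.moveaxis(Ya, -1, axis)        # restore original axis order
    return X
```

Each of the $2N$ passes touches every element once, matching the linear-in-$L$ per-axis cost stated above.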

3. Empirical Results and Benchmarks

Mamba-ND achieves or surpasses state-of-the-art performance in a variety of N-dimensional learning tasks (Li et al., 2024, Wang et al., 25 Mar 2025, Hu et al., 2024):

| Domain | Benchmark | Model/Config | Performance | Baseline | Param/FLOP Efficiency |
|---|---|---|---|---|---|
| Image Classification | ImageNet-1K | Mamba-2D-B (24 × 768) | 83.0% top-1 | Swin-B 83.5% | 92M params, linear |
| Video Recognition | HMDB-51 | Mamba-3D (32 × 384) | 60.9% top-1 | Video-Swin-S 58.1% | 36M params |
| Weather Forecasting | ERA5 | Mamba-3D (50M) | ACC 90.1 | Cli-ViT 89.3 | 50M params |
| 3D Med. Segmentation | BTCV, AMOS, et al. | Mamba-3D, UlikeMamba | Dice 89–90.6% | UNETR/Swin-UNETR | 33M params, O(NL) |
| Panoramic Segmentation | Stanford2D3D 360° | DMamba-M (decoder) | mIoU 59.3 | UperHead 56.8 | 31M vs. 206.9G FLOPs |

Notably, replacing Transformer self-attention with Mamba-ND in UNETR yields equivalent or better Dice scores with up to 70% fewer parameters on 3D imaging tasks (Li et al., 2024, Wang et al., 25 Mar 2025). In panoramic and fisheye segmentation, Deformable Mamba decoders reduce FLOPs by more than 87% and improve mIoU by +2.5 points versus leading Transformer-based decoders (Hu et al., 2024).

4. Adaptations and Practical Variants

Numerous Mamba-ND instantiations target specific dense prediction, modeling, and compression tasks:

  • 3D Volumetric Segmentation: UlikeMamba networks (Wang et al., 25 Mar 2025) leverage ND depthwise convolutions and selective ND SSMs at each encoder–decoder stage, along with multi-scale adapters and axis ensembling (tri-scan/N-scan) for robust context aggregation. Multi-scale and multi-axis variants confer additional accuracy at modest compute increase.
  • Distortion-Aware Decoders: Deformable Mamba Fusion blocks combine multi-directional cross-scans (ND SSMs in four scan directions) with learned deformable convolutional fusion for distortion-robust wide-FoV semantic segmentation (Hu et al., 2024).
  • Knowledge Compression: Progressive Knowledge Distillation (PKD) with Mamba-ND blocks enables cascade ensembles of weak learners, allowing resource-accuracy tradeoff by choosing different ensemble depths at inference (Medina et al., 3 Mar 2025). Small student models (1%–19% FLOPs of the teacher) recover 60%–86% of accuracy on MNIST/CIFAR-10, with full ensemble reaching >98% at 63% computational cost.
  • Single-Cell Omics: scMamba (designated as Mamba-ND) processes full-length gene expression vectors (≈19,000 genes) with bidirectional Mamba blocks and no embedding reduction, trained via masked expression modeling for robust cell/gene representation, imputation, and classification (Oh et al., 12 Feb 2025).
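The resource-accuracy tradeoff in the PKD cascade above amounts to choosing how many distilled students to run at inference. A hedged sketch of that inference loop, with early exit once the accumulated prediction is confident (all names and the confidence-threshold policy are illustrative assumptions, not the paper's API):

```python
import numpy as np

def cascade_predict(students, x, max_depth=None, conf_threshold=0.95):
    """Cascade-ensemble inference sketch: run progressively distilled
    students in order, averaging their softmax outputs, and stop early
    once confidence exceeds a threshold or a chosen depth is reached.
    `students` are callables x -> logits (hypothetical interface)."""
    depth = max_depth or len(students)
    acc = None
    for i, student in enumerate(students[:depth], start=1):
        logits = student(x)
        p = np.exp(logits - logits.max())        # stable softmax
        p /= p.sum()
        acc = p if acc is None else (acc * (i - 1) + p) / i  # running mean
        if acc.max() >= conf_threshold:
            break                                # early exit saves the remaining students' FLOPs
    return int(np.argmax(acc)), i                # predicted class, students actually used
```

Capping `max_depth` (or raising `conf_threshold`) trades accuracy for compute, mirroring the 1%–19% FLOP operating points reported for the distilled students.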

5. Application to Biomedical Imaging and Omics

Mamba-ND demonstrates unique strengths in biomedical domains where standard attention-based architectures encounter scalability or bias limitations:

  • MRI Modality Disentanglement: In multi-contrast MRI, Mamba-based modality disentanglement networks apply ND selective SSM blocks to masked mixed-domain features, iteratively purifying target representations and outperforming multi-contrast fusion baselines by 1–2 dB PSNR on IXI/BraTS datasets (Lyu et al., 22 Dec 2025).
  • Single-Nucleus RNA-seq: scMamba achieves the highest macro, micro, and weighted F1 on cell type labeling and doublet detection benchmarks, as well as superior imputation MSE and differential gene expression robustness, without reliance on highly variable gene selection or dimension reduction (Oh et al., 12 Feb 2025).
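The masked expression modeling objective used for pretraining can be set up in a few lines. This sketch assumes random zero-masking of a fixed fraction of genes and an MSE loss restricted to masked positions; the exact masking fraction and corruption scheme are illustrative assumptions, not scMamba's published recipe.

```python
import numpy as np

def masked_expression_batch(expr, mask_frac=0.15, rng=None):
    """Mask a random subset of genes in an expression vector and return
    (masked input, boolean mask of positions the model must reconstruct)."""
    rng = rng or np.random.default_rng()
    mask = rng.random(expr.shape) < mask_frac    # True at positions to hide
    masked = np.where(mask, 0.0, expr)           # zero-out masked genes
    return masked, mask

def masked_mse(pred, target, mask):
    """Reconstruction loss computed only over masked gene positions."""
    return float(((pred - target) ** 2 * mask).sum() / max(mask.sum(), 1))
```

Because the model sees the full ~19,000-gene vector, the objective supplies a reconstruction signal at every masked gene without any highly-variable-gene preselection.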

6. Limitations, Theoretical Challenges, and Future Extensions

  • Scan Ordering and Anisotropy: The original Mamba-ND relies on fixed row-major scans per axis. Optimal orderings (e.g., learned traversals, diagonal/zig-zag), adaptive scan fusion, or attention-based scan routing remain largely unexplored (Li et al., 2024).
  • Hardware and Memory Constraints: While compute is linear in input size, per-axis scan parallelization may become memory-bounded for extremely high-dimensional or multi-modal data. Device-aware choices of channel/hidden width, scan grouping, and parameter sharing are crucial for deployment (Medina et al., 3 Mar 2025, Wang et al., 25 Mar 2025).
  • Generalization to Arbitrary Topologies: Direct extension of ND scans to non-Euclidean data (e.g., graphs, manifolds) is an open direction, requiring redefinition of scan/jump paths and state updating strategies.
  • Biological Applicability: Pretraining bias (e.g., scMamba's focus on brain nuclei) limits immediate use outside the pretraining context; retraining is needed for new tissues or modalities (Oh et al., 12 Feb 2025). Model size may be prohibitive in ultra-sparse or multi-omics settings.
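To make the scan-ordering question concrete: alternative traversals of a 2D grid can be expressed as index permutations that a scan layer would follow instead of fixed row-major order. The helper below is hypothetical, showing a boustrophedon ("zigzag") and an anti-diagonal ordering of the kind the first bullet names as unexplored.

```python
def scan_order(shape, kind="row"):
    """Return a flat index permutation for scanning an H x W grid.
    'row' is the fixed row-major order Mamba-ND uses; 'zigzag' and
    'diag' illustrate alternative traversals. Illustrative only."""
    H, W = shape
    if kind == "row":                            # plain row-major
        return [r * W + c for r in range(H) for c in range(W)]
    if kind == "zigzag":                         # boustrophedon: alternate row direction
        order = []
        for r in range(H):
            cols = range(W) if r % 2 == 0 else range(W - 1, -1, -1)
            order.extend(r * W + c for c in cols)
        return order
    if kind == "diag":                           # anti-diagonals, near-to-far from the origin
        return [r * W + c
                for s in range(H + W - 1)
                for r in range(H) for c in range(W) if r + c == s]
    raise ValueError(kind)
```

A zigzag order keeps consecutive scan steps spatially adjacent at row boundaries, while the diagonal order groups elements by distance from a corner; learned or adaptive choices among such permutations are the open question the text raises.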

7. Summary Table: Mamba-ND Variants and Key Use Cases

| Variant/Framework | Target Domain | Key Innovations | Notable Results | Reference |
|---|---|---|---|---|
| Mamba-ND (orig.) | Vision, 3D | Axis-alternating ND SSMs | ImageNet: 83% top-1; HMDB-51: 60.9% | (Li et al., 2024) |
| UlikeMamba | 3D medical | Multi-scale, tri-scan SSMs | AMOS Dice: 89.95%, 20–40% fewer GFLOPs | (Wang et al., 25 Mar 2025) |
| DMamba | Panoramic seg. | Deformable fusion, 4-direction SSM | +2.5 mIoU at 87% fewer decoder FLOPs | (Hu et al., 2024) |
| scMamba | Omics | Bidirectional, full-gene SSMs | F1: 0.98 major, 0.72 subtype; best imputation | (Oh et al., 12 Feb 2025) |
| PKD-Mamba-ND | Compression | Progressive ensemble distillation | MNIST: 98% acc. at 63% teacher FLOPs | (Medina et al., 3 Mar 2025) |
| MambaMDN | MRI | Iterative ND SSM feature subtraction | +1–2 dB PSNR over SOTA fusion | (Lyu et al., 22 Dec 2025) |

Mamba-ND establishes a unified, hardware-efficient, content-adaptive, and highly generalizable approach to modeling high-dimensional scientific and medical data. By decoupling global receptive field acquisition from the quadratic cost of attention, it expands the practical and scientific reach of structured sequence modeling architectures.
