
Point Mamba: 3D Point Cloud SSM Models

Updated 21 February 2026
  • Point Mamba is a family of 3D point cloud deep learning architectures utilizing state-space models for efficient global spatial reasoning with linear computational complexity.
  • It employs geometric serialization techniques like space-filling curves and multi-path orderings to convert unordered point sets into structured sequences, preserving local spatial relationships.
  • Hybrid models combine SSMs with transformer and convolutional components to enhance accuracy in tasks such as classification, segmentation, registration, and generative modeling.

Point Mamba is a family of 3D point cloud deep learning architectures that leverage state-space models (SSMs), specifically the Mamba framework, to achieve global modeling with strictly linear complexity in sequence length. Point Mamba models have been shown to match or surpass transformer-based counterparts in a range of 3D vision tasks by replacing attention with a parameterized SSM that operates over sequences derived from reordering point clouds. The core principles involve geometric serialization of unordered 3D data, dynamic or input-dependent SSM parameterization, and tailored integration of local and global spatial dependencies.

1. Mathematical Foundations: State Space Formulation and Complexity

The fundamental building block of Point Mamba models is the linear time-invariant or time-varying SSM, written in continuous time as

$$\dot{h}(t) = A h(t) + B x(t), \qquad y(t) = C h(t) + D x(t)$$

where $x(t)$ is the input (feature vector), $h(t)$ is the latent state, and $y(t)$ is the output. This is discretized (e.g., with zero-order hold or the Tustin transform) for sequence modeling:

$$h_k = \bar{A} h_{k-1} + \bar{B} x_k, \qquad y_k = \bar{C} h_k + \bar{D} x_k$$

For fixed parameters, the SSM can be "unrolled" as a convolution:

$$y = K * x, \qquad K[t] = \bar{C} \bar{A}^t \bar{B}$$

Mamba's central innovation is dynamic or selective parameterization: SSM parameters (often $\bar{B}$, $\bar{C}$, occasionally the step size $\Delta$) are made input-dependent via lightweight neural heads, facilitating data-adaptive filtering while retaining strictly $O(N)$ compute in sequence length (Liang et al., 2024).
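The discretized recurrence above can be sketched directly. The following is a minimal NumPy illustration of a time-invariant scan (the function name and dimensions are illustrative; actual Mamba uses input-dependent $\bar{B}$, $\bar{C}$, $\Delta$ and a hardware-efficient parallel scan rather than a Python loop):

```python
import numpy as np

def ssm_scan(A_bar, B_bar, C, D, x):
    """Run the discretized state-space recurrence over a sequence.

    h_k = A_bar @ h_{k-1} + B_bar @ x_k
    y_k = C @ h_k + D @ x_k

    A_bar: (d_state, d_state), B_bar: (d_state, d_in),
    C: (d_out, d_state), D: (d_out, d_in), x: (T, d_in).
    """
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x_k in x:
        h = A_bar @ h + B_bar @ x_k   # state update
        ys.append(C @ h + D @ x_k)    # readout with skip term
    return np.stack(ys)
```

Because the recurrence touches each token once with fixed-size state algebra, compute and memory grow linearly in the sequence length $T$.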

Point Mamba backbones implement SSMs either unidirectionally (causal, as in the original Mamba) or bidirectionally, enabling context integration from both past and future tokens (Chen et al., 2024). Forward and backward SSMs are computed separately and fused, with each SSM operating as a global convolutional filter over the ordered point sequence.
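The bidirectional scheme can be illustrated with a toy scalar-state filter (a hypothetical sketch; sum fusion is shown, but concatenation or gated fusion of the two directions is also common in practice):

```python
import numpy as np

def causal_scan(x, a=0.9):
    # simple scalar-state causal filter: h_k = a * h_{k-1} + x_k
    h = 0.0
    out = []
    for v in x:
        h = a * h + v
        out.append(h)
    return np.array(out)

def bidirectional_ssm(x, a=0.9):
    fwd = causal_scan(x, a)              # context from past tokens
    bwd = causal_scan(x[::-1], a)[::-1]  # context from future tokens
    return fwd + bwd                     # fuse the two directions
```

Each position in the output now aggregates information from both ends of the ordered point sequence, which is what lifts classification accuracy over a purely causal scan.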

2. Geometric Serialization and Ordering Strategies

Point clouds are unordered and irregular, while SSMs require a fixed, causal sequence as input. All Point Mamba architectures therefore employ geometric serialization — for example space-filling curves (such as Z-order) and multi-path or axis-wise orderings — to convert the point set into a 1D token sequence.

The purpose is to ensure that spatially proximal 3D points remain adjacent in the 1D sequence, maximizing the effectiveness of causal SSM recurrences for geometric reasoning.
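As one concrete example, points can be serialized along a Z-order (Morton) curve by interleaving the bits of their quantized coordinates. A minimal NumPy sketch (the function names and the 10-bit quantization are illustrative choices, not taken from any specific paper):

```python
import numpy as np

def part1by2(n):
    # Spread the bits of a 10-bit integer so they occupy every third position.
    n &= 0x000003FF
    n = (n ^ (n << 16)) & 0xFF0000FF
    n = (n ^ (n << 8))  & 0x0300F00F
    n = (n ^ (n << 4))  & 0x030C30C3
    n = (n ^ (n << 2))  & 0x09249249
    return n

def morton_order(points, bits=10):
    """Return indices that sort 3D points along a Z-order (Morton) curve."""
    pts = np.asarray(points, dtype=np.float64)
    lo, hi = pts.min(0), pts.max(0)
    # Quantize each coordinate to `bits` bits within the bounding box.
    q = ((pts - lo) / np.maximum(hi - lo, 1e-9) * (2**bits - 1)).astype(np.int64)
    codes = np.array([(part1by2(x) << 2) | (part1by2(y) << 1) | part1by2(z)
                      for x, y, z in q])
    return np.argsort(codes, kind="stable")
```

Sorting by the interleaved codes keeps points that are close in 3D close in the 1D sequence, which is exactly the locality property the SSM recurrence relies on.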

3. Architecture Variants and Design Extensions

3.1 Unidirectional, Bidirectional, and Augmented SSMs

While early Point Mamba models relied on unidirectional SSMs, recent research demonstrates that bidirectional SSMs (running the model in both sequence directions and fusing outputs) consistently improve global feature extraction and boost accuracy, especially for point cloud classification (Chen et al., 2024, Zhang et al., 2024, Li et al., 2024, Li et al., 2024). Causal and anti-causal recurrences are parameter-shared or independent, depending on the implementation.

3.2 Hybrid Mamba-Transformer Networks

Several models combine Mamba (for efficient global context) with transformer-style attention mechanisms to enhance local geometry modeling. Examples include:

  • PointABM: Integrates an initial Transformer patch encoder with a deep stack of bidirectional Mamba blocks. Transformer self-attention processes KNN-based local patches (enhancing local geometry), while global features are extracted in the SSM stage. This hybrid arrangement yields accuracy above transformer-only baselines on ScanObjectNN and ModelNet40 (Chen et al., 2024).
  • PoinTramba: Utilizes intra-group Transformers for local patch modeling followed by an inter-group Mamba with bi-directional importance-aware ordering (BIO) for enhanced global context (Wang et al., 2024).
  • PointLAMA: Introduces lightweight Latent Attention blocks to inject local bias into a global Mamba pipeline, using space-filling or axis-wise serialization depending on the pretraining task (Lin et al., 23 Jul 2025).
  • MT-PCR: Employs Z-order spatial serialization, a Mamba-based encoder at the global level, and a light Transformer refiner for registration (Liu et al., 16 Jun 2025).

3.3 Efficient Patch-Based and Segmentation Architectures

Many models leverage patch grouping and grid-based pooling for scalability:

  • Patch-based SSM: Points are grouped (e.g., via FPS + KNN), embedded by PointNet/MLP, and ordered for SSM processing (Liang et al., 2024, Liu et al., 2024).
  • Serialized Point Mamba: Implements stage-wise grid pooling and serialization, interleaved with conditional submanifold convolutional positional encodings, in a U-Net architecture for semantic/instance segmentation (Wang et al., 2024).
  • ConvMamba block: Combines local sparse convolutions (e.g. 3×3×3, as in MinkowskiEngine) with bidirectional Mamba for superior balance of local and global context (Li et al., 2024).
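The FPS + KNN grouping step mentioned above can be sketched as follows (a brute-force NumPy illustration with hypothetical function names; production pipelines use batched GPU implementations):

```python
import numpy as np

def farthest_point_sample(points, n_centers):
    """Greedy FPS: repeatedly pick the point farthest from all chosen centers."""
    pts = np.asarray(points, dtype=np.float64)
    chosen = [0]                                        # seed with the first point
    d = np.linalg.norm(pts - pts[0], axis=1)            # distance to nearest center
    for _ in range(n_centers - 1):
        idx = int(np.argmax(d))                         # farthest remaining point
        chosen.append(idx)
        d = np.minimum(d, np.linalg.norm(pts - pts[idx], axis=1))
    return np.array(chosen)

def knn_groups(points, center_idx, k):
    """Gather the k nearest neighbours of each sampled center as a local patch."""
    pts = np.asarray(points, dtype=np.float64)
    groups = []
    for c in center_idx:
        d = np.linalg.norm(pts - pts[c], axis=1)
        groups.append(np.argsort(d)[:k])
    return np.stack(groups)
```

Each resulting patch is then embedded (e.g., by a small PointNet/MLP) into one token, so the SSM operates over a far shorter sequence than the raw point count.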

3.4 Parametric and Task-Specific Adapters

Parameter-efficient adapters (e.g., PMA) employ a geometry-constrained gating prompt generator to fuse multi-layer intermediate features via a Mamba block, allowing effective reuse of large frozen backbones with dynamic per-layer token ordering and geometry-aware gating (Zha et al., 27 May 2025).

4. Empirical Performance and Complexity Comparison

Point Mamba models consistently achieve or exceed the state of the art across point cloud benchmarks:

| Model | ModelNet40 (%) | ScanObjectNN PB-T50-RS (%) | ShapeNetPart Inst-mIoU (%) | ScanNet mIoU (%) | Params / FLOPs | Key Reference |
|---|---|---|---|---|---|---|
| PointMamba | 92.4–93.6 | 82.5–88.3 | 85.8–86.0 | – | 12.3M / 1.8–3.1G | (Liang et al., 2024, Liu et al., 2024) |
| PointABM | 92.6–93.1 | 86.2–88.3 | – | – | single 4090, O(NC) | (Chen et al., 2024) |
| ZigzagPointMamba | 93.15 | 88.65 | 85.78 | – | 12.3M / 3.1G | (Diao et al., 27 May 2025) |
| Serialized Point Mamba | – | – | – | 76.8 | 50M / 4.4GB | (Wang et al., 2024) |
| StruMamba3D | 95.1 | 92.8 | 86.7 | – | 15.8M / 4.0G | (Wang et al., 26 Jun 2025) |

PointMamba, PointABM, and hybrid models exhibit a significant reduction in both parameters and FLOPs compared to Transformer analogues (25–44% fewer FLOPs) while matching or surpassing their performance. Ablations confirm that multi-path serialization, bidirectional SSM, and dynamic ordering each contribute measurable accuracy gains (Chen et al., 2024, Liu et al., 2024, Li et al., 25 Nov 2025).
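A back-of-the-envelope comparison makes the scaling gap concrete (illustrative operation counts only, not measured FLOPs; the formulas and constants are simplifying assumptions):

```python
def attention_ops(N, C):
    # Self-attention: QK^T plus attention-weighted V, each ~N*N*C multiply-adds.
    return 2 * N * N * C

def ssm_ops(N, C, S=16):
    # SSM scan: per-token state update and readout against an S-dim state.
    return 2 * N * C * S

# For a scene-scale sequence the ratio reduces to N / S.
N, C = 8192, 384
ratio = attention_ops(N, C) / ssm_ops(N, C)
```

With these settings the ratio is $N/S = 8192/16 = 512$, which is why SSM backbones stay tractable on full scenes where quadratic attention does not.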

5. Applications and Variants

5.1 3D Recognition and Segmentation

Point Mamba and its variants are applied to object classification, semantic/instance segmentation, part segmentation, and few-shot classification across synthetic and real-world benchmarks (ModelNet40, ScanObjectNN, ShapeNetPart, S3DIS, ScanNet). Memory and inference scaling remain linear, enabling full-scene processing without patchwise approximations (Liu et al., 2024, Wang et al., 2024, Zhang et al., 2024).

5.2 Registration, Completion, and Enhancement

Extensions to completion use HyperPoint generation with Mamba-based feature selection (Li et al., 2024), and domain adaptation employs cross-domain SSM alignment to bridge geometric and semantic gaps between domains (Li et al., 25 Nov 2025). For high-fidelity registration, hybrid Mamba-Transformer architectures achieve SOTA with far lower computational cost (Liu et al., 16 Jun 2025).

5.3 Generative and Pretraining Paradigms

Diffusion-based point cloud generation in TFDM introduces dual-stream Mamba blocks with frequency-aware latent sampling, obtaining favorable trade-offs in generation speed, diversity, and fidelity (Liu et al., 17 Mar 2025). Pretraining approaches (e.g., PointLAMA, ZigzagPointMamba) use conditional diffusion or semantic masking with serialization strategies tailored to mask redundancy and enforce global context (Lin et al., 23 Jul 2025, Diao et al., 27 May 2025).

5.4 Energy-Efficient and Neuromorphic Variants

Spiking Point Mamba integrates SSMs with bio-inspired SNN blocks for extremely energy-efficient point cloud analysis, demonstrating SOTA accuracies at ≥3.5× lower energy than comparable ANNs (Wu et al., 19 Apr 2025).

6. Limitations, Open Challenges, and Extensions

  • Order dependence: Fixed space-filling or permutation scan strategies, while effective, remain non-adaptive. Recent work proposes dynamic learning or spectral traversals for order selection (Zha et al., 27 May 2025, Bahri et al., 6 Mar 2025).
  • Local-global coupling: Pure SSMs are suboptimal for fine-grained geometry; most strong models hybridize Mamba with local attention/convolutions or explicit spatial/state tokens (Chen et al., 2024, Li et al., 2024, Wang et al., 26 Jun 2025).
  • Scaling to massive clouds: Multi-stage pooling, staged or U-Net architectures, and bidirectional SSMs address scale, but receptive field limitations in very deep or strongly pooled networks may arise (Wang et al., 2024).
  • Downstream transfer: Length-adaptive strategies and explicit structural state updates have been proposed to mitigate memory loss across sequence/domain shifts (Wang et al., 26 Jun 2025).
  • Hardware parallelism: SSM models’ sequential nature limits parallelism compared to block-wise attention but remains tractable for practical N (Li et al., 2024).

Potential directions include adaptive or fused graph-SSM models, cross-modal (2D/3D) backbones, continuous-time SSMs for generative diffusion, and generalization to panoptic segmentation, detection, or LiDAR super-resolution (Li et al., 25 Nov 2025, Liu et al., 17 Mar 2025, Chen et al., 15 May 2025).

7. Historical and Theoretical Significance

Point Mamba marks a pivotal development in 3D vision, demonstrating that structured state space models with carefully designed serialization can efficiently replace Transformer attention in global geometric reasoning. By combining local spatial priors and SSM-based global context—usually with adaptive ordering and/or multi-path traversal—modern Point Mamba variants now define new standards for efficiency and accuracy in point cloud analysis, segmentation, registration, completion, and generative modeling (Liang et al., 2024, Liu et al., 2024, Wang et al., 26 Jun 2025, Lin et al., 23 Jul 2025).

