Point Cloud Mamba: Point Cloud Learning via State Space Model (2403.00762v4)
Abstract: State space models have recently exhibited strong global modeling capability with linear computational complexity, in contrast to transformers. This work focuses on applying such an architecture to model point cloud data globally more efficiently and effectively. In particular, for the first time, we demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs). To enable Mamba to process 3D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method that converts point clouds into 1D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent. Consistent Traverse Serialization yields six variants by permuting the order of the *x*, *y*, and *z* coordinates, and the synergistic use of these variants helps Mamba observe point cloud data comprehensively. Furthermore, to help Mamba handle point sequences with different orders more effectively, we introduce point prompts that inform Mamba of a sequence's arrangement rule. Finally, we propose a positional encoding based on spatial coordinate mapping to inject positional information into point cloud sequences more effectively. Point Cloud Mamba (PCM) surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets. Notably, when equipped with a more powerful local feature extraction module, PCM achieves 79.6 mIoU on S3DIS, significantly surpassing the previous SOTA models DeLA and PTv3 by 5.5 mIoU and 4.9 mIoU, respectively.
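The serialization idea described above — grid-quantize the points, then order them so consecutive sequence positions stay spatially adjacent, with six variants from permuting the coordinate axes — can be sketched as follows. This is an illustrative reconstruction, not the paper's exact implementation: the function name `serialize`, the `grid_size` default, and the snake-flip parity rule are assumptions.

```python
import numpy as np

def serialize(points, order=(0, 1, 2), grid_size=64):
    """Sketch of a consistent-traverse-style serialization.

    Quantizes points to an integer grid, then sorts them in a
    snake-like traversal: the minor axes reverse direction whenever
    the preceding axis index is odd, so consecutive points in the
    resulting 1-D sequence remain spatially adjacent. Permuting
    `order` over (x, y, z) yields the six serialization variants.
    Returns the permutation indices into `points`.
    """
    points = np.asarray(points, dtype=np.float64)
    # Normalize to [0, 1] and quantize to integer grid coordinates.
    mins = points.min(axis=0)
    spans = np.maximum(points.max(axis=0) - mins, 1e-9)
    grid = np.minimum((points - mins) / spans * grid_size,
                      grid_size - 1).astype(np.int64)
    a = grid[:, order[0]]  # major axis
    b = grid[:, order[1]]  # middle axis
    c = grid[:, order[2]]  # minor axis
    # Snake traversal: flip a minor axis when the enclosing index is
    # odd, so the path does not jump back across the volume.
    b_snake = np.where(a % 2 == 1, grid_size - 1 - b, b)
    c_snake = np.where((a * grid_size + b_snake) % 2 == 1,
                       grid_size - 1 - c, c)
    key = (a * grid_size + b_snake) * grid_size + c_snake
    return np.argsort(key, kind="stable")
```

Sorting by the composite key is equivalent to a lexicographic sort over the (snake-adjusted) permuted axes; swapping `order` to, e.g., `(2, 1, 0)` produces the "zyx" variant of the sequence.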
- Graph mamba: Towards learning on graphs with state space models. arXiv preprint, 2024.
- End-to-end object detection with transformers. In ECCV, 2020.
- Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
- Pointgpt: Auto-regressively generative pre-training from point clouds. arXiv preprint arXiv:2305.11487, 2023.
- An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.
- Explore in-context learning for 3d point cloud understanding. In NeurIPS, 2023.
- Revisiting point cloud shape classification with a simple and effective baseline. In ICML, 2021.
- Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
- Efficiently modeling long sequences with structured state spaces. In ICLR, 2022.
- Pct: Point cloud transformer. CVM, 2021.
- Pan-mamba: Effective pan-sharpening with state space model. arXiv preprint arXiv:2402.12192, 2024.
- Über die stetige Abbildung einer Linie auf ein Flächenstück. Dritter Band: Analysis · Grundlagen der Mathematik · Physik · Verschiedenes, nebst einer Lebensgeschichte, pages 1–2, 1935.
- Masked autoencoders in 3d point cloud representation learning. arXiv preprint arXiv:2207.01545, 2022.
- A-cnn: Annularly convolutional neural networks on point clouds. In CVPR, 2019.
- 3d vision with transformers: A survey. arXiv preprint, 2022.
- Stratified transformer for 3d point cloud segmentation. In CVPR, 2022.
- Deepgcns: Making gcns go as deep as cnns. PAMI, 2021.
- Mamba-nd: Selective state space modeling for multi-dimensional data. arXiv preprint arXiv:2402.05892, 2024.
- Transformer-based visual segmentation: A survey. arXiv preprint, 2023.
- Pointcnn: Convolution on x-transformed points. In NeurIPS, 2018.
- Pointmamba: A simple state space model for point cloud analysis. arXiv preprint arXiv:2402.10739, 2024.
- Masked discrimination for self-supervised learning on point clouds. In ECCV, 2022.
- Relation-shape convolutional neural network for point cloud analysis. In CVPR, 2019.
- Vmamba: Visual state space model. arXiv preprint arXiv:2401.10166, 2024.
- Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722, 2024.
- Rethinking network design and local geometry in point cloud: A simple residual mlp framework. In ICLR, 2022.
- Guy M. Morton. A computer oriented geodetic data base and a new technique in file sequencing. Technical report, IBM Ltd., 1966.
- Masked autoencoders for point cloud self-supervised learning. In ECCV, 2022.
- Pointnet: Deep learning on point sets for 3d classification and segmentation. In CVPR, 2017.
- Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In NeurIPS, 2017.
- Assanet: An anisotropical separable set abstraction for efficient point cloud representation learning. In NeurIPS, 2021.
- Pointnext: Revisiting pointnet++ with improved training and scaling strategies. In NeurIPS, 2022.
- Surface representation for point clouds. In CVPR, 2022.
- Vm-unet: Vision mamba unet for medical image segmentation. arXiv preprint arXiv:2402.02491, 2024.
- Mask3d: Mask transformer for 3d semantic instance segmentation. In ICRA, 2023.
- Mining point cloud local structures by kernel correlation and graph pooling. In CVPR, 2018.
- Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
- Superpoint transformer for 3d scene instance segmentation. In AAAI, 2023.
- Kpconv: Flexible and deformable convolution for point clouds. In ICCV, 2019.
- Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In ICCV, 2019.
- Attention is all you need. In NeurIPS, 2017.
- Skeleton-in-context: Unified skeleton sequence modeling with in-context learning. In CVPR, 2024.
- Dynamic graph cnn for learning on point clouds. TOG, 2019.
- Pointconv: Deep convolutional networks on 3d point clouds. In CVPR, 2019.
- Point transformer v3: Simpler, faster, stronger. In CVPR, 2024.
- Point transformer v2: Grouped vector attention and partition-based pooling. In NeurIPS, 2022.
- 3d shapenets: A deep representation for volumetric shapes. In CVPR, 2015.
- Walk in the cloud: Learning curves for point clouds shape analysis. In ICCV, 2021.
- Pointcontrast: Unsupervised pre-training for 3d point cloud understanding. In ECCV, 2020.
- Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. arXiv preprint arXiv:2401.13560, 2024.
- Paconv: Position adaptive convolution with dynamic kernel assembling on point clouds. In CVPR, 2021.
- Learning geometry-disentangled representation for complementary understanding of 3d object point cloud. In AAAI, 2021.
- Vivim: a video vision mamba for medical video object segmentation. arXiv preprint arXiv:2401.14168, 2024.
- A scalable active framework for region annotation in 3d shape collections. TOG, 2016.
- Point-bert: Pre-training 3d point cloud transformers with masked point modeling. In CVPR, 2022.
- Self-supervised pretraining of 3d features on any point-cloud. In ICCV, 2021.
- Point transformer. In ICCV, 2021.
- Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417, 2024.