MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model (2404.12794v2)
Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in the point cloud of the current scan using motion information from previous scans. Although previous MOS methods have achieved promising results, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose MambaMOS, a novel LiDAR-based 3D moving object segmentation method built on a Motion-aware State Space Model. First, we develop a novel embedding module, the Time Clue Bootstrapping Embedding (TCBE), to strengthen the coupling of temporal and spatial information in point clouds and to alleviate the problem of overlooked temporal clues. Second, we introduce the Motion-aware State Space Model (MSSM) to give the model the capacity to understand the temporal correlations of the same object across different time steps. Specifically, MSSM emphasizes the motion states of the same object at different time steps through two distinct temporal modeling and correlation steps, and uses an improved state space model to represent these motion differences and thereby model the motion states. Finally, extensive experiments on the SemanticKITTI-MOS and KITTI-Road benchmarks demonstrate that the proposed MambaMOS achieves state-of-the-art performance. The source code is publicly available at https://github.com/Terminal-K/MambaMOS.