MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds (2212.07207v5)

Published 14 Dec 2022 in cs.CV

Abstract: The sensing process of large-scale LiDAR point clouds inevitably causes large blind spots, i.e. regions not visible to the sensor. We demonstrate how these inherent sampling properties can be effectively utilized for self-supervised representation learning by designing a highly effective pre-training framework that considerably reduces the need for tedious 3D annotations to train state-of-the-art object detectors. Our Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction. This results in more expressive and useful initialization, which can be directly applied to downstream perception tasks, such as 3D object detection or semantic segmentation for autonomous driving. In a novel reconstruction approach, MAELi distinguishes between empty and occluded space and employs a new masking strategy that targets the LiDAR's inherent spherical projection. Thereby, without any ground truth whatsoever and trained on single frames only, MAELi obtains an understanding of the underlying 3D scene geometry and semantics. To demonstrate the potential of MAELi, we pre-train backbones in an end-to-end manner and show the effectiveness of our unsupervised pre-trained weights on the tasks of 3D object detection and semantic segmentation.
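The abstract mentions a masking strategy that targets the LiDAR's spherical projection. As a rough illustration of that general idea only, the sketch below bins points by azimuth and elevation as seen from the sensor and drops every point in a random subset of angular cells. The function names, the grid resolution, and the mask ratio are hypothetical choices made for this example; they are not taken from the paper and do not reproduce MAELi's actual masking or its empty/occluded-space reconstruction.

```python
import math
import random


def spherical_cell(point, n_azimuth=8, n_elevation=4):
    """Map an (x, y, z) point in the sensor frame to an angular grid cell index."""
    x, y, z = point
    azimuth = math.atan2(y, x)                   # angle in the ground plane, [-pi, pi]
    elevation = math.atan2(z, math.hypot(x, y))  # angle above the ground plane, [-pi/2, pi/2]
    # Scale each angle to [0, n_bins] and clamp the upper edge into the last bin.
    az_bin = min(int((azimuth + math.pi) / (2 * math.pi) * n_azimuth), n_azimuth - 1)
    el_bin = min(int((elevation + math.pi / 2) / math.pi * n_elevation), n_elevation - 1)
    return az_bin * n_elevation + el_bin


def spherical_mask(points, n_azimuth=8, n_elevation=4, mask_ratio=0.5, seed=0):
    """Return (keep, cells): keep[i] is False if point i lies in a masked angular cell.

    Hypothetical helper: masks whole angular cells rather than individual points,
    so nearby returns along similar view directions are removed together.
    """
    rng = random.Random(seed)
    n_cells = n_azimuth * n_elevation
    n_masked = round(mask_ratio * n_cells)
    masked_cells = set(rng.sample(range(n_cells), n_masked))
    cells = [spherical_cell(p, n_azimuth, n_elevation) for p in points]
    keep = [c not in masked_cells for c in cells]
    return keep, cells


if __name__ == "__main__":
    # Toy points standing in for a LiDAR sweep (made-up values).
    demo_points = [(1.0, 0.1, 0.1), (1.0, 0.12, 0.1), (-1.0, 0.5, 0.2), (0.3, -2.0, -0.1)]
    keep, cells = spherical_mask(demo_points, mask_ratio=0.5, seed=1)
    for p, c, k in zip(demo_points, cells, keep):
        print(p, "cell", c, "kept" if k else "masked")
```

Masking in angular space rather than on a Cartesian voxel grid means each removed region corresponds to a cone of view directions, which is closer to how a rotating LiDAR samples the scene; the cell size here is arbitrary and would need tuning for a real sensor.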
