Point Clouds Are Specialized Images: A Knowledge Transfer Approach for 3D Understanding (2307.15569v2)
Abstract: Self-supervised representation learning (SSRL) has gained increasing attention in point cloud understanding as a way to address 3D data scarcity and high annotation costs. This paper presents PCExpert, a novel SSRL approach that reinterprets point clouds as "specialized images". This conceptual shift allows PCExpert to leverage knowledge derived from the large-scale image modality more directly and deeply, by extensively sharing parameters with a pre-trained image encoder in a multi-way Transformer architecture. The parameter-sharing strategy, combined with a novel pre-training pretext task, transformation estimation, enables PCExpert to outperform the state of the art on a variety of tasks with a substantially smaller number of trainable parameters. Notably, PCExpert's performance under linear fine-tuning (e.g., 90.02% overall accuracy on ScanObjectNN) already approaches the result obtained with full model fine-tuning (92.66%), demonstrating its effective and robust representation capability.
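The abstract names transformation estimation as the pretext task but does not describe its setup. As a rough illustration only, assuming the task amounts to predicting the parameters of a known transformation applied to the input, the sketch below builds one self-supervised training sample from an unlabeled point cloud. The helper names `rotate_z` and `make_pretext_pair` are hypothetical, and restricting the transformation to a single z-axis rotation is an assumption made for brevity, not the paper's specification.

```python
import math
import random


def rotate_z(points, angle):
    """Rotate a list of (x, y, z) points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def make_pretext_pair(points, rng):
    """Return (original, transformed, target) for one self-supervised sample.

    The regression target is the sampled rotation angle, so the supervision
    signal comes from the data itself rather than from human annotation.
    (Assumed setup for illustration; not taken from the paper.)
    """
    angle = rng.uniform(-math.pi, math.pi)
    return points, rotate_z(points, angle), angle


rng = random.Random(0)  # fixed seed so the sample is reproducible
cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0), (-1.0, -1.0, 0.5)]
original, transformed, target = make_pretext_pair(cloud, rng)
```

In a setup like this, an encoder would receive `original` and `transformed` and be trained to regress `target`, which requires it to capture the geometry of the shape without any class labels.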