Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange (2404.07504v1)
Abstract: In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can create a tendency for neural networks to exploit these strong dependencies, bypassing the individual object patterns. To address this challenge, we introduce a novel self-supervised learning (SSL) strategy. Our approach leverages both object patterns and contextual cues to produce robust features. It begins with the formulation of an object-exchanging strategy, where pairs of objects with comparable sizes are exchanged across different scenes, effectively disentangling the strong contextual dependencies. Subsequently, we introduce a context-aware feature learning strategy, which encodes object patterns without relying on their specific context by aggregating object features across various scenes. Our extensive experiments demonstrate the superiority of our method over existing SSL techniques, further showing its better robustness to environmental changes. Moreover, we showcase the applicability of our approach by transferring pre-trained models to diverse point cloud datasets.
- Self-supervised learning for domain adaptation on point clouds. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 123–133, 2021.
- Learning representations and generative models for 3d point clouds. In International Conference on Machine Learning, pages 40–49. PMLR, 2018.
- A semantic classification approach for indoor robot navigation. Electronics, 11(13):2063, 2022.
- 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1534–1543, 2016.
- Self-improving semantic perception for indoor localisation. In Conference on Robot Learning, pages 1211–1222. PMLR, 2022.
- Gridmask data augmentation. arXiv preprint arXiv:2001.04086, 2020.
- 4dcontrast: Contrastive learning with dynamic correspondences for 3d scene understanding. In European Conference on Computer Vision, pages 543–560. Springer, 2022.
- 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3075–3084, 2019.
- Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5828–5839, 2017.
- Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
- Modeling visual context is key to augmenting object detection datasets. In Proceedings of the European Conference on Computer Vision (ECCV), pages 364–380, 2018.
- Cut, paste and learn: Surprisingly easy synthesis for instance detection. In Proceedings of the IEEE international conference on computer vision, pages 1301–1310, 2017.
- Efficient graph-based image segmentation. International journal of computer vision, 59:167–181, 2004.
- Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2918–2928, 2021.
- Bootstrap your own latent-a new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33:21271–21284, 2020.
- Exploring data-efficient 3d scene understanding with contrastive scene contexts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15587–15597, 2021.
- Spatio-temporal self-supervised representation learning for 3d point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6535–6545, 2021.
- Pointgroup: Dual-set point grouping for 3d instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and Pattern recognition, pages 4867–4876, 2020.
- Self-supervised modal and view invariant feature learning. arXiv preprint arXiv:2005.14169, 2020.
- Boosting low-data instance segmentation by unsupervised pre-training with saliency prompt. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15485–15494, 2023.
- Mix3d: Out-of-context data augmentation for 3d scenes. In 2021 International Conference on 3D Vision (3DV), pages 116–125. IEEE, 2021.
- Segcontrast: 3d point cloud feature representation learning through self-supervised segment discrimination. IEEE Robotics Autom. Lett., 7(2):2116–2123, 2022.
- Mind your augmentation: The key to decoupling dense self-supervised learning. In The Twelfth International Conference on Learning Representations, 2024.
- The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3234–3243, 2016.
- Self-supervised deep learning on point clouds by reconstructing space. Advances in Neural Information Processing Systems, 32, 2019.
- Hide-and-seek: A data augmentation technique for weakly-supervised localization and beyond. arXiv preprint arXiv:1811.02545, 2018.
- Jack Sklansky. Finding the convex hull of a simple polygon. Pattern Recognition Letters, 1(2):79–83, 1982.
- Searching efficient 3d architectures with sparse point-voxel convolution. In European conference on computer vision, pages 685–702. Springer, 2020.
- Learning spatiotemporal occupancy grid maps for lifelong navigation in dynamic scenes. In 2022 International Conference on Robotics and Automation (ICRA), pages 484–490. IEEE, 2022.
- Learning to generate synthetic data via compositing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 461–470, 2019.
- Peng-Shuai Wang. Octformer: Octree-based transformers for 3d point clouds. ACM Transactions on Graphics (SIGGRAPH), 42(4), 2023.
- Masked scene contrast: A scalable framework for unsupervised 3d representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9415–9424, 2023a.
- Spatiotemporal self-supervised learning for point clouds in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5251–5260, 2023b.
- Pointcontrast: Unsupervised pre-training for 3d point cloud understanding. In European Conference on Computer Vision, pages 574–591. Springer, 2020.
- Weakly supervised point cloud segmentation via deep morphological semantic information embedding. CAAI Transactions on Intelligence Technology, 2023.
- Learning object placement by inpainting for compositional data augmentation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIII 16, pages 566–581. Springer, 2020.
- Self-supervised pretraining of 3d features on any point-cloud. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10252–10263, 2021.
- In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15838–15847, 2021.
- Random erasing data augmentation. In Proceedings of the AAAI conference on artificial intelligence, pages 13001–13008, 2020.