Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation (2312.16578v2)
Abstract: 3D point cloud semantic segmentation has a wide range of applications. Recently, weakly supervised point cloud segmentation methods have been proposed that aim to alleviate the expensive and laborious manual annotation process by leveraging scene-level labels. However, these methods have not effectively exploited the rich geometric information (such as shape and scale) and appearance information (such as color and texture) present in RGB-D scans. Furthermore, current approaches fail to fully leverage the point affinity that can be inferred from the feature extraction network, which is crucial for learning from weak scene-level labels. Additionally, previous work overlooks the detrimental effects of the long-tailed distribution of point cloud data in weakly supervised 3D semantic segmentation. To this end, this paper proposes a simple yet effective scene-level weakly supervised point cloud segmentation method with a newly introduced multi-modality point affinity inference module. The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights, which alleviates the detrimental effects of the long-tailed distribution without requiring a prior on the category distribution. Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of the proposed method, which outperforms the state of the art by ~4% to ~6% mIoU. Code is released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.
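The two ideas named in the abstract, a point affinity built from several modalities and a classifier whose weights are normalized to counter class imbalance, can be illustrated with a small NumPy sketch. This is not the released implementation: the function names, the cosine-similarity affinity, the equal-weight averaging of modalities, and the `tau` exponent (borrowed from tau-normalization in the long-tailed recognition literature) are all assumptions made for illustration.

```python
import numpy as np

def normalize_classifier_weights(W, tau=1.0, eps=1e-12):
    """Rescale each class weight vector (a row of W) by its L2 norm ** tau.

    Assumption: with tau=1 every class gets a unit-norm weight vector, so
    frequent classes cannot dominate purely through larger weight norms.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / (norms ** tau + eps)

def multimodal_affinity(geo_feats, rgb_feats, eps=1e-12):
    """Hypothetical point-to-point affinity from two feature modalities."""
    def cosine(F):
        F = F / (np.linalg.norm(F, axis=1, keepdims=True) + eps)
        return F @ F.T
    # Assumption: modalities are combined by a plain average.
    A = 0.5 * (cosine(geo_feats) + cosine(rgb_feats))
    A = np.clip(A, 0.0, None)                         # drop negative similarities
    return A / (A.sum(axis=1, keepdims=True) + eps)   # row-normalize

def refine_scores(scores, affinity):
    """Propagate per-point class scores along the affinity matrix."""
    return affinity @ scores

# Toy data: N points, D-dim features per modality, C classes.
rng = np.random.default_rng(0)
N, D, C = 6, 8, 3
geo = rng.normal(size=(N, D))   # stand-in for point cloud features
rgb = rng.normal(size=(N, D))   # stand-in for RGB features
# Simulate imbalanced weight norms, as often seen for head vs. tail classes.
W = rng.normal(size=(C, D)) * np.array([[5.0], [1.0], [0.2]])

W_hat = normalize_classifier_weights(W)
A = multimodal_affinity(geo, rgb)
refined = refine_scores(geo @ W_hat.T, A)
```

The normalization step uses only the learned weights themselves, which matches the abstract's point that no prior on the category distribution is needed; how the paper actually fuses modalities and applies the affinity should be checked against the released code.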