
Rethinking Few-shot 3D Point Cloud Semantic Segmentation (2403.00592v1)

Published 1 Mar 2024 in cs.CV

Abstract: This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS), with a focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution. The former arises from non-uniform point sampling, allowing models to distinguish the density disparities between foreground and background for easier segmentation. The latter results from sampling only 2,048 points, limiting semantic information and deviating from the real-world practice. To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built. Moreover, we propose a novel FS-PCS model. While previous methods are based on feature optimization by mainly refining support features to enhance prototypes, our method is based on correlation optimization, referred to as Correlation Optimization Segmentation (COSeg). Specifically, we compute Class-specific Multi-prototypical Correlation (CMC) for each query point, representing its correlations to category prototypes. Then, we propose the Hyper Correlation Augmentation (HCA) module to enhance CMC. Furthermore, tackling the inherent property of few-shot training to incur base susceptibility for models, we propose to learn non-parametric prototypes for the base classes during training. The learned base prototypes are used to calibrate correlations for the background class through a Base Prototypes Calibration (BPC) module. Experiments on popular datasets demonstrate the superiority of COSeg over existing methods. The code is available at: https://github.com/ZhaochongAn/COSeg


Summary

  • The paper standardizes the FS-PCS setting to remove foreground leakage and sparse point distribution, and introduces COSeg, a model based on correlation optimization.
  • COSeg leverages Class-specific Multi-prototypical Correlation and Hyper Correlation Augmentation to enhance semantic context.
  • Experimental results on S3DIS and ScanNet demonstrate significant mIoU improvements, establishing COSeg as state-of-the-art.

Insights into "Rethinking Few-shot 3D Point Cloud Semantic Segmentation"

This paper presents a thorough reevaluation of few-shot 3D point cloud semantic segmentation (FS-PCS) paradigms, identifying and addressing two primary issues that distort performance evaluations: foreground leakage and sparse point distribution. The proposed solutions introduce a more rigorous FS-PCS setting alongside a novel model, termed Correlation Optimization Segmentation (COSeg), aiming for both methodological and performance advancements.

Identified Challenges in FS-PCS

The authors highlight two prevailing issues in FS-PCS workflows. Foreground leakage results from non-uniform point sampling that biases models towards exploiting point density differences rather than learning from semantic cues. Sparse point distribution arises from a limited number of sampled points (2,048), which restricts semantic richness and fidelity to real-world settings. Both issues undermine the accuracy and generalizability of FS-PCS evaluations.

Methodological Innovations

The paper proposes a standardized FS-PCS setting to rectify these issues, utilizing a uniform point sampling strategy to eliminate foreground leakage and increasing the point count to 20,480 to provide richer semantic information. Within this revised framework, the proposed COSeg model diverges from traditional feature optimization strategies by focusing on correlation optimization.
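
The contrast between the two sampling regimes can be illustrated with a small sketch. It is not the paper's data pipeline: the biased sampler, its `fg_weight` parameter, and the toy scene are hypothetical stand-ins chosen to show how non-uniform sampling inflates foreground density, while uniform sampling preserves the scene's true foreground share.

```python
import random

def sample_uniform(labels, n_samples, rng):
    """Pick n_samples distinct point indices uniformly at random, ignoring class."""
    return rng.sample(range(len(labels)), n_samples)

def sample_foreground_biased(labels, n_samples, fg_weight, rng):
    """Hypothetical biased sampler: foreground points (label 1) are
    fg_weight times more likely to be drawn (with replacement)."""
    weights = [fg_weight if lab == 1 else 1.0 for lab in labels]
    return rng.choices(range(len(labels)), weights=weights, k=n_samples)

def foreground_fraction(labels, indices):
    """Share of sampled points that belong to the foreground."""
    return sum(labels[i] for i in indices) / len(indices)

rng = random.Random(0)
# Toy scene (assumption): 100,000 points, 10% of them foreground.
labels = [1] * 10_000 + [0] * 90_000

uniform_idx = sample_uniform(labels, 20_480, rng)
biased_idx = sample_foreground_biased(labels, 2_048, 5.0, rng)

uniform_frac = foreground_fraction(labels, uniform_idx)
biased_frac = foreground_fraction(labels, biased_idx)
print(f"uniform: {uniform_frac:.3f}  biased: {biased_frac:.3f}")
```

With the biased sampler the foreground share of the sample is well above the scene's 10%, which is the kind of density cue a model could exploit instead of semantics; the uniform sample stays close to 10%.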

COSeg introduces Class-specific Multi-prototypical Correlation (CMC), which explicitly models relationships between query points and category prototypes rather than merely refining feature representations. Furthermore, COSeg leverages Hyper Correlation Augmentation (HCA), a module designed to enhance CMC by modeling point-to-point and foreground-background relations, thereby improving contextual dependencies in few-shot tasks.
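
As a rough illustration of what a class-specific multi-prototypical correlation looks like as a data structure, the sketch below computes, for each query point, its cosine similarity to several prototypes per class. The use of cosine similarity, the function names, and the toy features are assumptions; how COSeg actually builds its prototypes and refines the correlations through HCA is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def multi_prototype_correlation(query_feats, class_prototypes):
    """Return corr[q][c][k]: similarity of query point q to prototype k of class c."""
    return [
        [[cosine(q, p) for p in protos] for protos in class_prototypes]
        for q in query_feats
    ]

# Toy 2-D features (assumption): class 0 = background, class 1 = target,
# with two prototypes per class.
class_prototypes = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
query_feats = [[0.95, 0.05], [0.05, 0.95]]

corr = multi_prototype_correlation(query_feats, class_prototypes)

# Naive readout for illustration only: pick the class whose best prototype
# correlates most strongly with the point.
pred = [max(range(len(c)), key=lambda ci: max(c[ci])) for c in corr]
print(pred)
```

The point of the structure is that later stages operate on `corr` (relations to prototypes) rather than on the raw features, which is the distinction the summary draws between correlation optimization and feature optimization.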

Base Prototypes Calibration

COSeg addresses the base susceptibility problem, a bias towards the base classes seen during few-shot training, by learning non-parametric prototypes for those classes as training proceeds. The Base Prototypes Calibration (BPC) module uses these prototypes to adjust the background-class correlations, which reduces interference from base classes and improves the segmentation of novel classes. The calibration is applied during both training and evaluation.
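
The idea of gradient-free base prototypes feeding a background calibration can be sketched as follows. This is a heavily simplified stand-in, not the paper's BPC module: the momentum update, the `momentum` value, and the max-based calibration rule are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def update_base_prototype(prototype, features, momentum=0.9):
    """Moving-average update of one base-class prototype from the features of
    points labeled with that class. No gradients are involved (assumed rule)."""
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    if prototype is None:  # first time this base class is seen
        return mean
    return [momentum * p + (1.0 - momentum) * m for p, m in zip(prototype, mean)]

def calibrate_background(bg_score, query_feat, base_prototypes):
    """Hypothetical calibration: raise a point's background score when the
    point closely resembles any known base class."""
    base_sim = max(cosine(query_feat, p) for p in base_prototypes)
    return max(bg_score, base_sim)

# Two training steps that observe the same base class.
proto = update_base_prototype(None, [[1.0, 0.0], [0.8, 0.2]])
proto = update_base_prototype(proto, [[1.0, 0.2]])

# A query point resembling the base class is pushed towards background;
# a dissimilar point keeps its original background score.
similar = calibrate_background(0.2, [1.0, 0.1], [proto])
dissimilar = calibrate_background(0.8, [0.0, 1.0], [proto])
print(proto, similar, dissimilar)
```

The design intent mirrored here is that base classes, which the model has seen extensively, should count as background when the episode's target is a novel class.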

Experimental Validation

Empirical results affirm COSeg's superior performance on the S3DIS and ScanNet datasets over prior methods. Quantitatively, COSeg achieves marked improvements in mean Intersection over Union (mIoU) scores across several few-shot learning settings, establishing it as the new state-of-the-art. The extensive ablation studies underscore the advantages of correlation optimization over feature optimization and validate the significance of the HCA and BPC modules in bolstering the model's generalization capacity.
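
For readers unfamiliar with the metric, mean Intersection over Union can be computed as in the sketch below. The function name and the toy labels are illustrative, and restricting the average to a chosen set of target classes is an assumption about the evaluation protocol rather than a detail taken from the paper.

```python
def mean_iou(pred, target, class_ids):
    """Average per-class IoU over class_ids, skipping classes that appear
    in neither the prediction nor the ground truth."""
    ious = []
    for c in class_ids:
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy per-point labels: 0 = background, 1 and 2 = target classes.
pred = [1, 1, 0, 0, 2, 2]
target = [1, 0, 0, 0, 2, 1]

score = mean_iou(pred, target, class_ids=[1, 2])
print(f"mIoU = {score:.4f}")  # class 1: 1/3, class 2: 1/2, mean = 5/12
```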

Implications and Future Work

The introduction of a rigorous FS-PCS setting alongside an innovative model framework plays a pivotal role in steering the field towards more robust and accurate methods for few-shot learning in 3D point cloud segmentation. These advancements carry implications for improving real-world applications where data annotation is resource-intensive. Future research may expand upon the scalability of correlation optimization techniques to more complex and diverse FS-PCS tasks and explore the integration of other advanced neural architectures to further enhance performance. Additionally, aligning FS-PCS methodologies with evolving 3D sensing technologies could broaden their applicability in practical environments.

In conclusion, the proposed reevaluation and methodological progress provide a substantial contribution to the field, challenging existing benchmarks and opening avenues for future developments in few-shot 3D semantic segmentation. The insights offered by COSeg represent an important stride towards understanding and overcoming critical challenges in 3D point cloud processing.
