Zero-shot point cloud segmentation by transferring geometric primitives (2210.09923v3)

Published 18 Oct 2022 in cs.CV

Abstract: We investigate transductive zero-shot point cloud semantic segmentation, where the network is trained on seen objects and is able to segment unseen objects. 3D geometric elements are essential cues for identifying a novel 3D object type. However, previous methods neglect the fine-grained relationship between language and these 3D geometric elements. To this end, we propose a novel framework that learns the geometric primitives shared by objects of seen and unseen categories and performs a fine-grained alignment between language and the learned geometric primitives. Guided by language, the network can therefore recognize novel objects represented by geometric primitives. Specifically, we formulate a novel point visual representation: the similarity vector between a point's feature and a set of learnable prototypes, where the prototypes automatically encode geometric primitives via back-propagation. In addition, we propose a novel Unknown-aware InfoNCE Loss to align the visual representation with language at a fine-grained level. Extensive experiments show that our method significantly outperforms other state-of-the-art methods in harmonic mean intersection-over-union (hIoU), with improvements of 17.8%, 30.4%, 9.2% and 7.9% on the S3DIS, ScanNet, SemanticKITTI and nuScenes datasets, respectively. Code is available at https://github.com/runnanchen/Zero-Shot-Point-Cloud-Segmentation
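The abstract describes each point by a similarity vector to a set of prototypes, which is then aligned with language through an InfoNCE-style loss. The sketch below illustrates those two ideas in plain Python under loudly stated assumptions: cosine similarity is used as the similarity measure, the prototypes are fixed lists rather than trained parameters, class-name embeddings are assumed to already live in the same K-dimensional space as the similarity vector, the temperature of 0.07 is an arbitrary choice, and the loss is the standard InfoNCE rather than the paper's Unknown-aware variant, whose details the abstract does not give. All function names are hypothetical and do not come from the authors' repository.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def prototype_similarity(feature: Vector, prototypes: List[Vector]) -> Vector:
    """Describe a point by its similarity to each prototype.

    The result has one entry per prototype (K values), independent of the
    backbone feature width. ASSUMPTION: cosine similarity; in training the
    prototypes would be learnable, here they are fixed for illustration.
    """
    return [cosine(feature, p) for p in prototypes]


def info_nce(query: Vector, keys: List[Vector], positive: int,
             temperature: float = 0.07) -> float:
    """Standard InfoNCE: negative log-softmax of the positive key's score.

    NOTE: plain InfoNCE, not the paper's Unknown-aware InfoNCE Loss.
    """
    logits = [cosine(query, k) / temperature for k in keys]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[positive]


if __name__ == "__main__":
    # Hypothetical K = 3 prototypes in a 3-D feature space.
    prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
    rep = prototype_similarity([1.0, 0.0, 0.0], prototypes)
    # Hypothetical class-name embeddings ("chair", "table"),
    # ASSUMED to be already projected into the K-dimensional space.
    class_embeddings = [[1.0, 0.0, 0.6], [0.0, 1.0, 0.8]]
    print("point representation:", rep)
    print("loss, correct class:", info_nce(rep, class_embeddings, positive=0))
    print("loss, wrong class:  ", info_nce(rep, class_embeddings, positive=1))
```

The loss is small when the point's similarity pattern matches the correct class embedding and large when it matches another class better, which is the behavior any InfoNCE-style alignment relies on. How the paper maps class names into the prototype-similarity space is not stated in the abstract.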

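The hIoU metric reported above is the harmonic mean of the mIoU over seen classes and the mIoU over unseen classes, so a method must do well on both to score highly. A minimal stdlib sketch, with hypothetical helper names, computing it from flat lists of per-point predictions and labels; real benchmarks accumulate a confusion matrix over the whole dataset, which this simplified version does not do.

```python
from typing import List, Optional, Sequence


def class_iou(preds: Sequence[int], labels: Sequence[int], cls: int) -> Optional[float]:
    """IoU of one class over all points; None if the class never occurs."""
    inter = sum(1 for p, g in zip(preds, labels) if p == cls and g == cls)
    union = sum(1 for p, g in zip(preds, labels) if p == cls or g == cls)
    return inter / union if union else None


def mean_iou(preds: Sequence[int], labels: Sequence[int], classes: Sequence[int]) -> float:
    """Average IoU over the given classes, skipping classes that never occur."""
    ious: List[float] = []
    for c in classes:
        iou = class_iou(preds, labels, c)
        if iou is not None:
            ious.append(iou)
    return sum(ious) / len(ious) if ious else 0.0


def harmonic_iou(miou_seen: float, miou_unseen: float) -> float:
    """hIoU: harmonic mean of seen-class and unseen-class mIoU."""
    if miou_seen + miou_unseen == 0:
        return 0.0
    return 2 * miou_seen * miou_unseen / (miou_seen + miou_unseen)


if __name__ == "__main__":
    preds = [0, 0, 1, 1, 2, 2]
    labels = [0, 1, 1, 1, 2, 0]
    seen, unseen = [0, 1], [2]  # hypothetical seen/unseen class split
    m_s = mean_iou(preds, labels, seen)
    m_u = mean_iou(preds, labels, unseen)
    print(f"mIoU seen={m_s:.3f} unseen={m_u:.3f} hIoU={harmonic_iou(m_s, m_u):.3f}")
```

Because the harmonic mean is dominated by the smaller term, a model with 0.8 seen-class mIoU and 0.2 unseen-class mIoU scores an hIoU of only 0.32, well below the arithmetic mean of 0.5.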