PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models (2212.01558v2)

Published 3 Dec 2022 in cs.CV and cs.RO

Abstract: Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models via conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative approach to low-shot part segmentation of 3D point clouds that leverages a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud renderings and a novel 2D-to-3D label lifting algorithm. We also utilize multi-view 3D priors and few-shot prompt tuning to boost performance significantly. Extensive evaluation on the PartNet and PartNet-Mobility datasets shows that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to iPhone-scanned point clouds without significant domain gaps.
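The abstract names a 2D-to-3D label lifting step without spelling it out. As a rough illustration only, the sketch below shows one simple way part detections on rendered views could be turned into labels for groups of 3D points: in every view where a group is visible, it measures the fraction of its visible points that fall inside each detected box (weighted by the detector's score), then averages these votes over the views that saw it. This is a simplified multi-view voting scheme, not the paper's exact algorithm. The `Box`/`View` formats, the precomputed per-point projections and visibility flags, the grouping of points into "superpoints" beforehand, the max-per-label aggregation, and the 0.5 threshold are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Box:
    """One 2D part detection in a rendered view (hypothetical format)."""
    label: str    # part name used as the text prompt, e.g. "back"
    score: float  # detector confidence in [0, 1]
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, u: float, v: float) -> bool:
        return self.x0 <= u <= self.x1 and self.y0 <= v <= self.y1


@dataclass
class View:
    """One rendered view: per-point pixel coordinates, visibility, detections."""
    pixels: list[tuple[float, float]]  # (u, v) projection of every 3D point
    visible: list[bool]                # False if the point is occluded here
    boxes: list[Box]


def lift_labels(views: list[View], superpoints: list[list[int]],
                threshold: float = 0.5) -> list[str | None]:
    """Assign one part label (or None) to each superpoint by multi-view voting."""
    labels: list[str | None] = []
    for sp in superpoints:
        votes: dict[str, float] = defaultdict(float)
        seen_views = 0
        for view in views:
            vis = [i for i in sp if view.visible[i]]
            if not vis:
                continue  # superpoint fully occluded in this view: no vote
            seen_views += 1
            best: dict[str, float] = {}
            for box in view.boxes:
                # Fraction of the superpoint's visible points inside the box.
                inside = sum(box.contains(*view.pixels[i]) for i in vis)
                frac = inside / len(vis)
                # Keep only the strongest box per label within a view.
                best[box.label] = max(best.get(box.label, 0.0), frac * box.score)
            for label, s in best.items():
                votes[label] += s
        if seen_views == 0 or not votes:
            labels.append(None)
            continue
        label, total = max(votes.items(), key=lambda kv: kv[1])
        # Average over the views that actually saw this superpoint.
        labels.append(label if total / seen_views >= threshold else None)
    return labels


# Toy example: points 0-1 form one superpoint, 2-3 another, 4 a third.
views = [
    View(pixels=[(10, 10), (12, 12), (50, 50), (52, 52), (90, 90)],
         visible=[True] * 5,
         boxes=[Box("back", 0.9, 0, 0, 20, 20), Box("seat", 0.8, 40, 40, 60, 60)]),
    View(pixels=[(5, 5), (6, 6), (30, 30), (31, 31), (90, 90)],
         visible=[True, True, False, False, False],
         boxes=[Box("back", 0.7, 0, 0, 10, 10)]),
]
superpoints = [[0, 1], [2, 3], [4]]
result = lift_labels(views, superpoints)
print(result)  # ['back', 'seat', None]
```

Taking the maximum per label within a view keeps overlapping boxes of the same part from being counted twice, and dividing by the number of views in which a superpoint is visible avoids penalizing heavily occluded superpoints for views that never saw them.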

Authors (7)
  1. Minghua Liu
  2. Yinhao Zhu
  3. Hong Cai
  4. Shizhong Han
  5. Zhan Ling
  6. Fatih Porikli
  7. Hao Su