PartSTAD: 2D-to-3D Part Segmentation Task Adaptation (2401.05906v3)

Published 11 Jan 2024 in cs.CV

Abstract: We introduce PartSTAD, a method for the task adaptation of 2D-to-3D segmentation lifting. Recent studies have highlighted the advantages of using 2D segmentation models to achieve high-quality 3D segmentation through few-shot adaptation. However, previous approaches have focused on adapting 2D segmentation models to the domain shift toward rendered images and synthetic text descriptions, rather than optimizing the model specifically for 3D segmentation. Our task adaptation method finetunes a 2D bounding box prediction model with an objective function for 3D segmentation. We introduce weights for 2D bounding boxes to enable adaptive merging, and we learn these weights with a small additional neural network. Additionally, we incorporate SAM, a model that segments the foreground within a given bounding box, to improve the boundaries of 2D segments and, consequently, those of the 3D segmentation. Our experiments on the PartNet-Mobility dataset show significant improvements from our task adaptation approach: compared to the SotA few-shot 3D segmentation model, it achieves a 7.0 percentage-point increase in mIoU for semantic segmentation and a 5.2 percentage-point improvement in mAP@50 for instance segmentation.
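The abstract describes lifting 2D bounding-box predictions to 3D part labels, with a per-box weight controlling how much each box contributes to the merge. As a rough illustration only, the sketch below does weighted voting: each 3D point collects the weights of every box whose rectangle contains the point's projection in that box's view, then takes the label with the largest total. This is not PartSTAD's actual pipeline, which (per the abstract) learns the weights with a small neural network and refines 2D segments with SAM. The `Box2D` class, the `lift_boxes_to_points` function, the precomputed point-to-pixel visibility maps, and the fixed weights are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[float, float]


@dataclass
class Box2D:
    """A hypothetical 2D part detection in one rendered view."""
    view: int
    x0: float
    y0: float
    x1: float
    y1: float
    label: str
    weight: float  # hypothetical merging weight (fixed here, not learned)

    def contains(self, p: Pixel) -> bool:
        x, y = p
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def lift_boxes_to_points(
    point_pixels: List[Dict[int, Pixel]],
    boxes: List[Box2D],
    min_vote: float = 0.0,
) -> List[Optional[str]]:
    """Give each 3D point the label with the largest summed box weight.

    point_pixels[i] maps a view index to the pixel where point i is visible;
    views in which the point is occluded are simply absent from the dict.
    Points with no votes, or whose best vote is not above min_vote, get None.
    """
    labels: List[Optional[str]] = []
    for views in point_pixels:
        votes: Dict[str, float] = defaultdict(float)
        for box in boxes:
            pixel = views.get(box.view)
            if pixel is not None and box.contains(pixel):
                votes[box.label] += box.weight
        if votes:
            best = max(votes, key=lambda lbl: votes[lbl])
            labels.append(best if votes[best] > min_vote else None)
        else:
            labels.append(None)
    return labels


# Usage: three points observed across two views (toy data).
points = [
    {0: (10, 10), 1: (12, 8)},  # visible in both views
    {0: (50, 50)},              # visible only in view 0
    {1: (90, 90)},              # falls inside no box
]
boxes = [
    Box2D(0, 0, 0, 20, 20, "lid", 0.9),
    Box2D(1, 0, 0, 20, 20, "handle", 0.3),
    Box2D(0, 40, 40, 60, 60, "handle", 0.7),
]
print(lift_boxes_to_points(points, boxes))  # ['lid', 'handle', None]
```

The first point is covered by a "lid" box in view 0 and a "handle" box in view 1; because the lid box carries the larger weight, it wins the vote. With uniform weights this reduces to plain majority voting, which is the kind of unweighted merge that per-box weights are meant to improve on.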
