OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation (2309.00616v5)

Published 1 Sep 2023 in cs.CV

Abstract: In this work, we introduce OpenIns3D, a new 3D-input-only framework for 3D open-vocabulary scene understanding. The OpenIns3D framework employs a "Mask-Snap-Lookup" scheme: the "Mask" module learns class-agnostic mask proposals in 3D point clouds; the "Snap" module generates synthetic scene-level images at multiple scales and leverages 2D vision-language models to detect objects of interest; and the "Lookup" module searches through the outcomes of "Snap" to assign category names to the proposed masks. This simple approach achieves state-of-the-art performance across a wide range of 3D open-vocabulary tasks, including recognition, object detection, and instance segmentation, on both indoor and outdoor datasets. Moreover, OpenIns3D supports switching between different 2D detectors without retraining. When integrated with powerful 2D open-world models, it achieves excellent results on scene understanding tasks. Furthermore, when combined with LLM-powered 2D models, OpenIns3D exhibits an impressive capability to comprehend and process highly complex text queries that demand intricate reasoning and real-world knowledge. Project page: https://zheninghuang.github.io/OpenIns3D/
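The Mask-Snap-Lookup scheme described in the abstract can be illustrated with a minimal toy sketch. Everything below is a hypothetical stand-in, not the authors' implementation: points carry a precomputed cluster id in place of a learned mask network, `detect_2d` is a placeholder for a swappable 2D vision-language detector, and the "snapshots" are trivial metadata rather than rendered images.

```python
def mask_snap_lookup(points, detect_2d, scales=(1.0, 0.5)):
    """Toy sketch of the Mask-Snap-Lookup scheme (illustrative only).

    points: iterable of (x, y, z, cluster_id) tuples.
    detect_2d: stand-in for a 2D vision-language model; maps a rendered
        snapshot to {cluster_id: category_name}. Because the 3D side is
        class-agnostic, this detector can be swapped without retraining.
    """
    # "Mask": class-agnostic 3D proposals (here: group points by cluster id).
    masks = {}
    for p in points:
        masks.setdefault(p[3], []).append(p)

    # "Snap": synthetic scene-level images at multiple scales
    # (here: trivial per-scale renders holding the raw points).
    snapshots = [{"scale": s, "points": list(points)} for s in scales]

    # "Lookup": search the 2D detections on the snapshots to assign a
    # category name to each proposed 3D mask.
    labeled = {}
    for cid in masks:
        for snap in snapshots:
            name = detect_2d(snap).get(cid)
            if name is not None:
                labeled[cid] = name
                break
    return labeled
```

The key design point the sketch mirrors is the decoupling: the 3D module only proposes masks, while all open-vocabulary knowledge lives in the 2D detector queried on rendered views.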

Authors (6)
  1. Zhening Huang (6 papers)
  2. Xiaoyang Wu (27 papers)
  3. Xi Chen (1035 papers)
  4. Hengshuang Zhao (117 papers)
  5. Lei Zhu (280 papers)
  6. Joan Lasenby (32 papers)
Citations (28)