iSeg: Interactive 3D Segmentation via Interactive Attention (2404.03219v2)
Abstract: We present iSeg, a new interactive technique for segmenting 3D shapes. Previous works have focused mainly on leveraging pre-trained 2D foundation models for text-driven 3D segmentation. However, text may be insufficient for accurately describing fine-grained spatial segmentations. Moreover, achieving a consistent 3D segmentation using a 2D model is highly challenging, since occluded areas of the same semantic region may not be visible together from any single 2D view. Thus, we design a segmentation method conditioned on precise user clicks, which operates entirely in 3D. Our system accepts user clicks directly on the shape's surface, each indicating whether a region should be included in or excluded from the desired shape partition. To accommodate various click settings, we propose a novel interactive attention module capable of processing different numbers and types of clicks, enabling the training of a single unified interactive segmentation model. We apply iSeg to a myriad of shapes from different domains, demonstrating its versatility and faithfulness to the user's specifications. Our project page is at https://threedle.github.io/iSeg/.
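The abstract describes the interactive attention module only at a high level. As a rough illustration of why attention lets a single model accept any number of inclusion and exclusion clicks, here is a minimal pure-Python sketch under stated assumptions: each vertex already has a feature vector (standing in for per-vertex features a 3D backbone would produce), each click is a hypothetical `(vertex index, is_positive)` pair, and the click type is encoded as a value of +1 (include) or -1 (exclude). The names `click_attention` and `segment` and the `temperature` parameter are illustrative, and there are no learned weights; this is not the authors' module.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical click format: (vertex index, is_positive); True = include.
Click = Tuple[int, bool]


def _softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def click_attention(
    vertex_feats: Sequence[Sequence[float]],
    clicks: Sequence[Click],
    temperature: float = 1.0,
) -> List[float]:
    """Soft inclusion score in [-1, 1] for every vertex (illustrative sketch).

    Queries are all vertex features, keys are the features of the clicked
    vertices, and values encode the click type (+1 include, -1 exclude).
    The softmax runs over the clicks, so any number of clicks is accepted.
    """
    if not clicks:
        raise ValueError("at least one click is required")
    keys = [vertex_feats[idx] for idx, _ in clicks]
    values = [1.0 if positive else -1.0 for _, positive in clicks]
    # Scaled dot-product attention; temperature is an assumed extra knob.
    scale = math.sqrt(len(vertex_feats[0])) * temperature
    out = []
    for query in vertex_feats:
        weights = _softmax([_dot(query, key) / scale for key in keys])
        out.append(sum(w * v for w, v in zip(weights, values)))
    return out


def segment(
    vertex_feats: Sequence[Sequence[float]],
    clicks: Sequence[Click],
    temperature: float = 1.0,
) -> List[bool]:
    # A vertex joins the selected region when its attended score is positive.
    return [s > 0.0 for s in click_attention(vertex_feats, clicks, temperature)]


if __name__ == "__main__":
    # Toy features: vertices 0-1 resemble each other, as do vertices 2-3.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Include near vertex 0, exclude near vertex 2.
    print(segment(feats, [(0, True), (2, False)]))  # [True, True, False, False]
```

Because the softmax is taken over however many clicks are supplied, the same function handles one click or many, and swapping a click's type simply flips the sign of its contribution. The sketch also has an obvious limitation: with only positive clicks every vertex scores +1 and is included, which is one reason a trained model would rely on learned projections rather than raw features.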
Authors: Itai Lang, Fei Xu, Dale Decatur, Sudarshan Babu, Rana Hanocka