ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation (2311.14262v3)

Published 24 Nov 2023 in cs.CV

Abstract: Zero-shot 3D part segmentation is a challenging and fundamental task. In this work, we propose a novel pipeline, ZeroPS, which achieves high-quality knowledge transfer from 2D pretrained foundation models (FMs), SAM and GLIP, to 3D object point clouds. We aim to explore the natural relationship between multi-view correspondence and the FMs' prompt mechanism and build bridges on it. In ZeroPS, the relationship manifests as follows: 1) lifting 2D to 3D by leveraging co-viewed regions and SAM's prompt mechanism, 2) relating 1D classes to 3D parts by leveraging 2D-3D view projection and GLIP's prompt mechanism, and 3) enhancing prediction performance by leveraging multi-view observations. Extensive evaluations on the PartNetE and AKBSeg benchmarks demonstrate that ZeroPS significantly outperforms the SOTA method across zero-shot unlabeled and instance segmentation tasks. ZeroPS does not require additional training or fine-tuning for the FMs. ZeroPS applies to both simulated and real-world data. It is hardly affected by domain shift. The project page is available at https://luis2088.github.io/ZeroPS_page.

"ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation" presents a novel approach to utilize 2D pretrained foundational models for the advancement of zero-shot 3D part segmentation. The introduction of ZeroPS is predicated on leveraging the inherent relationship between multi-view correspondences of 2D images and the prompting mechanisms within foundational models to enable effective knowledge transfer to 3D point clouds.

Methodology Breakdown:

  1. Self-Extension Component: This component extends 2D groups into 3D space. Starting from a single viewpoint, it grows spatially coherent global-level 3D groups by prompting SAM on co-viewed regions across neighboring views, so that the resulting 3D groups stay consistent with the 2D observations (a minimal lifting sketch follows this list).
  2. Multi-Modal Labeling Component:
    • Two-Dimensional Checking Mechanism: For each view, the 2D bounding boxes predicted by GLIP from the text prompt are matched against the projections of the part-level 3D groups, and each box votes for the group it best corresponds to; the votes are accumulated into a Vote Matrix (see the sketch after this list).
    • Class Non-Highest Vote Penalty Function: This function refines the Vote Matrix by penalizing votes that do not correspond to the highest-voted matches, suppressing spurious box-to-group assignments and improving the accuracy of part labeling.
    • A final merging algorithm is used to consolidate part-level 3D groups, creating a seamless integration of the 2D and 3D information.
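
To make the 2D-to-3D lifting concrete, the following minimal sketch (not the authors' implementation) assigns each point of the cloud the 2D group it is observed in most often across views. It assumes the 2D group ids have already been made consistent across views, which is what the Self-Extension component achieves by prompting SAM on co-viewed regions; `lift_masks_to_3d_groups`, `view_masks`, and `view_projections` are hypothetical names introduced here for illustration.

```python
import numpy as np

def lift_masks_to_3d_groups(points, view_masks, view_projections, min_views=2):
    """Minimal multi-view lifting sketch (not ZeroPS' exact algorithm).

    points:           (N, 3) point cloud.
    view_masks:       per-view (H, W) integer images whose pixels store a
                      2D group id (0 = background), assumed consistent
                      across views.
    view_projections: per-view callables mapping (N, 3) points to (N, 2)
                      integer pixel coordinates (x, y) in that view.
    """
    n = len(points)
    votes = [dict() for _ in range(n)]  # point index -> {group id: #views}

    for mask, project in zip(view_masks, view_projections):
        px = project(points)
        h, w = mask.shape
        inside = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
        for i in np.flatnonzero(inside):
            gid = int(mask[px[i, 1], px[i, 0]])  # row = y, column = x
            if gid != 0:
                votes[i][gid] = votes[i].get(gid, 0) + 1

    # Keep the most frequently observed group per point; points seen in
    # fewer than `min_views` views stay unassigned (-1).
    labels = np.full(n, -1, dtype=np.int64)
    for i, v in enumerate(votes):
        if v:
            gid, count = max(v.items(), key=lambda kv: kv[1])
            if count >= min_views:
                labels[i] = gid
    return labels
```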
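
Similarly, the two-dimensional checking and penalty steps can be illustrated with a small Vote Matrix sketch. The exact form of ZeroPS' Class Non-Highest Vote Penalty is not reproduced here; the version below simply down-weights, for every class, the votes it received in groups other than its top-voted group, which is one plausible reading of the mechanism. `label_groups`, `vote_matrix`, and `penalty` are illustrative names.

```python
import numpy as np

def label_groups(vote_matrix, penalty=0.5):
    """Assign a text-prompt class to each part-level 3D group (sketch only).

    vote_matrix: (G, C) array; entry (g, c) counts how often, across views,
                 a GLIP bounding box for class c was matched to group g.
    penalty:     factor in (0, 1] applied to a class's votes in every group
                 where that class is not its top-voted group.
    """
    vm = vote_matrix.astype(float).copy()

    # For each class (column), keep its votes intact only in the group where
    # it scored highest; down-weight its votes everywhere else.
    top_group_per_class = vm.argmax(axis=0)
    for c, g_top in enumerate(top_group_per_class):
        other = np.ones(vm.shape[0], dtype=bool)
        other[g_top] = False
        vm[other, c] *= penalty

    # Each group takes the class with the highest penalized vote count;
    # groups that received no votes stay unlabeled (-1).
    labels = vm.argmax(axis=1)
    labels[vm.max(axis=1) == 0] = -1
    return labels

# Example: three groups, three prompted classes.
vm = np.array([[5, 1, 0],
               [0, 4, 2],
               [1, 0, 3]])
print(label_groups(vm))  # -> [0 1 2]
```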

Evaluation and Results:

Extensive experiments were conducted on three zero-shot segmentation tasks using the PartNetE benchmark, with additional evaluation on AKBSeg. The results show substantial gains, with improvements of 19.6%, 5.2%, and 4.9% over the existing state-of-the-art method across the three tasks.

Highlights:

  • Zero Training/Fine-tuning Required: A significant advantage of ZeroPS is that it operates entirely without the need for training, fine-tuning, or any learnable parameters, making it exceptionally efficient and easy to deploy.
  • Robustness to Domain Shift: ZeroPS demonstrates high resilience to domain shifts, maintaining its performance across different data variations and scenarios.
  • Code Availability: With a commitment to open science, the authors have indicated that the code for ZeroPS will be released, facilitating further research and replication efforts.

In summary, ZeroPS stands out as a strong method for zero-shot 3D part segmentation, leveraging pretrained 2D foundation models in a cross-modal fashion to achieve high accuracy and robustness on 3D point clouds without any training.

Authors (4)
  1. Yuheng Xue
  2. Nenglun Chen
  3. Jun Liu
  4. Wenyun Sun