
HoloPart: Generative 3D Part Amodal Segmentation (2504.07943v1)

Published 10 Apr 2025 in cs.CV

Abstract: 3D part amodal segmentation--decomposing a 3D shape into complete, semantically meaningful parts, even when occluded--is a challenging but crucial task for 3D content creation and understanding. Existing 3D part segmentation methods only identify visible surface patches, limiting their utility. Inspired by 2D amodal segmentation, we introduce this novel task to the 3D domain and propose a practical, two-stage approach, addressing the key challenges of inferring occluded 3D geometry, maintaining global shape consistency, and handling diverse shapes with limited training data. First, we leverage existing 3D part segmentation to obtain initial, incomplete part segments. Second, we introduce HoloPart, a novel diffusion-based model, to complete these segments into full 3D parts. HoloPart utilizes a specialized architecture with local attention to capture fine-grained part geometry and global shape context attention to ensure overall shape consistency. We introduce new benchmarks based on the ABO and PartObjaverse-Tiny datasets and demonstrate that HoloPart significantly outperforms state-of-the-art shape completion methods. By incorporating HoloPart with existing segmentation techniques, we achieve promising results on 3D part amodal segmentation, opening new avenues for applications in geometry editing, animation, and material assignment.

Summary

  • The paper introduces a novel two-stage approach combining initial segmentation and diffusion-based part completion to reconstruct occluded 3D geometry.
  • The method leverages context-aware and local attention mechanisms within a VAE-based latent space to ensure global consistency and preserve fine details.
  • Experimental results demonstrate that HoloPart significantly outperforms baselines on benchmarks like ABO, enhancing applications in 3D editing, animation, and material assignment.

This paper introduces the task of 3D part amodal segmentation, which aims to decompose a 3D shape into its complete, semantically meaningful parts, even reconstructing occluded geometry. This contrasts with standard 3D part segmentation that only identifies visible surface patches. The motivation is that complete parts are more useful for downstream tasks like geometry editing, animation, and material assignment, especially for single-mesh shapes common in 3D generation or photogrammetry.

The core challenges identified are inferring occluded 3D geometry, maintaining global shape consistency, and handling diverse shapes with limited part-annotated training data.

To address this, the paper proposes a two-stage approach:

  1. Part Segmentation: Use an existing method (like SAMPart3D (2411.07184)) to obtain initial, potentially incomplete, surface part segments ($s_i$).
  2. Part Completion: Introduce HoloPart, a novel diffusion-based generative model, that takes an incomplete part segment ($s_i$) and the original whole shape ($m$) as input and generates the corresponding complete part ($p_i$).
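The two-stage pipeline above can be sketched in a few lines of NumPy. This is a toy illustration only: the segmenter and completer below are hypothetical stand-ins (a coordinate split and a centroid reflection), not SAMPart3D or HoloPart.

```python
import numpy as np

rng = np.random.default_rng(0)

def segment_parts(mesh_points, n_parts=3):
    """Stage 1 stand-in: a real system would call a 3D part segmenter such as
    SAMPart3D; here we fake per-point labels by splitting on the x-coordinate."""
    order = np.argsort(mesh_points[:, 0])
    labels = np.empty(len(mesh_points), dtype=int)
    for i, chunk in enumerate(np.array_split(order, n_parts)):
        labels[chunk] = i
    return labels

def complete_part(segment_points, whole_points):
    """Stage 2 stand-in for HoloPart: returns the visible segment plus
    hypothetical points for the occluded side (a mirrored copy -- purely
    illustrative, not the diffusion model)."""
    centroid = whole_points.mean(axis=0)
    hallucinated = 2 * centroid - segment_points   # reflect through centroid
    return np.vstack([segment_points, hallucinated])

m = rng.normal(size=(300, 3))                      # whole shape m (point samples)
labels = segment_parts(m)                          # incomplete segments s_i
parts = [complete_part(m[labels == i], m)          # complete parts p_i
         for i in range(3)]
```

The point of the structure is that the two stages are decoupled: any surface segmenter can feed stage 1, and stage 2 only needs an incomplete segment plus the whole shape as context.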

HoloPart Architecture and Training:

  • Foundation: HoloPart is built upon a 3D diffusion model architecture, leveraging techniques from recent works like 3DShape2VecSet (2305.16679) and CLAY (2404.17426). It uses a Variational Autoencoder (VAE) to encode shapes into a latent space where the diffusion process operates.
  • Pretraining: To overcome data scarcity for complete parts, the diffusion model is first pretrained on a large dataset of whole 3D shapes (Objaverse (2305.16797)) to learn a strong 3D generative prior.
  • Finetuning for Part Completion: The pretrained model is then finetuned specifically for the part completion task using curated part-whole pairs.
  • Key Mechanisms: HoloPart incorporates two novel attention mechanisms to condition the diffusion process:
    • Context-aware Attention ($c_o$): Uses cross-attention between the incomplete part segment ($S_0$) and the entire shape ($X$) along with a mask ($M$) indicating the segment's location. This helps capture global context and ensure the completed part fits coherently within the whole object.

      $c_o = \text{CrossAttn}(\text{PosEmb}(\mathbf{S_0}), \text{PosEmb}(\mathbf{X} \mathbin{\#\#} \mathbf{M}))$

    • Local Attention ($c_l$): Uses cross-attention between subsampled points ($S_0$) and denser points ($S$) from the incomplete part segment itself. This helps preserve fine-grained geometric details and positional information of the visible surface.

      $c_l = \text{CrossAttn}(\text{PosEmb}(\mathbf{S_0}), \text{PosEmb}(\mathbf{S}))$

  • Diffusion Objective: The model is trained to regress the velocity target $(\epsilon - z_0)$ of the diffusion process, conditioned on the context ($c_o$) and local ($c_l$) attention outputs:

    $\mathbb{E}_{z \in \mathcal{E}(K),\, t,\, \epsilon \sim \mathcal{N}(0, I)} \left[ \left\| v_\theta (z_t, t, c_o, c_l) - (\epsilon - z_0) \right\|_2^2 \right]$

    where $K$ is the ground truth complete part and $\mathcal{E}$ is the VAE encoder.

  • Inference: During inference, the model iteratively denoises a Gaussian noise vector, conditioned on the input incomplete part and whole shape features, to generate the latent representation of the complete part. This latent is then decoded into an occupancy field, and the final mesh is extracted using Marching Cubes.
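The conditioning and training signal described above can be sketched in NumPy. Everything here is an illustrative assumption, not the paper's implementation: `pos_emb` uses an arbitrary sinusoidal encoding, the attention projections are random and untrained, the mask concatenation stands in for the paper's `##` operator, and `v_pred` fakes the network output. The guidance scale 3.5 is the value the paper reports as optimal.

```python
import numpy as np

rng = np.random.default_rng(0)

def pos_emb(points, n_freqs=2):
    """Hypothetical sinusoidal positional embedding of 3D points
    (stand-in for the paper's PosEmb; frequencies are illustrative)."""
    freqs = 2.0 ** np.arange(n_freqs)
    angles = points[..., None] * freqs                       # (N, 3, F)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(points.shape[0], -1)                  # (N, 3*2*F)

def cross_attn(q_feats, kv_feats, d=32):
    """Single-head cross-attention with random (untrained) projections."""
    Wq = rng.normal(size=(q_feats.shape[-1], d))
    Wk = rng.normal(size=(kv_feats.shape[-1], d))
    Wv = rng.normal(size=(kv_feats.shape[-1], d))
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    logits = Q @ K.T / np.sqrt(d)
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                             # (N_q, d)

# Toy point sets: S0 = subsampled incomplete part, S = denser part samples,
# X = whole-shape points, M = per-point flag marking which of X is the segment.
S0 = rng.normal(size=(64, 3))
S  = rng.normal(size=(512, 3))
X  = rng.normal(size=(256, 3))
M  = (rng.random(256) < 0.25).astype(float)[:, None]

# Context-aware conditioning: part attends to the whole shape "##" its mask.
c_o = cross_attn(pos_emb(S0), np.concatenate([pos_emb(X), M], axis=-1))
# Local conditioning: subsampled part points attend to denser part points.
c_l = cross_attn(pos_emb(S0), pos_emb(S))

# Velocity-target training loss on a toy latent: v_theta should regress
# (eps - z0); here v_pred fakes a nearly-trained network output.
z0, eps = rng.normal(size=(64,)), rng.normal(size=(64,))
v_target = eps - z0
v_pred = v_target + 0.1 * rng.normal(size=(64,))
loss = np.mean((v_pred - v_target) ** 2)

# Classifier-free guidance at inference: mix conditional and unconditional
# predictions with scale s (the paper reports s = 3.5 as optimal).
v_uncond, v_cond, s = rng.normal(size=(64,)), rng.normal(size=(64,)), 3.5
v_guided = v_uncond + s * (v_cond - v_uncond)
```

The design point is that $c_o$ and $c_l$ pull from different key/value sets (whole shape vs. the part itself), so the denoiser sees both global placement and fine local geometry.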

Data Curation:

  • Used ABO (2203.11411) (with existing part annotations) and Objaverse (2305.16797).
  • Developed filtering rules (mesh count, connected components, volume distribution) to select high-quality, part-decomposable shapes from Objaverse.
  • Created a pipeline to generate training data triples: whole shapes ($m$), incomplete surface segments ($s_i$), and ground truth complete parts ($p_i$), simulating occlusion via ray casting and mesh processing.
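The occlusion simulation can be illustrated on voxel grids. This is a toy stand-in for the paper's ray-casting pipeline: rays are cast along a single axis only, and the grids are hypothetical examples.

```python
import numpy as np

def visible_surface(occ_a, occ_b):
    """Given boolean voxel grids for part A and an occluder B, return the
    voxels of A visible from above: for each (y, x) ray cast along +z
    (index 0 downward), keep the first occupied voxel only if it belongs
    to A. Toy stand-in for the paper's ray-casting occlusion simulation."""
    occ = occ_a | occ_b
    D, H, W = occ.shape
    visible = np.zeros_like(occ_a)
    for y in range(H):
        for x in range(W):
            hits = np.nonzero(occ[:, y, x])[0]
            if hits.size and occ_a[hits[0], y, x]:
                visible[hits[0], y, x] = True   # first hit, and it is A
    return visible

# Part A: a small block; part B: a slab above that occludes A's x < 2 columns.
a = np.zeros((4, 4, 4), dtype=bool); a[1:3, 1:3, 1:3] = True
b = np.zeros((4, 4, 4), dtype=bool); b[0, :, :2] = True
s_a = visible_surface(a, b)   # incomplete segment: only A's unoccluded top
```

Pairing `s_a` with the full grid `a` yields exactly the kind of (incomplete segment, complete part) supervision pair the completion model is trained on.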

Experiments and Results:

  • Benchmarks: Introduced new evaluation benchmarks based on ABO and a curated subset called PartObjaverse-Tiny.
  • Metrics: Used Chamfer Distance (CD), Intersection over Union (IoU), F-Score (calculated on voxel grids), and reconstruction success rate.
  • Baselines: Compared against PatchComplete (2211.11318), DiffComplete (2312.12808), SDFusion (2304.11477), and a fine-tuned VAE baseline.
  • Findings: HoloPart significantly outperformed baselines on both ABO and the more diverse PartObjaverse-Tiny datasets, demonstrating better handling of large missing regions, complex structures, and fine details. Ablation studies confirmed the necessity of both context-aware and local attention mechanisms. Optimal performance was found with a classifier-free guidance (CFG) scale of 3.5.
  • Zero-shot Generalization: Showcased the ability to perform amodal segmentation on novel shapes (including generated ones) by combining SAMPart3D for initial segmentation and HoloPart for completion.
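The evaluation metrics have standard definitions; a minimal NumPy sketch of voxel IoU, voxel F-score, and brute-force symmetric Chamfer distance (the exact thresholds and grid resolutions used in the paper are not reproduced here):

```python
import numpy as np

def voxel_iou(pred, gt):
    """Intersection over union of two boolean voxel grids."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def voxel_fscore(pred, gt):
    """F-score treating occupied voxels as positive predictions."""
    tp = np.logical_and(pred, gt).sum()
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two point sets (brute force)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Two 4x4x4 blocks offset by one voxel along z.
pred = np.zeros((8, 8, 8), bool); pred[2:6, 2:6, 2:6] = True
gt   = np.zeros((8, 8, 8), bool); gt[3:7, 2:6, 2:6] = True
iou = voxel_iou(pred, gt)                          # -> 0.6
f1  = voxel_fscore(pred, gt)                       # -> 0.75
cd  = chamfer_distance(np.argwhere(pred).astype(float),
                       np.argwhere(gt).astype(float))
```

Lower Chamfer distance and higher IoU/F-score indicate a completed part closer to the ground truth.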

Applications:

  • Demonstrated potential for downstream tasks like geometry editing (resizing/moving parts), material assignment to complete parts, animation (rigging complete parts), and improved geometry processing (e.g., remeshing).
  • Suggested potential for geometry super-resolution by focusing the model's representational capacity on a single part.
  • Proposed use as a data creation tool for training part-aware generative models.

Limitations:

  • The quality of HoloPart's output depends on the quality of the initial surface segmentation mask. Poor masks can lead to poor completion.

In summary, the paper introduces the novel task of 3D part amodal segmentation, proposes a practical two-stage solution centered around the HoloPart diffusion model with specialized attention mechanisms, and demonstrates significant improvements over existing shape completion methods on newly established benchmarks, highlighting its applicability to various 3D content creation workflows.