HoloPart Diffusion: 3D Part Amodal Segmentation
- HoloPart Diffusion Model is a conditional generative framework that infers full 3D part geometry from partial, occluded observations.
- It employs a dual-stage architecture combining off-the-shelf surface segmentation with a specialized diffusion-based completion module using global and local attention.
- Empirical evaluations demonstrate significant improvements in Chamfer Distance and IoU, enabling advanced applications in geometry editing, animation, and material assignment.
HoloPart Diffusion Model denotes a conditional generative framework for 3D part amodal segmentation, designed to infer the complete geometry of semantic object parts, including their occluded regions, from visible partial observations. It targets a gap in 3D content pipelines: surface-only part segmentation returns just the visible patches of each part, which hampers geometry editing, animation, and part-level material assignment. The method uses a two-stage architecture that combines off-the-shelf visible-part segmentation with a specialized diffusion-based part completion module incorporating global and local attention mechanisms (Yang et al., 10 Apr 2025).
1. 3D Part Amodal Segmentation: Task Definition and Motivation
3D part amodal segmentation entails decomposing a complete 3D object (expressed as a mesh or point cloud) into a set of semantically meaningful constituent parts, where each part comprises both visible and occluded (unobservable) geometry. Unlike conventional 3D part segmentation, which produces only surface masks limited to observed geometry, amodal segmentation allows for whole-object editing, robust animation rigging, and localized material transfers by providing the full geometry of every part.
The primary challenges are:
- Inferring Occluded Geometry: Only partial surface patches are visible; predicting the full part requires substantial shape priors.
- Ensuring Global Consistency: Inferred parts must fit together seamlessly within the object's overall geometry.
- Limited Annotated Data: Datasets with exhaustive part-level full geometry are scarce, yet the approach must generalize widely.
HoloPart addresses these concerns by decomposing the pipeline into two explicit stages:
- Surface Segmentation: Extraction of incomplete part masks using external segmenters (e.g., SAMPart3D).
- Part Completion: Generation of the full part geometry from the incomplete segment, conditioned on both local detail and global context, using a diffusion process.
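As a rough illustration of the hand-off between the two stages, the sketch below turns per-point part labels into the per-part inputs that a completion stage would consume: a binary mask over the whole object and the visible points of that part. The `labels` list stands in for the output of an external segmenter; `part_inputs` and the toy data are hypothetical and are not part of SAMPart3D or HoloPart.

```python
def part_inputs(points, labels):
    """Yield (part_id, mask, visible_points) for every segmented part.

    points: list of (x, y, z) surface points of the whole object.
    labels: one integer part id per point (assumed segmenter output).
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    for part_id in sorted(set(labels)):
        # Binary mask over the full object: 1 where the point belongs to this part.
        mask = [1 if label == part_id else 0 for label in labels]
        # Visible surface patch of this part (the incomplete segment).
        visible = [p for p, label in zip(points, labels) if label == part_id]
        yield part_id, mask, visible


# Toy object: four surface points split into two parts (hypothetical data).
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.1, 0.0)]
labels = [0, 0, 1, 1]
parts = list(part_inputs(points, labels))
```

Each tuple carries what the completion stage needs for one part, so parts can be completed independently and in any order.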
2. Model Architecture: Dual-Stream Conditional Diffusion
At the core is a U-Net–style diffusion transformer (DiT) operating in the latent space of a variational autoencoder (VAE). For each incomplete part segment, the HoloPart architecture computes two complementary conditioning streams via cross-attention:
- Global Shape-Context Attention ($c_o$): Encodes spatial relationships and overall shape layout by cross-attending from part queries $S_0$, subsampled from the visible part surface with farthest point sampling (FPS), to the global object point cloud $X$ tagged with the part mask $M$.
- Local Attention ($c_l$): Encodes fine-grained geometric detail by attending from $S_0$ to the points $S$ on the visible surface patch.
These conditioning streams are concatenated (or summed) and injected into each cross-attention layer in the denoising U-Net throughout the diffusion process.
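The part queries are obtained with farthest point sampling. A minimal pure-Python sketch of the greedy FPS procedure is shown here for orientation; the function name, the choice of starting index, and the toy patch are assumptions, and a practical implementation would use a vectorized or GPU version.

```python
import math
import random


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen greedily to be mutually far apart."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Pick the point that is farthest from the current selection.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Toy visible patch: 200 random points in the unit cube (hypothetical data).
rng = random.Random(0)
patch = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
query_idx = farthest_point_sampling(patch, k=16)
```

Because each new sample maximizes the distance to the points already selected, the queries cover the patch more evenly than uniform random sampling would.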
Input Encoding Schema:
- $X$: Object surface point cloud.
- $M$: Part mask.
- $S$: Visible part-surface points.
- $S_0$: Subsampled query points for attention.
Position and normal vectors are embedded and concatenated to all 3D points pre-attention.
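Global Attention:
$$c_o = \mathrm{CrossAttn}\big(\mathrm{PosEmb}(S_0),\ \mathrm{PosEmb}(X \oplus M)\big)$$
Local Attention:
$$c_l = \mathrm{CrossAttn}\big(\mathrm{PosEmb}(S_0),\ \mathrm{PosEmb}(S)\big)$$
Here $S_0$ denotes the subsampled query points, $\mathrm{PosEmb}$ the point embedding, and $\oplus$ per-point concatenation of the mask to the object points.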
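To make the data flow of the two conditioning streams concrete, here is a deliberately simplified sketch: single-head scaled dot-product attention with identity projections and raw coordinates in place of learned embeddings. All names and toy values are hypothetical, and tagging the queries with a mask value of 1 is an assumption made only so that query and context dimensions match in this toy setting.

```python
import math


def cross_attention(queries, context):
    """Single-head scaled dot-product attention with identity projections.

    Each output row is a softmax-weighted average of the context rows.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * k[j] for w, k in zip(weights, context)) / total for j in range(dim)]
        )
    return outputs


# Hypothetical toy inputs.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]  # object
M = [1, 1, 0, 0]                             # part mask over X
S = [p for p, m in zip(X, M) if m == 1]      # visible part points
S0 = S[:1]                                   # stand-in for FPS-sampled queries

# Global stream: object points tagged with the mask; queries carry mask = 1.
global_ctx = [p + (float(m),) for p, m in zip(X, M)]
global_q = [p + (1.0,) for p in S0]
c_o = cross_attention(global_q, global_ctx)

# Local stream: queries attend to the visible patch only.
c_l = cross_attention(S0, S)

# One way to fuse the streams: concatenate along the feature axis.
condition = [o + l for o, l in zip(c_o, c_l)]
```

In a trained model the queries, keys, and values would pass through learned projections and positional embeddings; the sketch only shows which point sets play the query and context roles in each stream.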
3. Diffusion Process Formulation
HoloPart leverages the latent diffusion paradigm:
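- Latent Representation: Point clouds are mapped to latent vectors $z_0$ via the VAE encoder $E$ and decoded via the decoder $D$.
- Forward (Noising) Process: Standard DDPM schedule with $T$ steps. At each step $t$, noise is added as:
$$q(z_t \mid z_{t-1}) = \mathcal{N}\big(z_t;\ \sqrt{1-\beta_t}\, z_{t-1},\ \beta_t I\big)$$
with $\beta_t \in (0, 1)$, $\alpha_t = 1 - \beta_t$, and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$, so that $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$ for $\epsilon \sim \mathcal{N}(0, I)$.
- Reverse (Denoising) Process: The network $\epsilon_\theta(z_t, t, c_o, c_l)$ predicts the added noise, defining:
$$p_\theta(z_{t-1} \mid z_t, c_o, c_l) = \mathcal{N}\big(z_{t-1};\ \mu_\theta(z_t, t, c_o, c_l),\ \sigma_t^2 I\big)$$
The mean $\mu_\theta(z_t, t, c_o, c_l) = \frac{1}{\sqrt{\alpha_t}}\Big(z_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\, \epsilon_\theta(z_t, t, c_o, c_l)\Big)$ matches the DDPM parameterization.
- Training Objective: Minimize
$$\mathcal{L} = \mathbb{E}_{z_0,\ \epsilon \sim \mathcal{N}(0, I),\ t}\Big[\big\lVert \epsilon - \epsilon_\theta(z_t, t, c_o, c_l) \big\rVert_2^2\Big]$$
Classifier-free guidance is enabled by randomly dropping $c_o$ and/or $c_l$ during training.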
No auxiliary geometry or segmentation regularization losses are used beyond VAE pretraining.
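The ingredients of one training step (closed-form noising, condition dropout, and the noise-prediction loss) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' training code: the linear beta schedule with endpoints 1e-4 and 0.02, the step count of 1000, the dropout probability of 0.1, and every function name are placeholders chosen for the example.

```python
import math
import random


def alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    bars, prod = [], 1.0
    for step in range(T):
        beta = beta_start + (beta_end - beta_start) * step / max(T - 1, 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars


def q_sample(z0, abar_t, rng):
    """Draw z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = [math.sqrt(abar_t) * z + math.sqrt(1.0 - abar_t) * e for z, e in zip(z0, eps)]
    return z_t, eps


def drop_conditions(c_o, c_l, rng, p_drop=0.1):
    """Independently replace each stream with None (the null condition)."""
    keep_o = rng.random() >= p_drop
    keep_l = rng.random() >= p_drop
    return (c_o if keep_o else None), (c_l if keep_l else None)


def noise_mse(eps, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred)) / len(eps)


rng = random.Random(0)
bars = alpha_bars(T=1000)
z0 = [0.5, -1.0, 2.0]                     # toy latent (hypothetical)
t = rng.randrange(len(bars))              # uniformly sampled timestep index
z_t, eps = q_sample(z0, bars[t], rng)
c_o, c_l = drop_conditions("global", "local", rng)
loss = noise_mse(eps, [0.0] * len(eps))   # a real model would predict eps here
```

Dropping each stream independently exposes the network to all four combinations (both, global only, local only, neither), which is what later allows conditional and unconditional predictions to be mixed at sampling time.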
4. Sampling Pipeline and Implementation Specifics
The conditional sampling proceeds as follows:
Input: incomplete segment S, full shape X, mask M
1. Encode shape context: c_o from (X, M), c_l from S
2. Sample z_T ∼ N(0, I)
3. For t = T … 1:
   z_{t-1} = DDIM_Step(z_t, t, ε_θ(·; c_o, c_l), guidance scale = s)
4. Decode: ŷ = D(z_0) → occupancy in local bounding box
5. Extract mesh via Marching Cubes
- Guidance scale: a tuned classifier-free guidance scale $s$ yields the best part fidelity.
- DDIM sampling with 20–50 steps accelerates inference.
- Bounding box: Expanded by a fixed factor beyond the segmented patch’s extents to accommodate occluded geometry.
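The core of each sampling iteration is a guided noise estimate followed by a DDIM update. The sketch below uses the standard classifier-free guidance combination and the deterministic (eta = 0) DDIM step on plain Python lists; the function names and the toy numbers are assumptions, and an oracle noise vector stands in for the network.

```python
import math


def guided_eps(eps_cond, eps_uncond, scale):
    """Classifier-free guidance: extrapolate from unconditional to conditional."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def ddim_step(z_t, eps, abar_t, abar_prev):
    """Deterministic DDIM update (eta = 0) from noise level abar_t to abar_prev."""
    # Estimate the clean latent implied by the current noise prediction.
    z0_pred = [
        (z - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t) for z, e in zip(z_t, eps)
    ]
    # Re-noise that estimate to the previous (less noisy) level.
    return [
        math.sqrt(abar_prev) * z0 + math.sqrt(1.0 - abar_prev) * e
        for z0, e in zip(z0_pred, eps)
    ]


# Sanity check with an oracle that returns the true noise (hypothetical values).
z0 = [0.5, -1.0, 2.0]
eps_true = [0.3, -0.2, 0.1]
abar_t = 0.25
z_t = [math.sqrt(abar_t) * z + math.sqrt(1.0 - abar_t) * e for z, e in zip(z0, eps_true)]
eps_hat = guided_eps(eps_true, [0.0, 0.0, 0.0], scale=1.0)
z_prev = ddim_step(z_t, eps_hat, abar_t, abar_prev=1.0)  # jump straight to t = 0
```

With a perfect noise estimate, a single step to the noise-free level recovers the clean latent, which is why DDIM can traverse the schedule in far fewer steps than it was trained with.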
5. Empirical Evaluation and Comparative Analysis
Datasets and Benchmarks
- ABO (bed, table, lamp, chair): 20K training part instances; 60 test shapes (~1K parts).
- PartObjaverse-Tiny (8 categories): 160K object parts (train); 200 shapes (3K parts in test).
Metrics
- Chamfer Distance (↓)
- Intersection-over-Union (IoU, ↑)
- F-Score@1% (↑)
- Reconstruction Success Rate (↑)
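For reference, the three geometric metrics can be computed as in the brute-force sketch below. Conventions differ between papers (squared versus unsquared distances, sum versus mean, how the F-Score threshold is normalized), so this shows one common variant rather than the exact evaluation protocol; the 0.01 threshold assumes shapes normalized to unit scale, and all names are hypothetical.

```python
import math


def _nearest(a, b):
    """Distance from each point in a to its nearest neighbour in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance: mean nearest-neighbour distance both ways."""
    d_ab, d_ba = _nearest(a, b), _nearest(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(d <= tau for d in _nearest(pred, gt)) / len(pred)
    recall = sum(d <= tau for d in _nearest(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def voxel_iou(occ_a, occ_b):
    """IoU of two sets of occupied voxel indices."""
    union = occ_a | occ_b
    return len(occ_a & occ_b) / len(union) if union else 1.0


# Toy example (hypothetical data).
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)]
cd = chamfer_distance(pred, gt)
fs = f_score(pred, gt, tau=0.01)
iou = voxel_iou({(0, 0, 0), (1, 0, 0)}, {(1, 0, 0), (2, 0, 0)})
```

Chamfer Distance and F-Score operate on sampled surface points, whereas IoU compares volumetric occupancy, so the three metrics capture surface accuracy and volume overlap separately.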
Results
| Dataset | Method | Chamfer ↓ | IoU ↑ | F-Score ↑ |
|---|---|---|---|---|
| ABO | HoloPart | 0.026 | 0.764 | 0.843 |
| ABO | PatchComplete | 0.122 | 0.159 | 0.259 |
| ABO | DiffComplete | 0.087 | 0.235 | 0.371 |
| ABO | Finetune-VAE | 0.037 | 0.565 | 0.689 |
| PartObjaverse-Tiny | HoloPart | 0.034 | 0.688 | 0.801 |
| PartObjaverse-Tiny | PatchComplete | 0.144 | 0.137 | 0.232 |
| PartObjaverse-Tiny | DiffComplete | 0.133 | 0.142 | 0.239 |
| PartObjaverse-Tiny | SDFusion | 0.137 | 0.235 | 0.365 |
| PartObjaverse-Tiny | Finetune-VAE | 0.064 | 0.502 | 0.638 |
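Relative to the strongest baseline (Finetune-VAE), HoloPart lowers Chamfer Distance by about 30% on ABO (0.037 to 0.026) and about 47% on PartObjaverse-Tiny (0.064 to 0.034), while raising IoU by roughly 0.19–0.20 and F-Score by roughly 0.15–0.16. The margins over PatchComplete, DiffComplete, and SDFusion are considerably larger.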
Ablation studies reveal:
- Removing context attention increases Chamfer Distance and decreases IoU.
- Removing local attention increases Chamfer Distance and decreases F-Score.
Zero-Shot Generalization
Combined use of SAMPart3D and HoloPart yields complete part instances on previously unseen objects and generative meshes, confirming robust generalization (Yang et al., 10 Apr 2025).
6. Applications and Integration Scenarios
HoloPart enables a variety of downstream 3D content creation and manipulation tasks:
- Geometry Editing: Direct manipulation, resizing, or replacement of individual parts without mesh artifacts.
- Animation: Per-part rigging of fully reconstructed shapes (e.g., animating occluded wheels).
- Material Assignment: Unique textures can be applied to semantically coherent and geometrically complete parts.
- Geometry Processing: Enhanced remeshing and smoothing from watertight, complete part geometry.
- Super-Resolution: By distributing token budgets at the part level, HoloPart achieves greater part detail compared to monolithic VAE-based approaches.
7. Limitations and Contributions
Contributions
- Introduction of the 3D part amodal segmentation problem and two new benchmarks for it, built on the ABO and PartObjaverse-Tiny datasets.
- Proposal of a dual-conditioned latent diffusion model for part completion, integrating global and local attention.
- Demonstrated improvements over leading shape completion methods and generalizability to novel categories.
- Enablement of applications in practical 3D content creation.
Limitations
- Model accuracy depends strongly on the quality of input segmentation masks; errors in initial part extraction propagate to completion.
- Requirement for pretrained 3D generative VAEs increases overall system complexity.
HoloPart constitutes a step forward in bridging perceptual segmentation and practical, high-fidelity 3D part completion for content creation, editing, and analysis, establishing a new paradigm in 3D shape understanding (Yang et al., 10 Apr 2025).