
HoloPart Diffusion: 3D Part Amodal Segmentation

Updated 1 April 2026
  • HoloPart Diffusion Model is a conditional generative framework that infers full 3D part geometry from partial, occluded observations.
  • It employs a dual-stage architecture combining off-the-shelf surface segmentation with a specialized diffusion-based completion module using global and local attention.
  • Empirical evaluations on ABO and PartObjaverse-Tiny show lower Chamfer Distance and higher IoU than existing shape-completion baselines, enabling applications in geometry editing, animation, and material assignment.

The HoloPart Diffusion Model is a conditional generative framework for 3D part amodal segmentation. It infers the complete geometry of semantic object parts, including their occluded regions, from visible partial observations. HoloPart targets a gap in 3D content pipelines: surface-only part segmentation hampers tasks such as geometry editing, animation, and part-level material assignment. The method uses a two-stage architecture that pairs off-the-shelf visible-part segmentation with a dedicated diffusion-based part completion module built on global and local attention (Yang et al., 10 Apr 2025).

1. 3D Part Amodal Segmentation: Task Definition and Motivation

3D part amodal segmentation entails decomposing a complete 3D object $m$ (expressed as a mesh or point cloud) into a set of semantically meaningful constituent parts $P = \{p_1, \dots, p_n\}$, where each part $p_i$ comprises both visible and occluded (unobservable) geometry. Unlike conventional 3D part segmentation, which produces only surface masks $s_i$ limited to observed geometry, amodal segmentation provides the full geometry of every part, enabling whole-object editing, robust animation rigging, and localized material transfer.

The primary challenges are:

  • Inferring Occluded Geometry: Only partial surface patches are visible; predicting the full part requires substantial shape priors.
  • Ensuring Global Consistency: Inferred parts must fit together seamlessly within the object's overall geometry.
  • Limited Annotated Data: Datasets with exhaustive part-level full geometry are scarce, yet the approach must generalize widely.

HoloPart addresses these concerns by decomposing the pipeline into two explicit stages:

  1. Surface Segmentation: Extraction of incomplete part masks $s_i$ using external segmenters (e.g., SAMPart3D).
  2. Part Completion: Generation of the full part geometry $p_i$ from $s_i$ with a diffusion process conditioned on both local detail and global context.
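The two-stage decomposition can be sketched as a small driver that keeps the segmenter and the completer pluggable. This is an illustrative sketch, not HoloPart's code: `amodal_segment`, `PartResult`, and the `Segmenter`/`Completer` signatures are hypothetical names, and a real stage-1 segmenter (such as SAMPart3D) and the stage-2 diffusion completer would need wrappers to fit them.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class PartResult:
    visible_mask: np.ndarray  # (N,) bool: s_i, object points on the visible part surface
    full_points: np.ndarray   # (K, 3): p_i, completed part geometry (visible + occluded)


# Hypothetical interfaces (not a real SAMPart3D or HoloPart API):
# a segmenter returns one boolean mask per part; a completer maps (X, M, S) to full part points.
Segmenter = Callable[[np.ndarray], List[np.ndarray]]
Completer = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]


def amodal_segment(X: np.ndarray, segment: Segmenter, complete: Completer) -> List[PartResult]:
    """Stage 1: surface masks s_i; stage 2: per-part completion to full geometry p_i."""
    results = []
    for mask in segment(X):
        S = X[mask]                   # visible part-surface points
        full = complete(X, mask, S)   # completion conditioned on the whole shape and the patch
        results.append(PartResult(visible_mask=mask, full_points=full))
    return results


# Toy usage: split by the sign of x, and "complete" each part by returning its visible points.
X = np.random.default_rng(0).normal(size=(100, 3))
parts = amodal_segment(X, lambda pts: [pts[:, 0] < 0, pts[:, 0] >= 0], lambda X, M, S: S.copy())
print(len(parts), [p.full_points.shape for p in parts])
```

Because stage 2 only ever sees the masks produced by stage 1, any segmentation error is passed straight into the completer.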

2. Model Architecture: Dual-Stream Conditional Diffusion

At the core is a U-Net–style diffusion transformer (DiT) operating in the latent space of a variational autoencoder (VAE). For each incomplete part segment, the HoloPart architecture computes two complementary conditioning streams via cross-attention:

  • Global Shape-Context Attention ($c_o$): Encodes spatial relationships and overall shape layout by cross-attending from part query points $S_0$, sampled with farthest point sampling (FPS), to the masked global object point cloud $X$ with mask $M$.
  • Local Attention ($c_l$): Encodes fine-grained geometric detail by cross-attending from $S_0$ to the points $S$ on the visible surface patch.

These conditioning streams are concatenated (or summed) and injected into each cross-attention layer of the denoising network at every step of the diffusion process.

Input Encoding Schema:

  • $X$: Object surface point cloud.
  • $M$: Part mask.
  • $S$: Visible part-surface points.
  • $S_0$: Subsampled query points for attention.

The positions and normals of all 3D points are embedded and concatenated before attention.
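Farthest point sampling, used to pick the query points $S_0$, and a point embedding can be sketched in NumPy as follows. The greedy FPS loop is the standard algorithm; the Fourier-feature layout in `embed_points` (number of frequencies, concatenating raw xyz and normals) is an assumption, since the source only states that positions and normals are embedded and concatenated.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Greedy FPS: return indices of k points that are spread out as far as possible."""
    k = min(k, len(points))
    idx = np.empty(k, dtype=np.int64)
    idx[0] = start
    # Distance from every point to its nearest already-selected point.
    dist = np.linalg.norm(points - points[start], axis=1)
    for j in range(1, k):
        idx[j] = int(np.argmax(dist))
        dist = np.minimum(dist, np.linalg.norm(points - points[idx[j]], axis=1))
    return idx


def embed_points(xyz: np.ndarray, normals: np.ndarray, num_freqs: int = 8) -> np.ndarray:
    """Assumed embedding: raw xyz, sin/cos Fourier features of xyz, and normals, concatenated."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi              # (F,)
    angles = xyz[:, :, None] * freqs                           # (N, 3, F)
    fourier = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).reshape(len(xyz), -1)
    return np.concatenate([xyz, fourier, normals], axis=1)     # (N, 3 + 6F + 3)


# Toy usage: pick 32 query points S_0 from a visible patch S and embed them.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(500, 3))
S_normals = rng.normal(size=(500, 3))
S_normals /= np.linalg.norm(S_normals, axis=1, keepdims=True)
q = farthest_point_sampling(S, 32)
S0_features = embed_points(S[q], S_normals[q])
print(S0_features.shape)
```

FPS gives queries that cover the whole patch evenly, which a uniform random subset does not guarantee on unevenly sampled surfaces.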

Global Attention:

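$$c_o = \mathrm{CrossAttn}\big(\mathrm{PosEmb}(S_0),\ \mathrm{PosEmb}(X \oplus M)\big), \qquad S_0 = \mathrm{FPS}(S)$$

where $\oplus$ denotes concatenation of the mask $M$ to the points of $X$.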

Local Attention:

$$c_l = \mathrm{CrossAttn}\big(\mathrm{PosEmb}(S_0),\ \mathrm{PosEmb}(S)\big)$$
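A single-head NumPy sketch of the two conditioning streams follows. The projection matrices are random stand-ins for learned weights, `pos_emb` is a hypothetical simplified embedding, the mask is appended to each object point as one extra channel, and the queries are drawn at random rather than by FPS to keep the block short; real layers would be learned and typically multi-head.

```python
import numpy as np


def cross_attention(q_feats, kv_feats, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention: each query attends over all key points."""
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv      # (Nq, d), (Nk, d), (Nk, d)
    logits = Q @ K.T / np.sqrt(Q.shape[1])                    # (Nq, Nk)
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)             # softmax over key points
    return weights @ V                                        # (Nq, d)


def pos_emb(p):
    """Hypothetical simplified point embedding: xyz plus sin/cos features, 9 dims."""
    return np.concatenate([p, np.sin(np.pi * p), np.cos(np.pi * p)], axis=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(512, 3))                       # toy object surface points
M = (X[:, 0] > 0.5).astype(float)                   # toy part mask (1 on the segmented part)
S = X[M == 1]                                       # visible part-surface points
S0 = S[rng.choice(len(S), size=64, replace=False)]  # queries (random here; FPS in the text)

d = 32
Wq_o, Wk_o, Wv_o = rng.normal(size=(9, d)), rng.normal(size=(10, d)), rng.normal(size=(10, d))
Wq_l, Wk_l, Wv_l = (rng.normal(size=(9, d)) for _ in range(3))

X_with_mask = np.concatenate([pos_emb(X), M[:, None]], axis=1)     # mask as a 10th channel
c_o = cross_attention(pos_emb(S0), X_with_mask, Wq_o, Wk_o, Wv_o)  # global shape context
c_l = cross_attention(pos_emb(S0), pos_emb(S), Wq_l, Wk_l, Wv_l)   # local patch detail
cond = np.concatenate([c_o, c_l], axis=1)                          # (64, 2d) conditioning tokens
```

Both streams share the same queries, so their outputs align token by token and can be concatenated or summed as the text describes.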

3. Diffusion Process Formulation

HoloPart leverages the latent diffusion paradigm:

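  • Latent Representation: Point clouds are mapped to latent vectors $z$ by the VAE encoder $\mathcal{E}$ and decoded by $\mathcal{D}$.
  • Forward (Noising) Process: Standard DDPM schedule with $T$ steps. At each step $t$, noise is added as:

$$z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$$

with $\epsilon \sim \mathcal{N}(0, I)$, $\alpha_t = 1 - \beta_t$, and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$.

  • Reverse (Denoising) Process: The network $\epsilon_\theta(z_t, t; c_o, c_l)$ predicts the added noise, defining:

$$p_\theta(z_{t-1} \mid z_t) = \mathcal{N}\big(z_{t-1};\ \mu_\theta(z_t, t; c_o, c_l),\ \sigma_t^2 I\big)$$

The mean follows the DDPM parameterization, $\mu_\theta = \frac{1}{\sqrt{\alpha_t}}\big(z_t - \frac{\beta_t}{\sqrt{1 - \bar\alpha_t}}\, \epsilon_\theta(z_t, t; c_o, c_l)\big)$.

  • Training Objective: Minimize

$$\mathcal{L} = \mathbb{E}_{z_0, \epsilon, t}\big[\lVert \epsilon - \epsilon_\theta(z_t, t; c_o, c_l) \rVert_2^2\big]$$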
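The forward process and the DDPM reverse-step mean can be checked numerically against the true posterior. The linear $\beta$ schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default and an assumption here, since the source does not give HoloPart's schedule; timesteps are 0-indexed in code.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule; index t = 0 .. T-1
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def q_sample(z0, t, eps):
    """Forward noising: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def ddpm_mean(zt, t, eps_pred):
    """Reverse-step mean mu_theta(z_t, t) computed from a noise prediction."""
    return (zt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alphas[t])


def posterior_mean(z0, zt, t):
    """Mean of the true posterior q(z_{t-1} | z_t, z_0)."""
    abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    coef_z0 = np.sqrt(abar_prev) * betas[t] / (1.0 - alpha_bars[t])
    coef_zt = np.sqrt(alphas[t]) * (1.0 - abar_prev) / (1.0 - alpha_bars[t])
    return coef_z0 * z0 + coef_zt * zt


# With a perfect noise prediction, the DDPM mean matches the true posterior mean.
rng = np.random.default_rng(0)
z0, eps = rng.normal(size=16), rng.normal(size=16)
zt = q_sample(z0, 500, eps)
print(np.allclose(ddpm_mean(zt, 500, eps), posterior_mean(z0, zt, 500)))
```

This identity is the link between the noise-prediction loss $\mathcal{L}$ and the reverse process: an accurate $\epsilon_\theta$ yields the correct denoising mean.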

Classifier-free guidance is enabled by randomly dropping $c_o$ and/or $c_l$ during training.
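A single training step with condition dropout might look like the sketch below. The dropout probability `p_drop=0.1`, the use of zero vectors as null conditions, and independent dropping of $c_o$ and $c_l$ are assumptions; the source only says that one or both conditions are randomly dropped. `eps_model` is a hypothetical stand-in for the conditional DiT.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)


def training_loss(eps_model, z0, c_o, c_l, rng, p_drop=0.1):
    """One Monte Carlo estimate of E||eps - eps_theta(z_t, t; c_o, c_l)||^2 with condition dropout."""
    t = rng.integers(0, T)
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    # Classifier-free guidance training: replace each condition by a null (zero) token at random.
    if rng.random() < p_drop:
        c_o = np.zeros_like(c_o)
    if rng.random() < p_drop:
        c_l = np.zeros_like(c_l)
    eps_pred = eps_model(zt, t, c_o, c_l)
    return float(np.mean((eps - eps_pred) ** 2))


# Toy usage: an untrained model that always predicts zero noise has loss close to 1.
rng = np.random.default_rng(0)
z0 = rng.standard_normal(4096)
zero_model = lambda zt, t, c_o, c_l: np.zeros_like(zt)
print(training_loss(zero_model, z0, np.ones(8), np.ones(8), rng))
```

Training on both conditioned and null inputs is what later lets the sampler contrast the two predictions.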

No auxiliary geometry or segmentation regularization losses are used beyond VAE pretraining.

4. Sampling Pipeline and Implementation Specifics

The conditional sampling proceeds as follows:

Input: incomplete segment S, full shape X, mask M

1. Encode shape context: c_o from (S_0, X ⊕ M) and c_l from (S_0, S), with S_0 = FPS(S)

2. Sample z_T ∼ N(0, I)

3. For t in a decreasing subsequence of T … 1 (20–50 DDIM steps):
      z_{t'} = DDIM_Step(z_t, t, ε_θ(·; c_o, c_l), guidance_scale = w)   (t' = next timestep)

4. Decode: ŷ = D(z_0) → occupancy in a local bounding box

5. Extract the mesh via Marching Cubes
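
  • Guidance scale: the classifier-free guidance scale $w$ is tuned for the best part fidelity.
  • DDIM sampling (20–50 steps) shortens sampling relative to the full $T$-step DDPM chain.
  • Bounding box: the local box is expanded beyond the segmented patch's extents so that it can contain the occluded geometry.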
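The sampling loop in the pseudocode can be sketched as deterministic DDIM ($\eta = 0$) with classifier-free guidance. The combination $\hat\epsilon = \epsilon_\theta(z_t; \varnothing) + w\,(\epsilon_\theta(z_t; c) - \epsilon_\theta(z_t; \varnothing))$ is the standard CFG rule, assumed here; the default `w=3.0`, the linear schedule, the even timestep spacing, and the `eps_model(z, t, cond)` signature with `cond=None` for the unconditional branch are hypothetical choices.

```python
import numpy as np

T = 1000
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))   # assumed linear schedule


def ddim_sample(eps_model, shape, cond, w=3.0, num_steps=50, seed=0):
    """Deterministic DDIM (eta = 0) with classifier-free guidance; returns z_0."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)                                  # z_T ~ N(0, I)
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    for i, t in enumerate(timesteps):
        eps_c = eps_model(z, t, cond)                               # conditional prediction
        eps_u = eps_model(z, t, None)                               # unconditional prediction
        eps = eps_u + w * (eps_c - eps_u)                           # guided noise estimate
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        z0_hat = (z - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)    # predicted clean latent
        z = np.sqrt(ab_prev) * z0_hat + np.sqrt(1.0 - ab_prev) * eps
    return z


# Toy usage: an oracle that "knows" the clean latent makes DDIM land exactly on it.
target = np.linspace(-1.0, 1.0, 8)

def oracle(z, t, cond):
    return (z - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

z0 = ddim_sample(oracle, (8,), cond="c_o,c_l", w=3.0, num_steps=20)
print(np.allclose(z0, target))
```

Decoding $z_0$ to occupancy and running Marching Cubes (for example with `skimage.measure.marching_cubes`) would follow; both are omitted because they depend on the trained VAE decoder.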

5. Empirical Evaluation and Comparative Analysis

Datasets and Benchmarks

  • ABO (bed, table, lamp, chair): 20K training part instances; 60 test shapes (~1K parts).
  • PartObjaverse-Tiny (8 categories): 160K object parts (train); 200 shapes (3K parts in test).

Metrics

  • Chamfer Distance (↓)
  • Intersection-over-Union (IoU, ↑)
  • F-Score@1% (↑)
  • Reconstruction Success Rate (↑)
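Simple versions of the point-based metrics and volumetric IoU are sketched below with brute-force nearest neighbours. Conventions differ across papers, and the source does not state HoloPart's exact normalization: here Chamfer Distance is the sum of the two directed mean Euclidean distances, F-Score uses a fixed threshold `tau` (0.01, i.e. 1% for shapes normalized to unit extent, an assumption), and IoU is computed on boolean occupancy grids.

```python
import numpy as np


def nn_dists(a, b):
    """For each point in a, Euclidean distance to its nearest neighbour in b (brute force)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1)


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: mean NN distance pred->gt plus gt->pred."""
    return nn_dists(pred, gt).mean() + nn_dists(gt, pred).mean()


def f_score(pred, gt, tau=0.01):
    """Harmonic mean of precision (pred near gt) and recall (gt near pred) at threshold tau."""
    precision = (nn_dists(pred, gt) < tau).mean()
    recall = (nn_dists(gt, pred) < tau).mean()
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def volume_iou(occ_pred, occ_gt):
    """IoU of two boolean occupancy grids."""
    inter = np.logical_and(occ_pred, occ_gt).sum()
    union = np.logical_or(occ_pred, occ_gt).sum()
    return inter / union if union else 1.0
```

The brute-force distance matrix is quadratic in the number of points; for dense evaluation a KD-tree such as `scipy.spatial.cKDTree` would be the usual replacement.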

Results

Dataset              Method          Chamfer ↓   IoU ↑   F-Score ↑
ABO                  HoloPart        0.026       0.764   0.843
ABO                  PatchComplete   0.122       0.159   0.259
ABO                  DiffComplete    0.087       0.235   0.371
ABO                  Finetune-VAE    0.037       0.565   0.689
PartObjaverse-Tiny   HoloPart        0.034       0.688   0.801
PartObjaverse-Tiny   PatchComplete   0.144       0.137   0.232
PartObjaverse-Tiny   DiffComplete    0.133       0.142   0.239
PartObjaverse-Tiny   SDFusion        0.137       0.235   0.365
PartObjaverse-Tiny   Finetune-VAE    0.064       0.502   0.638

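Relative to the strongest baseline (Finetune-VAE), HoloPart reduces Chamfer Distance by about 30% on ABO and about 47% on PartObjaverse-Tiny, and raises IoU by roughly 19–20 points and F-Score by roughly 15–16 points. Its margins over PatchComplete, DiffComplete, and SDFusion are considerably larger.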
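The margins over the strongest baseline, Finetune-VAE, can be recomputed directly from the results table. The snippet reports the relative Chamfer reduction in percent and the absolute IoU and F-Score gains in points; the numbers are copied from the table, and `improvement` is a helper introduced here for illustration.

```python
# (Chamfer, IoU, F-Score) copied from the results table.
results = {
    "ABO": {"HoloPart": (0.026, 0.764, 0.843), "Finetune-VAE": (0.037, 0.565, 0.689)},
    "PartObjaverse-Tiny": {"HoloPart": (0.034, 0.688, 0.801), "Finetune-VAE": (0.064, 0.502, 0.638)},
}


def improvement(ours, base):
    """Relative Chamfer reduction (%) and absolute IoU / F-Score gains (points)."""
    cd_reduction = 100 * (base[0] - ours[0]) / base[0]
    iou_gain = 100 * (ours[1] - base[1])
    f_gain = 100 * (ours[2] - base[2])
    return round(cd_reduction, 1), round(iou_gain, 1), round(f_gain, 1)


for dataset, rows in results.items():
    print(dataset, improvement(rows["HoloPart"], rows["Finetune-VAE"]))
```

This gives roughly a 30% Chamfer reduction on ABO and 47% on PartObjaverse-Tiny, with IoU gains near 19–20 points and F-Score gains near 15–16 points.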

Ablation studies reveal:

  • Removing the context attention stream increases Chamfer Distance and decreases IoU.
  • Removing the local attention stream increases Chamfer Distance and decreases F-Score.

Zero-Shot Generalization

Combining SAMPart3D with HoloPart yields complete part instances on previously unseen objects and on meshes produced by generative models, indicating that the approach generalizes beyond its training data (Yang et al., 10 Apr 2025).

6. Applications and Integration Scenarios

HoloPart enables a variety of downstream 3D content creation and manipulation tasks:

  • Geometry Editing: Direct manipulation, resizing, or replacement of individual parts without mesh artifacts.
  • Animation: Per-part rigging of fully reconstructed shapes (e.g., animating occluded wheels).
  • Material Assignment: Unique textures can be applied to semantically coherent and geometrically complete parts.
  • Geometry Processing: Enhanced remeshing and smoothing from watertight, complete part geometry.
  • Super-Resolution: By allocating a latent token budget to each part rather than to the whole object, HoloPart recovers finer part detail than monolithic VAE-based approaches that encode the entire shape at once.

7. Limitations and Contributions

Contributions

  1. Introduction of the 3D part amodal segmentation task and two new benchmarks for it, built on ABO and PartObjaverse-Tiny.
  2. Proposal of a dual-conditioned latent diffusion model for part completion, integrating global and local attention.
  3. Demonstrated improvements over leading shape completion methods and generalizability to novel categories.
  4. Enablement of applications in practical 3D content creation.

Limitations

  • Model accuracy depends strongly on the quality of input segmentation masks; errors in initial part extraction propagate to completion.
  • Requirement for pretrained 3D generative VAEs increases overall system complexity.

HoloPart is a step toward bridging perceptual segmentation and practical, high-fidelity 3D part completion for content creation, editing, and analysis (Yang et al., 10 Apr 2025).
