3D Part Amodal Segmentation
- 3D part amodal segmentation is a process that decomposes a 3D shape into semantically meaningful parts by inferring both visible and occluded regions.
- HoloPart and PartSAM are two recent methods for this task: HoloPart completes occluded part geometry with a generative diffusion model, while PartSAM performs promptable segmentation on a native 3D feature field.
- This approach enables practical applications in geometry editing, animation, and data augmentation while addressing challenges like occlusion and data scarcity.
3D part amodal segmentation decomposes a 3D shape into semantically meaningful parts and predicts the complete geometry of each part, including both visible and occluded (hidden or internal) regions. Unlike standard 3D (modal) part segmentation, which assigns semantic labels only to visible surfaces, amodal segmentation seeks to infer the entire volumetric extent of each part, even where it is not directly observed. This task is central for open-world 3D understanding, content creation, and modeling workflows where complete part meshes are required for editing, animation, and downstream generative applications (Yang et al., 10 Apr 2025, Zhu et al., 26 Sep 2025).
1. Problem Statement and Distinctions
3D part amodal segmentation fundamentally diverges from traditional part segmentation by its requirement for “completion” of part geometry:
- Modal segmentation detects and classifies only those surface patches observable in the raw input, typically using mesh or point-cloud representations.
- Amodal segmentation (in 2D) predicts full silhouettes of objects behind occluders; extending this to 3D introduces additional complexities, since the algorithm must plausibly reconstruct unobserved volumes, not merely extrapolate outlines.
- 3D part amodal segmentation specifically demands both semantic decomposition and geometric inference of occluded regions for each part.
Key technical challenges include: (1) inferring occluded or interior 3D structure (“shape completion”), (2) ensuring global consistency between completed parts so that their union restores the original whole, and (3) overcoming data scarcity, since few large-scale datasets contain paired partial and complete part annotations (Yang et al., 10 Apr 2025, Zhu et al., 26 Sep 2025).
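Challenge (2) can be made concrete with a small check on voxelized shapes. The sketch below is illustrative only: it assumes the whole shape and each completed part are given as sets of occupied voxel indices, and the function name and the three reported quantities are hypothetical rather than taken from either paper.

```python
def consistency_report(whole, parts):
    """Compare the union of completed parts against the whole shape.

    whole: set of occupied voxel indices for the full shape.
    parts: list of sets, one per completed part (hypothetical representation).
    """
    whole = set(whole)
    union = set().union(*parts) if parts else set()
    # Fraction of the whole shape that is explained by at least one part.
    coverage = len(union & whole) / len(whole) if whole else 1.0
    # Voxels that parts claim but the whole shape does not contain.
    spill = len(union - whole)
    # Voxels claimed more than once, counted with multiplicity.
    overlap = sum(len(set(p)) for p in parts) - len(union)
    return {"coverage": coverage, "spill": spill, "overlap": overlap}


whole_shape = {0, 1, 2, 3, 4, 5}
completed_parts = [{0, 1, 2}, {2, 3, 4}]
report = consistency_report(whole_shape, completed_parts)
```

Here one voxel of the whole shape is left unexplained and one voxel is claimed by both parts. Some overlap can be legitimate in amodal settings (for example at contact surfaces), so how strictly to penalize it is a design choice.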
2. Methodological Advances: Two-Stage and Promptable Pipelines
Two major paradigms address 3D part amodal segmentation:
HoloPart (Generative Segmentation via Shape Completion)
HoloPart proposes a two-stage generative pipeline (Yang et al., 10 Apr 2025):
- Surface Segmentation: An external 3D segmentation model (e.g., SAMPart3D) produces initial part masks for the visible mesh or point cloud. These masks are converted into sampled point sets.
- Amodal Part Completion: For each part, HoloPart takes the sampled points together with a mask indicating which points of the whole shape belong to that part, and infers a full 3D occupancy field for the part via a generative diffusion model. The completed parts are extracted as meshes via Marching Cubes.
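The hand-off between the two stages can be sketched as follows. This is a minimal illustration, assuming the surface segmentation model returns one integer part label per sampled point; the helper names `part_masks` and `part_points` are hypothetical and do not come from the HoloPart codebase.

```python
def part_masks(labels):
    """Build one binary membership mask per part from per-point labels."""
    part_ids = sorted(set(labels))
    return {pid: [1 if lab == pid else 0 for lab in labels] for pid in part_ids}


def part_points(points, mask):
    """Select the visible points that belong to one part."""
    return [p for p, m in zip(points, mask) if m]


# Toy whole-shape samples and hypothetical stage-one labels.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
labels = [0, 0, 1, 1]

masks = part_masks(labels)
visible_part_1 = part_points(points, masks[1])
```

Each mask is defined over the points of the whole shape, so a completion model conditioned on it sees both the part and its surrounding context.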
PartSAM (Promptable, Native 3D Field Segmentation)
PartSAM utilizes a promptable encoder-decoder design, trained natively on large-scale 3D data (Zhu et al., 26 Sep 2025):
- Encoder: A dual-branch, triplane-based transformer processes dense input point clouds, combining a frozen branch (incorporating image priors) and a learnable branch (adapting to full 3D semantics).
- Decoder: A cross-attention mask head processes user or automatic prompts (3D coordinates), yielding mask logits per input point. Automatic segmentation generates candidate masks for all parts with Non-Maximum Suppression (NMS) via predicted IoUs.
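The automatic segmentation step can be illustrated with a generic mask-level NMS. The sketch assumes masks are obtained by thresholding per-point logits at zero and that candidates are ranked by their predicted IoU; the threshold values and function names are illustrative assumptions, not PartSAM's actual settings or API.

```python
def logits_to_mask(logits, threshold=0.0):
    """Turn per-point logits into a mask, stored as a set of point indices."""
    return {i for i, value in enumerate(logits) if value > threshold}


def mask_iou(a, b):
    """Intersection-over-union of two point-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mask_nms(masks, scores, iou_threshold=0.5):
    """Greedy NMS: keep high-scoring masks, drop those overlapping a kept one."""
    order = sorted(range(len(masks)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(mask_iou(masks[i], masks[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


candidates = [
    logits_to_mask([2.1, 1.7, 0.9, -1.2, -3.0, -2.2]),
    logits_to_mask([1.5, 1.1, 0.4, 0.2, -2.0, -1.0]),
    logits_to_mask([-2.0, -1.5, -3.0, -0.5, 1.8, 2.4]),
]
predicted_iou = [0.92, 0.80, 0.88]  # hypothetical scores from an IoU head
kept = mask_nms(candidates, predicted_iou)
```

The second candidate largely duplicates the first and has a lower predicted IoU, so it is suppressed; the third covers different points and survives.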
Both approaches enable amodal decomposition, but HoloPart uses explicit generative completion per part, while PartSAM achieves open-world amodal segmentation through a learned 3D feature field that can represent both surface and internal structures.
3. Architectural Components and Technical Innovations
HoloPart: Diffusion-Based Shape Completion
- Latent Space Operating Principle: HoloPart leverages a VAE backbone to encode input point clouds into latent embeddings; a decoder predicts occupancy at 3D query locations, enabling mesh reconstruction.
- Diffusion Model: A DiT-style U-Net denoiser operates on the part latent vectors. Latents are perturbed by a forward noise schedule, and training minimizes a denoising loss conditioned on two signals injected from part and whole-shape features: context-aware global attention and local attention.
- Attention Mechanisms: Dual attention balances local detail (fine structure near visible regions) and global context (overall part compatibility). Multi-head attention operations are integrated in each block.
- Classifier-Free Guidance: Applied at sampling time to steer results toward the conditioning, with the guidance scale tuned empirically.
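Classifier-free guidance is commonly implemented by extrapolating from an unconditional prediction toward a conditional one. The sketch below shows that standard combination rule on plain lists; it is a generic illustration, and the scale values used here are arbitrary rather than the setting reported for HoloPart.

```python
def cfg_combine(uncond, cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    uncond, cond: denoiser outputs without and with conditioning.
    scale: guidance scale; 0 ignores the condition, 1 uses it as-is,
           values above 1 push further in the conditional direction.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


unconditional = [0.0, 1.0]
conditional = [1.0, 3.0]
guided = cfg_combine(unconditional, conditional, scale=2.0)
```

In practice the two predictions come from the same network, evaluated once with the conditioning and once with it dropped.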
PartSAM: Triplane Dual-Branch Field and Promptable Masking
- Triplane Encoder: Dense point cloud features are projected onto three 2D planes before dual-branch processing; one branch preserves 2D SAM priors, the other adapts to native 3D part-aware semantics using coordinate, normal, and color inputs.
- Token Sampling: Local feature tokens are created via farthest point sampling and KNN grouping; prompts are augmented with position embeddings.
- Promptable Decoder: Bidirectional cross-attention combines patch tokens with prompt tokens and special output/IoU tokens; the resulting per-point logits are thresholded to yield binary masks.
- Automatic Segmentation: Multi-output tokens produce a candidate set; NMS prunes overlaps using predicted IoUs.
- Supervisory Data: PartSAM is trained on over 5 million shape-part pairs via a model-in-the-loop annotation pipeline, combining closed-world and automatically segmented open-world data.
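The token sampling step relies on two standard point-cloud operations, sketched here in plain Python. This is a generic illustration of farthest point sampling and KNN grouping, not PartSAM's implementation; the seed index, token count, and neighborhood size are arbitrary assumptions.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Pick k well-spread point indices, beginning from `start`."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen center.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def knn_group(points, center_indices, k):
    """For each center, return the indices of its k nearest points."""
    groups = []
    for c in center_indices:
        order = sorted(range(len(points)),
                       key=lambda i: math.dist(points[i], points[c]))
        groups.append(order[:k])
    return groups


points = [(0, 0, 0), (0.1, 0, 0), (0, 0.2, 0), (5, 5, 5), (5.1, 5, 5), (10, 0, 0)]
centers = farthest_point_sampling(points, 3)
groups = knn_group(points, centers, 2)
```

Each group would then be pooled into one local feature token. This quadratic-time version is only suitable for small examples.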
4. Benchmark Datasets and Quantitative Results
Datasets:
| Dataset | Characteristics | Used By |
|---|---|---|
| ABO | Large-scale, with part–whole annotations | HoloPart |
| PartObjaverse-Tiny | Filtered Objaverse subset, part–whole labeled | Both |
| PartNet, PartNetE | Standard part segmentation, with occlusion benchmarks | PartSAM |
Metrics:
- Chamfer Distance (CD): Distance between point sets sampled from the prediction and the ground truth.
- IoU: Occupancy grid intersection-over-union.
- F-Score: Fraction of matched occupancy points at a threshold.
- Reconstruction Success Rate: Fraction of predicted meshes extractable from occupancy field.
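The first three metrics can be sketched as follows. These are common textbook formulations and may differ in detail from the cited papers: the Chamfer distance here uses unsquared Euclidean distances averaged in both directions, IoU is computed on sets of occupied voxel indices, and the F-score is the usual harmonic mean of precision and recall at a distance threshold `tau`.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point lists."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def occupancy_iou(pred, gt):
    """IoU between two sets of occupied voxel indices."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    return len(pred & gt) / len(union) if union else 1.0


def f_score(pred_pts, gt_pts, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    def hit_rate(src, dst):
        return sum(min(math.dist(p, q) for q in dst) <= tau for p in src) / len(src)
    precision = hit_rate(pred_pts, gt_pts)
    recall = hit_rate(gt_pts, pred_pts)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


pred_voxels = {(0, 0, 0), (1, 0, 0), (1, 1, 0)}
gt_voxels = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
iou = occupancy_iou(pred_voxels, gt_voxels)

pred_pts = [(0, 0, 0), (1, 0, 0)]
gt_pts = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]
cd = chamfer_distance(pred_pts, gt_pts)
fs = f_score(pred_pts, gt_pts, tau=0.5)
```

In the point example every predicted point matches the ground truth, but one ground-truth point is missed, so precision is perfect while recall and the Chamfer distance are penalized.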
Performance:
On ABO, HoloPart reduces CD from ~0.087 (DiffComplete) to 0.026, boosts IoU from 0.235 to 0.764, and F-score from 0.371 to 0.843. On PartObjaverse-Tiny, HoloPart achieves CD ~0.034, IoU 0.688, F-score 0.801, outperforming prior methods (Yang et al., 10 Apr 2025).
PartSAM, on PartObjaverse-Tiny and PartNetE, achieves IoU@1 (interactive) of 56.1% and 89.9% at 10 prompts, respectively, and automatic IoU of 69.5%, an improvement of more than 20 IoU points on automatic decomposition over clustering-based and lifting-based pipelines (Zhu et al., 26 Sep 2025).
5. Handling Occlusions and Internal Structures
Both approaches explicitly address occluded and internal part prediction:
- HoloPart: By leveraging dual-attention conditioning, HoloPart plausibly hallucinates missing geometry even under heavy occlusion (e.g., reconstructing interior chair legs, lamp wiring, or strawberry seeds). Ablation demonstrates that context-aware attention reduces CD by 40–50% and increases IoU by 20–30%. Exclusion of local attention results in loss of fine structures (Yang et al., 10 Apr 2025).
- PartSAM: The encoder’s continuous 3D feature field enables direct querying of internal volume; positive prompts within cavities or non-surface regions propagate through the decoder to segment the full amodal extent of internal parts. Qualitative examples demonstrate mask completion for concealed water-tank walls and car door interiors; performance is limited primarily by resolution for extremely thin structures (Zhu et al., 26 Sep 2025).
6. Limitations and Future Directions
Identified limitations include:
- HoloPart: Strong dependence on initial segmentation; errors or fragmented input masks can cause mis-completion of parts. The method struggles with highly complex scenes, numerous small parts, or nonrigid/deformable objects. Proposed future directions are joint mask refinement, training with real-world noisy scans, extending to articulated/deformable objects, and end-to-end part-aware generation (Yang et al., 10 Apr 2025).
- PartSAM: While strong in geometry-aware amodal decomposition, it does not assign semantic part labels (remaining class-agnostic), and rare categories with few examples or fine/thin internal structures are problematic due to feature resolution. Suggested improvements are enrichment with user-tagged semantic labels, increased field resolution, and a joint segmentation–labeling head with vision–language modules (Zhu et al., 26 Sep 2025).
7. Practical Applications
Complete amodal part segmentation enables:
- Geometry Editing: Modifying, duplicating, or removing parts in 3D modeling software (e.g., Blender) with consistent geometry.
- Animation & Rigging: Attaching pivot/rig parameters to completed parts for kinematic tasks.
- Material Assignment: Assigning unique materials or UV maps to fully reconstructed part meshes.
- Data Augmentation: Producing enlarged datasets of part-aware, complete shapes for training generative or discriminative 3D models.
A plausible implication is that high-fidelity amodal segmentation unlocks scalable, automation-driven workflows in content creation, digital twin modeling, and simulation (Yang et al., 10 Apr 2025, Zhu et al., 26 Sep 2025).