UniPart: Unified 3D & Simulation Frameworks
- UniPart is a family of frameworks for unified part-level modeling, enabling explicit control in 3D object generation and multiphysics simulations.
- It employs dual-stage latent diffusion and a unified encoder-decoder architecture to integrate geometric and segmentation cues without external models.
- Its multiphysics extension uses Partition of Unity and peridynamic enrichment to achieve high accuracy in fracture mechanics and effective local-global coupling.
UniPart is a family of frameworks developed for unified part-level modeling and synthesis in both 3D object generation and multiphysics simulation domains. Two prominent instantiations are documented: (1) a variational–peridynamic enrichment of the Partition of Unity Method (PUM) for fracture mechanics (“UniPart” in numerical PDEs) (Birner et al., 2021) and (2) UniPart for image-guided decomposable 3D shape synthesis via a unified geometry–segmentation latent (“Geom-Seg VecSet”) and dual-space latent diffusion (He et al., 10 Dec 2025). Both leverage a global infrastructure in which part-awareness is directly embedded into the representation and computation, enabling explicit localized control (fracture in simulation, part specification and alignment in synthesis) without reliance on external segmentation models or expensive fully global solvers.
1. Unified Geom-Seg Latent Representation (Geom-Seg VecSet)
UniPart for part-level 3D generation builds on the VecSet VAE architecture, augmenting it to produce a latent code in which each vector encodes both geometric and part-label information. Points sampled from a surface mesh are annotated with normals and part IDs, and together these form the encoder input. The encoder applies a cross-attention block between the input points and learnable queries, followed by several self-attention layers.
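The encoder pattern described above (learnable queries cross-attending to annotated points, then self-attention) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses single-head attention without learned key/value projections, feeds the part ID as a plain scalar feature, and the names `encode`, `w_in`, and all sizes are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def encode(points, normals, part_ids, queries, w_in, n_self_layers=2):
    # Per-point feature: xyz (3) + normal (3) + part ID as a scalar (1).
    # Treating the part ID as a scalar is a simplification for this sketch.
    feats = np.concatenate(
        [points, normals, part_ids[:, None].astype(float)], axis=1
    )
    tokens = feats @ w_in                          # (N, d) point tokens
    latents = attention(queries, tokens, tokens)   # queries read the points
    for _ in range(n_self_layers):
        # Residual self-attention among the latent vectors.
        latents = latents + attention(latents, latents, latents)
    return latents

rng = np.random.default_rng(0)
N, M, d = 256, 16, 32                              # hypothetical sizes
points = rng.uniform(-1, 1, (N, 3))
normals = rng.normal(size=(N, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
part_ids = rng.integers(0, 4, N)
queries = rng.normal(size=(M, d))                  # stand-in for learnable queries
w_in = rng.normal(size=(7, d)) / np.sqrt(7)        # stand-in input projection
latents = encode(points, normals, part_ids, queries, w_in)
print(latents.shape)  # (16, 32)
```

The output has one row per query rather than per input point, which is what makes the latent a fixed-size set regardless of how many surface points are sampled.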
Decoders are structured as:
- Geometry decoder predicting SDF or occupancy at a query point.
- Segmentation decoder (promptable, following [ravi2024sam2]), mapping latent queries, conditioned on part prompts, to segmentation masks.
Training minimizes a composite loss that combines three terms:
- a geometry reconstruction term on the geometry decoder output,
- cross-entropy between the predicted and ground-truth masks,
- the latent KL divergence.
Fine-tuning requires only the segmentation decoder; latent capacity remains unchanged, preserving geometric quality while enriching for part structure (He et al., 10 Dec 2025).
2. Two-Stage Latent Diffusion Framework
The UniPart synthesis pipeline uses a cascade of diffusion transformers (DiT), each stage operating in structured latent space.
Stage 1 (Whole-object): Joint geometry and part segmentation diffusion is performed in the Geom-Seg VecSet latent space. The forward step mixes clean latents with noise, and conditional flow matching [lipman2022flowmatching] is used to train the denoising velocity field. The condition is an RGB image, encoded and cross-attended in each DiT block. Classifier-free guidance is enabled by randomly omitting the image condition with probability 0.1 during training.
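As a rough illustration of the training signal, the sketch below builds one flow-matching training pair and applies condition dropout with probability 0.1. It assumes the common linear interpolation path (pure data at t = 0, pure noise at t = 1) with target velocity `eps - x0`; the exact parameterization used by UniPart is not reproduced here, and the zero vector standing in for the "null" condition is likewise an assumption.

```python
import numpy as np

def flow_matching_pair(x0, rng):
    """Linear path: x_t = (1 - t) * x0 + t * eps, target velocity = eps - x0."""
    t = rng.uniform()
    eps = rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * eps
    return t, x_t, eps - x0

def maybe_drop_condition(cond, rng, p_drop=0.1):
    """With probability p_drop, replace the condition by zeros (assumed null token)."""
    return np.zeros_like(cond) if rng.uniform() < p_drop else cond

def fm_loss(velocity_fn, x0, cond, rng):
    """Mean squared error between predicted and target velocity for one sample."""
    t, x_t, target = flow_matching_pair(x0, rng)
    pred = velocity_fn(x_t, t, maybe_drop_condition(cond, rng))
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 32))     # hypothetical clean latent set
image_tokens = np.ones((4, 32))        # hypothetical encoded image condition
# A trivial stand-in model that always predicts zero velocity.
loss = fm_loss(lambda x, t, c: np.zeros_like(x), x0, image_tokens, rng)
```

Under this path, `x_t - t * v` recovers the clean latent exactly when `v` is the true target, which is why regressing the velocity is enough to define the denoiser.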
After denoising, the segmentation decoder assigns a part label to each latent, and a position head infers anchor points. The latents are then grouped into soft clusters, one per part, using farthest point sampling (FPS) and non-maximum suppression (NMS).
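The grouping step relies on farthest point sampling, a standard greedy procedure. The sketch below shows FPS and a simple per-label seed selection; it omits the NMS step, and `group_by_label` and `seeds_per_part` are hypothetical names introduced only for illustration, not part of the published pipeline.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Return indices of up to k points chosen greedily to be far apart."""
    n = points.shape[0]
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(1, min(k, n)):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(points - points[nxt], axis=1)
        )
    return np.array(chosen)

def group_by_label(positions, labels, seeds_per_part=2):
    """For each part label, pick FPS seeds among that part's anchor positions."""
    clusters = {}
    for part in np.unique(labels):
        idx = np.flatnonzero(labels == part)
        local = farthest_point_sampling(positions[idx], seeds_per_part)
        clusters[int(part)] = idx[local]
    return clusters

positions = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [10, 0, 0]])
print(farthest_point_sampling(positions, 3))  # [0 3 2]
```

Starting from index 0, the farthest point is index 3; the next pick is index 2, which lies farthest from both chosen points.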
Stage 2 (Part-level): Independent latent diffusion is conducted for each part in dual spaces, global coordinate space (gcs) and normalized canonical space (ncs), taking the part's gcs and ncs latents as input. The conditions are (i) the image, (ii) the whole-object latent, and (iii) the coarse part cluster. Space-specific embeddings are added to tokens before the transformer layers.
Attention is interleaved as follows:
- Local: only within tokens of gcs or ncs subspace.
- Global: across all part tokens, enforcing cross-space consistency.
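One way to picture the local/global interleaving is through attention masks. The sketch below is an assumption-laden illustration: it restricts local attention by space only (gcs = 0, ncs = 1), alternates local and global layers by even/odd layer index, and uses unprojected single-head attention. Whether local attention is additionally restricted per part, and the actual interleaving schedule, are not specified in the text above.

```python
import numpy as np

def local_mask(space_ids):
    """True where query and key tokens belong to the same space."""
    return space_ids[:, None] == space_ids[None, :]

def global_mask(n_tokens):
    """Every token may attend to every other token."""
    return np.ones((n_tokens, n_tokens), dtype=bool)

def layer_mask(layer_idx, space_ids):
    """Assumed schedule: even layers local, odd layers global."""
    if layer_idx % 2 == 0:
        return local_mask(space_ids)
    return global_mask(len(space_ids))

def masked_attention(x, mask):
    """Self-attention where disallowed query/key pairs get zero weight."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ x

space_ids = np.array([0, 0, 1, 1])      # two gcs tokens, two ncs tokens
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))
for layer in range(4):
    tokens = tokens + masked_attention(tokens, layer_mask(layer, space_ids))
```

In a local layer the gcs tokens are unaffected by any change to the ncs tokens; only the global layers let information cross between the two spaces.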
3. Dual-Space Generation and Assembled Placement
Decoding both the global and canonical latent views for every part yields two meshes per part: one in ncs and one in gcs.
The ncs mesh is a canonical [0,1]³ shape; the gcs mesh determines global pose. A similarity transform is computed for each part from the bounding box and centroid of its gcs mesh. Each ncs part mesh is mapped into global coordinates by its transform, and the full object is the union of the transformed part meshes. This dual-space approach mitigates collapse of fine detail in part meshes and ensures precise part placement (He et al., 10 Dec 2025).
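The placement step can be illustrated with a small sketch. It assumes a rotation-free similarity transform (uniform scale plus translation), takes the scale from the longest edge of the gcs bounding box, and uses the bounding-box center as the centroid; these choices, and the function names, are assumptions made for the example rather than details confirmed by the source.

```python
import numpy as np

def similarity_from_gcs(gcs_vertices):
    """Uniform scale and translation derived from the gcs bounding box."""
    lo, hi = gcs_vertices.min(axis=0), gcs_vertices.max(axis=0)
    scale = float((hi - lo).max())      # longest bounding-box edge
    center = (lo + hi) / 2.0            # box center, used here as the centroid
    return scale, center

def place_part(ncs_vertices, scale, center):
    """Map a canonical [0,1]^3 mesh (centered at 0.5) into global coordinates."""
    return (ncs_vertices - 0.5) * scale + center

def assemble(parts):
    """parts: list of (ncs_vertices, gcs_vertices); returns all placed vertices."""
    placed = [
        place_part(ncs, *similarity_from_gcs(gcs)) for ncs, gcs in parts
    ]
    return np.concatenate(placed, axis=0)

# A coarse gcs part occupying x in [2, 4], y and z in [0, 1].
gcs = np.array([[2.0, 0.0, 0.0], [4.0, 1.0, 1.0]])
ncs = np.array([[0.0, 0.5, 0.5], [1.0, 0.5, 0.5]])
print(assemble([(ncs, gcs)]))
```

The detailed ncs geometry supplies the shape, while the coarse gcs mesh only has to get position and extent right, which is the division of labor the dual-space design is aiming for.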
4. Image Conditioning and Transformer-based Cross-Modal Coupling
In both UniPart diffusion stages, DiT blocks alternately apply:
- Latent token self-attention,
- Cross-attention to image tokens,
- Feed-forward layers,
- (Part stage) Cross-attention to whole-object and part-level latents.
The image encoder is a ViT backbone (here matching Hunyuan3D-2.1). Classifier-free guidance is trained for robustness to missing conditions and is leveraged at inference by linear velocity blending. This paradigm effectively utilizes 2D semantic priors during 3D generation, without external segmenters (He et al., 10 Dec 2025).
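Linear velocity blending for classifier-free guidance is usually written as an extrapolation from the unconditional prediction toward the conditional one. The sketch below shows that standard form; the guidance scale value is arbitrary and the function name is an assumption.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, guidance_scale):
    """Blend: start at the unconditional velocity, move toward the conditional one."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)

v_cond = np.array([1.0, 2.0])      # velocity predicted with the image condition
v_uncond = np.array([0.0, 1.0])    # velocity predicted with the condition dropped
print(guided_velocity(v_cond, v_uncond, 3.0))  # [3. 4.]
```

A scale of 1 reproduces the conditional prediction, a scale of 0 ignores the image entirely, and values above 1 push the sample further toward the image-consistent direction.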
5. Quantitative Evaluation and Ablation Studies
On a held-out dataset of 100 shapes, UniPart demonstrates state-of-the-art performance on part-level Chamfer Distance (CD↓) and F-Score (F↑) metrics:
| Method | CD↓ | F₀.₀₅↑ | F₀.₁₀↑ |
|---|---|---|---|
| HoloPart | 0.1492 | 0.5208 | 0.7450 |
| OmniPart | 0.1453 | 0.5273 | 0.7656 |
| PartCrafter | 0.1778 | 0.4749 | 0.7120 |
| PartPacker | 0.1654 | 0.4715 | 0.7226 |
| X-Part | 0.1533 | 0.5242 | 0.7523 |
| UniPart | 0.1311 | 0.5565 | 0.8052 |
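For readers unfamiliar with the metrics in the table, the sketch below computes a Chamfer Distance and an F-Score between two point clouds. Conventions differ between papers (squared versus unsquared distances, sum versus mean of the two directions), so this uses one common variant as an assumption and is not guaranteed to match the evaluation protocol behind the numbers above.

```python
import numpy as np

def nearest_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest point in b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance: sum of the two mean nearest-neighbor distances."""
    return float(nearest_distances(pred, gt).mean()
                 + nearest_distances(gt, pred).mean())

def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = float((nearest_distances(pred, gt) < tau).mean())
    recall = float((nearest_distances(gt, pred) < tau).mean())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gt = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
pred = gt + np.array([0.2, 0.0, 0.0])    # every point shifted by 0.2
cd = chamfer_distance(pred, gt)           # approximately 0.4
f_tight = f_score(pred, gt, tau=0.1)      # no point within 0.1
f_loose = f_score(pred, gt, tau=0.5)      # every point within 0.5
```

The two thresholds in the table (0.05 and 0.10) play the role of `tau`: a stricter threshold rewards only very close surface agreement.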
Segmentation mIoU over generated objects:
| Method | mIoU↑ |
|---|---|
| SAMesh | 0.3608 |
| PartField | 0.4167 |
| P3-SAM | 0.7046 |
| UniPart | 0.7222 |
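A mean IoU over parts can be computed as below. This sketch assumes predicted and ground-truth labels already share the same part indexing; a real evaluation may first need to match predicted parts to ground-truth parts, which is not shown and is not described in the text above.

```python
import numpy as np

def mean_iou(pred_labels, gt_labels):
    """Average intersection-over-union across the parts present in the ground truth."""
    ious = []
    for part in np.unique(gt_labels):
        p = pred_labels == part
        g = gt_labels == part
        intersection = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()   # nonzero, since the part occurs in gt
        ious.append(intersection / union)
    return float(np.mean(ious))

pred_labels = np.array([0, 0, 1, 1])
gt_labels = np.array([0, 1, 1, 1])
score = mean_iou(pred_labels, gt_labels)   # (1/2 + 2/3) / 2
```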
Ablation studies show that removing the ncs diffusion degrades CD to ≈0.145 and lowers F₀.₀₅ by ~4%, omitting local-only attention increases CD to ~0.140, and skipping space embeddings results in frequent misassemblies. These findings support the importance of the staged diffusion, dual-space modeling, and explicit token structuring (He et al., 10 Dec 2025).
6. UniPart in Multiphysics Simulation (PUM–Peridynamic Enrichment)
A structurally separate framework under the UniPart umbrella addresses variational simulation of fracture via multiscale Partition of Unity Methods:
- The computational domain is covered by overlapping patches, each carrying a partition function; the partition functions sum to one over the domain.
- The global trial space is spanned by products of the partition functions with local bases made of polynomials and optional enrichment functions (Heaviside, Westergaard).
- Fracture-prone subdomains are modeled by peridynamics (PD), a nonlocal formulation stated in strong form.
- A global–local enrichment algorithm solves the global problem over the whole domain, hands off subdomain data to PD for local fracture evolution, and reinjects the fine-scale crack response and geometry as real-time enrichment functions.
- This approach enables a unified variational framework accommodating linear elasticity (PU), nonlocal PD fracture, and seamless patch-wise up-/downscaling of solution accuracy (Birner et al., 2021).
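The partition-of-unity construction in the bullets above can be made concrete in one dimension. The sketch below builds Shepard-type partition functions from hat weights on overlapping patches and multiplies them with local monomials; the hat weight, patch layout, and function names are assumptions for illustration, and enrichment functions and the PD coupling are not shown.

```python
import numpy as np

def shepard_pu(x, centers, radius):
    """Partition functions from hat weights on overlapping 1D patches."""
    w = np.maximum(0.0, 1.0 - np.abs(x[:, None] - centers[None, :]) / radius)
    # Normalize so the functions sum to one at every x (requires full coverage).
    return w / w.sum(axis=1, keepdims=True)

def pum_basis(x, centers, radius, degree):
    """Global basis phi_i(x) * (x - c_i)^k for k = 0..degree."""
    phi = shepard_pu(x, centers, radius)
    local = np.stack(
        [(x[:, None] - centers[None, :]) ** k for k in range(degree + 1)],
        axis=-1,
    )
    return phi[:, :, None] * local   # shape (n_points, n_patches, degree + 1)

x = np.linspace(0.0, 1.0, 11)
centers = np.linspace(0.0, 1.0, 5)   # patch spacing 0.25
radius = 0.4                          # larger than the spacing, so patches overlap
phi = shepard_pu(x, centers, radius)
basis = pum_basis(x, centers, radius, degree=2)
print(phi.sum(axis=1))                # all ones
```

Because the partition functions sum to one, any polynomial that every local space can represent is also represented globally, which is the property that carries local approximation quality over to the global trial space.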
7. Theoretical Analysis, Performance, and Implications
UniPart’s unified latent and solver representations yield key technical benefits:
- The Geom-Seg VecSet demonstrates that joint geometric and segmentation encoding can be trained with no loss in geometric quality, enabling explicit part control and cross-modal conditioning without large annotated part segmentation datasets.
- Dual-space and hierarchical diffusion strategies result in high-fidelity generation and superior part-level correspondence not achievable with single-stage or global-only methods.
- In multiphysics simulation, global–local enrichment via PU and PD achieves optimal error rates with respect to the local polynomial degree, including near singularities (e.g., cracks).
- Numerical studies confirm that PU and PD models match closely (maximum displacement error <3%, up to – m for stationary cracks), with local PD computations offering drastic reductions in compute time relative to global PD (Birner et al., 2021).
This suggests that UniPart paradigms—whether for generative modeling or physical simulation—provide a scalable, explicit, and interpretable methodology for decomposable, part-aware computation and synthesis, eliminating reliance on monolithic black-box systems or expensive end-to-end training for segmentation and local detail.