
PhysXGen: Physics-Informed 3D Asset Generation

Updated 17 July 2025
  • PhysXGen is a physics-grounded framework that fuses 3D generative modeling with explicit physical property annotation.
  • It features a dual-branch architecture separately encoding geometric and physical attributes to ensure realistic and simulation-compatible outputs.
  • Leveraging the extensive PhysXNet dataset and joint diffusion-based training, it markedly reduces property prediction errors compared to prior approaches.

PhysXGen is a feed-forward framework for physics-grounded 3D asset generation, designed to couple state-of-the-art 3D generative models with explicit modeling and prediction of physical properties. The goal is to produce 3D assets that are not only visually and structurally plausible but also annotated and constrained with meaningful physical attributes—including absolute scale, material properties, affordance, part-level kinematics, and functional descriptions—thereby facilitating real-world simulation, embodied AI, and other physically aware applications (2507.12465).

1. Fundamental Principles and Objectives

PhysXGen addresses a significant gap in the domain of 3D generation: most prior methods focus on geometric detail and texture while neglecting attributes essential for physical realism and downstream simulation. By explicitly injecting physical knowledge into the generative process, PhysXGen enables the end-to-end creation of 3D assets that are natively compatible with simulation engines and robotics systems. This is achieved through dual-branch latent modeling and the integration of a physics-annotated 3D dataset, PhysXNet, supporting extensible physical annotation and property learning.

2. Dual-Branch Architecture and Model Design

A central distinguishing feature of PhysXGen is its dual-branch architecture, which learns structural–physical correlations at the latent-representation level:

  • Structural Branch: Encodes geometric and appearance features (e.g., using pre-trained VAE modules such as those from DINOv2/TRELLIS), producing the structural latent:

$$P_{\text{slat}} = \mathcal{E}_{\text{aes}}(P_{\text{aes}})$$

where $P_{\text{aes}}$ encodes the input asset's appearance and geometry, and $\mathcal{E}_{\text{aes}}$ is the pre-trained VAE encoder.

  • Physical Branch: Compresses a wide range of physical properties to a latent code:

$$P_{\text{plat}} = \mathcal{E}_{\text{phy}}(P_{\text{phy}}, P_{\text{sem}})$$

where $P_{\text{phy}}$ aggregates physical metrics (scale, density, kinematics, affordance) and $P_{\text{sem}}$ is a function/utility text embedding (via CLIP or similar models).
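The following is a minimal PyTorch-style sketch of the two encoders and the learnable residual coupling discussed in the next paragraph. Module choices, dimensions, and names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DualBranchEncoder(nn.Module):
    """Sketch of PhysXGen-style dual-branch latent encoding (illustrative only)."""
    def __init__(self, feat_dim=512, latent_dim=64):
        super().__init__()
        # Structural branch: stand-in for the pre-trained aesthetic VAE encoder E_aes.
        self.enc_aes = nn.Linear(feat_dim, latent_dim)
        # Physical branch E_phy: compresses physical metrics plus a text embedding.
        self.enc_phy = nn.Linear(2 * feat_dim, latent_dim)
        # Learnable residual connection letting physical latents modulate structure.
        self.res_phy_to_slat = nn.Linear(latent_dim, latent_dim)

    def forward(self, p_aes, p_phy, p_sem):
        p_slat = self.enc_aes(p_aes)                          # P_slat = E_aes(P_aes)
        p_plat = self.enc_phy(torch.cat([p_phy, p_sem], -1))  # P_plat = E_phy(P_phy, P_sem)
        # Couple the branches so physical constraints influence structural detail.
        p_slat = p_slat + self.res_phy_to_slat(p_plat)
        return p_slat, p_plat
```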

These latents are further coupled by learnable residual connections—ensuring that, for example, physical constraints influence the fine-scale details synthesized by the structural decoder. The entire system is optimized via joint diffusion-based training. The structural branch and the physical branch have separate loss functions (e.g., geometry and property prediction errors) but are jointly aligned in the diffusion latent space using a combined loss:

$$\mathcal{L}_{\text{vae}} = \mathcal{L}_{\text{aes}}^{(\text{color})} + \mathcal{L}_{\text{aes}}^{(\text{geometry})} + \mathcal{L}_{\text{phy}} + \mathcal{L}_{\text{sem}} + \mathcal{L}_{\text{kl}} + \mathcal{L}_{\text{reg}}$$

$$\mathcal{L}_{\text{diff}} = \mathcal{L}_{\text{aes}} + \mathcal{L}_{\text{phy}}$$

where $\mathcal{L}_{\text{phy}}$ enforces accuracy of the physical property outputs and $\mathcal{L}_{\text{reg}}$ supports mesh-structure regularization.
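As a rough illustration of the joint diffusion objective, the sketch below assumes a denoiser that processes both latent branches together; the linear noising scheme and model interface are assumptions for clarity, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_diffusion_loss(model, p_slat, p_plat):
    """One training step of L_diff = L_aes + L_phy over both latent branches.
    The flow-style linear noising and denoiser interface are illustrative."""
    t = torch.rand(p_slat.shape[0], 1)                   # random diffusion time per sample
    noise_s, noise_p = torch.randn_like(p_slat), torch.randn_like(p_plat)
    noisy_s = (1 - t) * p_slat + t * noise_s             # noised structural latent
    noisy_p = (1 - t) * p_plat + t * noise_p             # noised physical latent
    pred_s, pred_p = model(noisy_s, noisy_p, t)          # denoiser sees both branches jointly
    l_aes = F.mse_loss(pred_s, noise_s)                  # structural diffusion loss
    l_phy = F.mse_loss(pred_p, noise_p)                  # physical diffusion loss
    return l_aes + l_phy
```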

3. PhysXNet: Physics-Grounded 3D Dataset

PhysXNet is the foundational dataset for PhysXGen and represents the first large-scale resource of 3D object/part assets systematically annotated across five “physics-first” dimensions:

| Dimension | Examples/Notes |
| --- | --- |
| Absolute Scale | Object-part dimensions in standard units (e.g., cubic centimeters) |
| Material | Explicit material class, Young's modulus, Poisson's ratio, density |
| Affordance | Likelihood of a part being grasped/touched, ranked at part granularity |
| Kinematics | Joint type, axis, parent–child mesh, movement range and direction |
| Function Desc. | CLIP-based multi-level textual annotation (basic, function, kinematic) |
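To make the five dimensions concrete, the following is a hypothetical sketch of a single part-level record; every field name and unit here is an assumption for illustration, not PhysXNet's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class PartAnnotation:
    """Hypothetical PhysXNet-style part record (field names/units are assumed)."""
    scale_cm3: float                  # absolute scale, e.g., part volume in cubic centimeters
    material: str                     # explicit material class, e.g., "oak"
    youngs_modulus_pa: float          # material stiffness
    poisson_ratio: float
    density_kg_m3: float
    affordance_rank: int              # grasp/touch priority at part granularity
    joint_type: str                   # e.g., "revolute", "prismatic", "fixed"
    joint_axis: tuple = (0.0, 0.0, 1.0)          # axis in the parent mesh frame
    movement_range_deg: tuple = (0.0, 90.0)      # allowed motion range
    descriptions: dict = field(default_factory=dict)  # basic/function/kinematic text
```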

PhysXNet and its extended version PhysXNet-XL (containing 6M procedurally annotated objects) enable not only supervised learning of property prediction but also modeling of high-level structure–property dependencies. Annotation uses a scalable human-in-the-loop process: visual isolation and rendering of object parts, automatic VLM annotation (e.g., with GPT-4o), followed by expert refinement using mesh analysis and clustering (e.g., k-means for revolute joint axes).

4. Learning and Property Prediction

PhysXGen’s training strategy emphasizes not only generative fidelity but also the accurate synthesis of physical attributes, as measured by mean absolute error (MAE) on individual property dimensions. This is facilitated by:

  • Rendering and evaluating properties like absolute scale, material, and affordance from multiple random viewpoints, enforcing viewpoint-invariant property predictions.
  • Using a combination of property regression, semantic text–geometry alignment, and explicit geometry–physics latent coupling.
  • Evaluating geometry via Chamfer Distance and F-Score, and visual quality via PSNR across rendered viewpoints (a minimal sketch of the geometry metrics follows this list).
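For reference, here is a hedged, brute-force NumPy sketch of the two geometry metrics; production evaluation code would typically use KD-trees or GPU batching, and the F-Score threshold is an illustrative parameter rather than a value from the paper.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point clouds a (N,3) and b (M,3)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def f_score(a: np.ndarray, b: np.ndarray, tau: float = 0.01) -> float:
    """F-Score at threshold tau: harmonic mean of precision and recall."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    precision = (d.min(axis=1) < tau).mean()  # fraction of a within tau of b
    recall = (d.min(axis=0) < tau).mean()     # fraction of b within tau of a
    return 2 * precision * recall / (precision + recall + 1e-8)
```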

Experimental results indicate that PhysXGen surpasses prior pipelines such as TRELLIS + PhysPre, roughly halving property prediction errors on several attributes (e.g., absolute-scale MAE drops from 12.46 to 6.63 and material MAE from 0.262 to 0.141).

5. Human-in-the-Loop Data Annotation

A structured data annotation pipeline underpins PhysXNet’s corpus quality:

  • Visual Isolation: Alpha compositing for clean component renders.
  • Automated VLM Labeling: Vision-language models annotate fundamental part properties and semantic utility.
  • Expert Refinement: Mesh point clouds (derived from child–parent relationships) enable algorithmic extraction of kinematics (rotation axes, movement ranges), with clustering and plane-fitting for revolute joints (see the sketch after this list).
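As one plausible instance of the kinematic refinement step, a revolute axis can be recovered from the trajectories that part vertices sweep between poses; this sketch (with an assumed trajectory input) shows one reasonable approach, not necessarily the paper's exact procedure.

```python
import numpy as np

def estimate_revolute_axis(traj: np.ndarray) -> np.ndarray:
    """Estimate a revolute joint axis from vertex trajectories.
    traj: (T, N, 3) positions of N part vertices across T articulated poses.
    Each vertex sweeps an arc in a plane normal to the axis, so pooled
    displacement vectors vary least along the axis direction."""
    disp = (traj - traj.mean(axis=0)).reshape(-1, 3)  # center each vertex's arc
    _, _, vt = np.linalg.svd(disp, full_matrices=False)
    return vt[-1] / np.linalg.norm(vt[-1])            # least-variance direction
```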

This yields consistent, fine-grained annotations, enabling robust learning of physics-grounded representations.

6. Applications and Future Developments

PhysXGen’s physically-grounded 3D assets have broad utility in:

  • Simulation environments (robotics, virtual/augmented reality, digital twins) where assets must possess both realistic geometry and simulation-ready physical labels.
  • Training embodied AI and manipulation policies, where affordance maps, material properties, and kinematics are essential for learning robust sensorimotor skills.
  • Industrial workflows requiring digital replicas with empirically meaningful property annotations.
  • Asset design for simulation in physics-based gaming, engineering, and model-based control.

Anticipated advancements include addressing fine-grained property prediction challenges, reducing geometric/physical inconsistencies during generation, enriching the framework with additional material and kinematic property types, and improving semantic-text/geometry alignment for more nuanced functional annotation.

7. Comparative Performance

PhysXGen has been empirically validated to outperform standalone property predictors and prior baselines, especially in learning cross-property consistency. Its dual-branch VAE/diffusion design is a critical factor in achieving low structural and physical property error. Ablation studies corroborate that removing cross-branch latent connections or replacing the joint model with independent predictors leads to a measurable drop in property accuracy and geometry quality.


PhysXGen thus establishes a paradigm for explicitly physics-grounded 3D asset generation, merging generative structural priors with detailed property modeling through a robust dual-branch architecture, supported by a scalable annotation pipeline (2507.12465).
