PhysXNet: Cloth Dynamics & 3D Assets
- PhysXNet names two related datasets: detailed cloth simulations of dressed humans and physically annotated 3D object assets for realistic dynamic modeling.
- The cloth variant is generated synthetically via the Avatar Blender add-on; the 3D-asset variant combines automatic vision-language-model annotation with meticulous human-in-the-loop refinement to capture precise kinematic and physical parameters.
- The dataset enables robust simulation and deep learning integration, advancing garment simulation and 3D object manipulation beyond traditional methods.
PhysXNet is a comprehensive family of datasets originating in two distinct but conceptually aligned settings: deformable cloth simulation from human motion (PhysXNet from “PhysXNet: A Customizable Approach for Learning Cloth Dynamics on Dressed People” (Sanchez-Riera et al., 2021)) and physically-annotated 3D object assets (PhysXNet from “PhysX-3D: Physical-Grounded 3D Asset Generation” (Cao et al., 16 Jul 2025)). Both instances address the critical need for data that parameterizes not only geometry but physical properties, enabling data-driven models to reason about physical dynamics—whether for garment simulation or interaction-rich object manipulation.
1. Dataset Generation Methodologies
PhysXNet for cloth dynamics is synthetically constructed through the Avatar Blender add-on utilizing Makehuman parametric models in Blender. The dataset captures clothed human avatars performing 50 distinct skeletal action sequences sourced from repositories such as CMU and Mixamo. Each simulation encompasses three garment templates—tops, bottoms, and dresses—mapped into UV representations. The simulation pipeline generates, for each frame, both the undressed body mesh and the dynamically deformed garment mesh, enabling precise correspondence between kinematic input and cloth output.
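The per-frame export of paired body/garment meshes can be illustrated with a minimal Blender Python sketch; object names and the `.npy` output format are assumptions for illustration, not the authors' released tooling:

```python
import bpy
import numpy as np

scene = bpy.context.scene
# Hypothetical object names for the Makehuman avatar and the simulated garment.
body = bpy.data.objects["Body"]
garment = bpy.data.objects["Garment"]

for frame in range(scene.frame_start, scene.frame_end + 1):
    scene.frame_set(frame)  # advance the action sequence and cloth simulation
    depsgraph = bpy.context.evaluated_depsgraph_get()
    for obj, tag in ((body, "body"), (garment, "cloth")):
        eval_obj = obj.evaluated_get(depsgraph)  # mesh with modifiers/cloth sim applied
        mesh = eval_obj.to_mesh()
        verts = np.array([v.co[:] for v in mesh.vertices], dtype=np.float32)
        np.save(f"{tag}_{frame:05d}.npy", verts)  # paired kinematic input / cloth output
        eval_obj.to_mesh_clear()
```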
In PhysXNet for 3D asset physical annotation, the dataset is constructed atop repositories such as PartNet. The pipeline decomposes over 26,000 objects into an average of five parts per object, each annotated with fine-grained physical properties. PhysXNet-XL further procedurally generates more than 6 million object assets, scaling diversity and annotation coverage. Annotation follows a human-in-the-loop methodology: vision-language models (e.g., GPT-4o) receive alpha-composited renderings that isolate each part, raw physical annotations are generated automatically, and human experts refine the results, particularly kinematic parameters, via geometric processing.
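A schematic of one annotation pass might look as follows; the `query_vlm` stub and the JSON field names are illustrative placeholders, not the released pipeline:

```python
import json

def query_vlm(image: bytes, prompt: str) -> str:
    """Stand-in for a real VLM API call (e.g., GPT-4o with image input)."""
    raise NotImplementedError

def annotate_part(part_render: bytes, part_name: str) -> dict:
    """One human-in-the-loop pass: query the VLM on an alpha-composited
    part rendering, parse the raw annotation, and flag uncertain fields
    (notably kinematics) for expert geometric refinement."""
    prompt = (
        f"For the highlighted part '{part_name}', estimate: absolute scale (m); "
        "material name, Young's modulus, Poisson's ratio, density; "
        "affordance rank (1-10); kinematic type; function description. "
        "Answer as JSON."
    )
    ann = json.loads(query_vlm(part_render, prompt))
    # Kinematic parameters are the most error-prone VLM outputs, so any
    # movable part is routed to plane-fitting/axis clustering and expert review.
    ann["needs_expert_review"] = ann.get("kinematic_type") != "fixed"
    return ann
```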
2. Data Structure, Annotation, and Physical Parameterization
The PhysXNet cloth dynamics dataset comprises:
- 3D Skeleton Motion Sequences: each frame stores the body mesh, configured by shape/pose parameters, with per-point velocity and acceleration derived from temporal differences.
- Body UV Maps: encode per-pixel surface displacements and their first and second temporal differences (velocity and acceleration) as 2D rasterized signals from the mesh surfaces.
- Garment UV Maps: encode the per-pixel offset $\Delta\mathbf{x} = \mathbf{x}_{\text{cloth}} - T(\mathbf{x}_{\text{body}})$, where $T$ is the body-to-cloth transference linking body UV coordinates to the nearest garment surface locations (see the sketch after this list).
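A minimal numpy sketch of how these channels could be assembled, assuming (H, W, 3) position UV maps and an integer-index transference map; notation and shapes are illustrative, not the released data format:

```python
import numpy as np

def body_uv_channels(B_prev, B_curr, B_next):
    """Stack displacement, velocity, and acceleration UV maps.
    Each input is an (H, W, 3) rasterization of body surface positions."""
    vel = B_curr - B_prev                 # first temporal difference
    acc = B_next - 2.0 * B_curr + B_prev  # second temporal difference
    return np.concatenate([B_curr, vel, acc], axis=-1)  # (H, W, 9)

def garment_offset_uv(G_uv, B_uv, transference):
    """Per-pixel offset between the garment surface and the body point it
    is tethered to. `transference` holds (H, W, 2) integer body-UV indices,
    a stand-in for the paper's body-to-cloth transference T."""
    tethered = B_uv[transference[..., 0], transference[..., 1]]  # (H, W, 3)
    return G_uv - tethered
```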
The PhysXNet-3D asset dataset features annotation on five key physics-grounded dimensions:
| Dimension | Nature of Annotation | Role in Simulation/Reasoning |
|---|---|---|
| Absolute Scale | Real-world length/width/height | Drives gravitational, mass, and volumetric effects |
| Material | Material name, Young's modulus, Poisson's ratio, density | Models elasticity, stiffness, response to force |
| Affordance | 1–10 priority rank for interaction likelihood | Identifies actionable, manipulable parts |
| Kinematics | Discrete types (fixed, prismatic, revolute, etc.) and joint parameters | Encodes movement constraints and degrees of freedom |
| Function Description | Multi-level textual annotation | Enables semantic–physical multimodal reasoning |
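These five dimensions map naturally onto a per-part record; the following schema sketch uses hypothetical field names, not the dataset's released format:

```python
from dataclasses import dataclass, field

@dataclass
class PartPhysicsAnnotation:
    part_id: str
    scale_m: tuple[float, float, float]  # real-world length/width/height (meters)
    material: str                        # e.g. "oak", "ABS plastic"
    youngs_modulus_pa: float             # elasticity
    poisson_ratio: float
    density_kg_m3: float
    affordance_rank: int                 # 1-10 interaction-likelihood priority
    kinematic_type: str                  # "fixed" | "prismatic" | "revolute" | ...
    joint_params: dict = field(default_factory=dict)        # axis, origin, limits
    function_description: list[str] = field(default_factory=list)  # multi-level text
```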
Annotation proceeds by rendering each part separately for VLM input and extracting physical parameters; for kinematic parts, planes are fitted and movement axes clustered from point clouds. Human experts provide final validation, correcting ambiguous VLM outputs and kinematic axes in particular.
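The geometric refinement step can be sketched with a least-squares plane fit via SVD and a greedy axis-clustering pass; this is a generic stand-in under assumed conventions, not the paper's exact procedure:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (N, 3) point cloud. Returns
    (centroid, unit normal); for a flat moving part (door, drawer face)
    the normal is a candidate joint-axis direction."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, vt[-1]  # last right-singular vector = direction of least variance

def cluster_axes(axes, tol_deg=10.0):
    """Greedily merge near-parallel unit axes (list of (3,) arrays) so that
    parts sharing a hinge are assigned one consistent movement axis."""
    clusters = []
    for a in axes:
        for c in clusters:
            cos = abs(float(a @ c[0]))  # sign-invariant angle test
            if np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))) < tol_deg:
                c.append(a)
                break
        else:
            clusters.append([a])
    means = [np.mean(c, axis=0) for c in clusters]
    return [m / np.linalg.norm(m) for m in means]
```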
3. Model Integration and Usage
For cloth dynamics, the PhysXNet dataset enables a direct regression problem mapping dynamic body kinematics to cloth displacement, formalized as a mapping from the body UV maps and their temporal derivatives to the garment offset UV maps, $f: (\mathbf{B}, \dot{\mathbf{B}}, \ddot{\mathbf{B}}) \mapsto \Delta\mathbf{G}$. A conditional GAN architecture ingests triplets of body UV maps and predicts per-pixel cloth offset UV maps, branching output decoders for each template garment. The generator is regularized by adversarial and L1 terms,

$$\mathcal{L} = \mathcal{L}_{\text{cGAN}}(G, D) + \lambda\,\mathcal{L}_{L1}(G),$$

while the PatchGAN discriminator enforces mesh realism.
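A minimal PyTorch sketch of this pix2pix-style generator objective; the weight `lam` and network definitions are assumptions, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def generator_loss(D, body_uv, pred_offsets, gt_offsets, lam=100.0):
    """Conditional-GAN generator objective: fool the PatchGAN discriminator
    on (condition, prediction) pairs, plus an L1 term against the
    simulated ground-truth garment offset maps."""
    fake_logits = D(torch.cat([body_uv, pred_offsets], dim=1))  # patch-wise real/fake map
    adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))
    l1 = F.l1_loss(pred_offsets, gt_offsets)
    return adv + lam * l1
```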
For 3D asset annotation, PhysXGen—the generative model—leverages PhysXNet’s annotation to disentangle latent spaces:
- Structural latent space: geometry and appearance features, extracted via DINOv2.
- Physical latent space: physical parameters, separately encoded and concatenated.
Separate encoders process the two latents, which are decoded and interconnected, with physical knowledge injected into the structural branch via residual connections. A transformer-based diffusion model, trained with Conditional Flow Matching, jointly optimizes geometry and physics.
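A toy sketch of this dual-latent design; layer sizes and the exact residual injection are assumptions based on the description above:

```python
import torch
import torch.nn as nn

class DualLatentEncoder(nn.Module):
    """Toy version of PhysXGen's split latents: structural features (e.g.
    DINOv2 embeddings) and physical parameters are encoded separately, then
    physical knowledge is injected into the structural branch via a residual
    connection before the joint diffusion stage."""
    def __init__(self, struct_dim=768, phys_dim=16, latent_dim=256):
        super().__init__()
        self.struct_enc = nn.Linear(struct_dim, latent_dim)
        self.phys_enc = nn.Linear(phys_dim, latent_dim)
        self.inject = nn.Linear(latent_dim, latent_dim)

    def forward(self, struct_feats, phys_params):
        z_s = self.struct_enc(struct_feats)
        z_p = self.phys_enc(phys_params)
        z_s = z_s + self.inject(z_p)          # residual physical-knowledge injection
        return torch.cat([z_s, z_p], dim=-1)  # joint latent for the diffusion model
```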
4. Evaluation Metrics and Benchmarks
PhysXNet for cloth simulation quantifies prediction quality using:
- Mean Squared Error (MSE): On per-pixel UV garment maps and garment mesh vertices, versus Blender physics simulation ground truth.
- Qualitative comparison: visual evaluation against spring-mass physics engines, Linear Blend Skinning (LBS), and TailorNet. PhysXNet demonstrates closer inertial and dynamic fidelity to the physics-based reference, particularly on challenging action types (e.g., rapid movements).
In PhysXNet-3D asset annotation, evaluation covers both geometric and physical properties:
- Geometry/appearance: PSNR, Chamfer Distance (CD), F-Score.
- Physical properties: Mean Absolute Error (MAE) for absolute scale, material, affordance, kinematics, and function description, computed over multi-view renderings. PhysXGen outperforms hybrid baselines (TRELLIS + PhysPre) on both axes, with its explicit cross-domain latent structure yielding lower MAE and better geometric realism in generated models.
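Of the geometric metrics listed, Chamfer Distance admits a compact definition; a brute-force numpy sketch, adequate for small point sets:

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between (N, 3) and (M, 3) point sets:
    mean nearest-neighbor squared distance in both directions."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```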
5. Applications and Model Adaptability
PhysXNet datasets address previously intractable simulation and reasoning problems:
- Cloth simulation: Enables rapid, differentiable computation of deformable cloth meshes for downstream deep learning pipelines.
- 3D asset generation: Physically-grounded assets now support downstream tasks in physical simulation, robotics, embodied AI, semantic reasoning, and interaction planning.
Garments can be simulated without template-specific retraining due to UV parameterization’s topology-agnostic nature. PhysXNet-3D asset annotation supports fine-grained part-level physical reasoning essential for robotic manipulation and simulation of realistic physical object behaviors.
6. Extension, Generalization, and Future Research
PhysXNet-XL expands coverage to more than 6 million procedurally generated assets, increasing diversity and supporting generalization to unseen object categories. Stated directions for further research include integration with real-world scans, refinement of annotation (e.g., scale normalization and affordance ranking), and broader modeling of kinematic constraints to capture more complex dynamic object scenarios.
A plausible implication is that, by incorporating multimodal textual descriptions, future applications may enable joint semantic–physical reasoning, beneficial for virtual reality, content creation, robotics, and embodied AI. Additionally, extension of annotation granularity and accuracy is expected to further improve model understanding and simulation fidelity in physical interaction contexts.
7. Significance and Comparative Context
PhysXNet’s cloth simulation variant allows fully differentiable, sub-millisecond inference for dense garment geometry, bypassing traditional physics engine computational costs and expertise requirements. Its UV parameterization and conditional GAN framework mark a step toward integrated, simulation-aware learning architectures. PhysXNet’s 3D object asset variant establishes part-level physical annotation as foundational for physics-aware generation; its annotation methodology employing VLMs and human-expert validation sets a standard for scalable dataset creation. Comparative studies indicate PhysXNet provides results superior to prior geometry-only or skinning-based methods in both fidelity and physical plausibility, and its data-centric modeling paradigm is broadly applicable to simulation, robotics, and interactive environments.