Physics-Driven Neural Retargeting

Updated 6 March 2026

Physics-driven neural retargeting is a framework that maps demonstration data to target embodiments by blending semantic insights with strict physical constraints.
It leverages neural networks and reinforcement learning to refine retargeted motions, using techniques like inverse kinematics and physics-based losses for realism.
The approach underpins applications such as avatar animation, robotic control, and garment simulation. Reported results include 2–3 cm landmark errors in multi-character retargeting and up to 88% success in dexterous robot retargeting.

Physics-driven neural retargeting refers to a class of methods that map data describing a source behavior (such as human motion, garment pose, or multi-agent interactions) to distinct target embodiments or morphologies, while enforcing physical constraints and dynamical feasibility using neural networks and/or reinforcement learning. The approach underlies modern avatar animation, dexterous robot control, garment fitting, and simulation of multi-entity scenes. It combines global task semantics (often from demonstration data) with physical realism and cross-morphological generalization, both enforced by differentiable or sampled physics simulation and neural policy architectures.

1. Formulation and Key Principles

Physics-driven neural retargeting is defined by the requirement that the retargeted outputs—be they robot trajectories, body poses, character joint angles, or mesh deformations—are both semantically faithful to input demonstrations and physically plausible under the constraints of dynamics, contact, and environmental interaction.

The general workflow consists of:

Extracting task-relevant semantic information from demonstrations (kinematic motion, landmarks, deformation patterns, contact events);
Mapping these to the target embodiment or morphology through an initial kinematic or geometric fitting (e.g., inverse kinematics, rigid alignment, or sparse keypoint correspondences);
Refining or generating target behaviors via neural architectures, reinforcement learning, or sampling-based control, while enforcing physical realism using contact-aware physics simulation, energy minimization, or robust policy training;
Optionally enforcing further physical plausibility via self-supervision or physics-driven losses on real or unlabeled data.

This paradigm is now found in garment fitting from images (Yoon et al., 2021), motion retargeting to avatars and robots (Pan et al., 12 Nov 2025, Reda et al., 2023), and simulation of complex multi-agent or multi-object scenes (Zhang et al., 2023).

2. Mathematical Models and Physics-Based Constraints

Physics-driven retargeting incorporates mathematical models of dynamics, contact, and deformation throughout the learning and inference process:

Cloth Deformation: Garment mesh deformation is posed as the minimization of an energy function

$E(\Delta M; S) = E_c(\Delta M; S) + \lambda_r E_r(\Delta M) + \lambda_s E_s(\Delta M)$

where $E_c$ is a contact-driven term enforcing body–cloth proximity, $E_r$ enforces local rigidity (corotational), and $E_s$ is a Laplacian smoothness term. Solutions are obtained via implicit contact-dynamics solvers (Yoon et al., 2021).
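To make the three-term structure of the energy concrete, the following is a minimal sketch on a toy three-vertex "garment", not the solver of Yoon et al. Every function name is hypothetical, and each term is a simplified stand-in chosen for brevity: nearest-point squared distance for $E_c$, edge-length preservation for $E_r$ (not a true corotational term), and a uniform graph Laplacian of the displacements for $E_s$. The weights `lam_r` and `lam_s` play the role of $\lambda_r$ and $\lambda_s$, with assumed values.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def sq_norm(a):
    return sum(x * x for x in a)

def contact_energy(verts, body_points):
    """E_c stand-in: squared distance from each cloth vertex to its nearest body sample."""
    return sum(min(sq_norm(sub(v, b)) for b in body_points) for v in verts)

def rigidity_energy(rest, verts, edges):
    """E_r stand-in: squared change in edge length (simpler than a corotational term)."""
    total = 0.0
    for i, j in edges:
        rest_len = math.sqrt(sq_norm(sub(rest[i], rest[j])))
        cur_len = math.sqrt(sq_norm(sub(verts[i], verts[j])))
        total += (cur_len - rest_len) ** 2
    return total

def smoothness_energy(delta, edges):
    """E_s stand-in: squared uniform graph Laplacian of the displacement field."""
    neighbors = {i: [] for i in range(len(delta))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    total = 0.0
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue
        mean = tuple(sum(delta[j][k] for j in nbrs) / len(nbrs) for k in range(3))
        total += sq_norm(sub(delta[i], mean))
    return total

def total_energy(rest, delta, body_points, edges, lam_r=1.0, lam_s=0.1):
    """E = E_c + lam_r * E_r + lam_s * E_s for per-vertex displacements `delta`."""
    verts = [tuple(r + d for r, d in zip(rv, dv)) for rv, dv in zip(rest, delta)]
    return (contact_energy(verts, body_points)
            + lam_r * rigidity_energy(rest, verts, edges)
            + lam_s * smoothness_energy(delta, edges))

# Toy data: a strip of cloth hovering one unit above three body samples.
rest = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
body = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
edges = [(0, 1), (1, 2)]
no_move = [(0.0, 0.0, 0.0)] * 3
drop = [(0.0, 0.0, -1.0)] * 3

print(total_energy(rest, no_move, body, edges))  # only the contact term is active
print(total_energy(rest, drop, body, edges))     # rigid drop onto the body
```

With no displacement the energy is 3.0, coming entirely from the contact term. Translating the strip rigidly onto the body gives 0.0, because a rigid translation changes no edge length and has zero Laplacian.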

Rigid-body Dynamics and Contacts: For hand or humanoid retargeting, trajectory optimization is performed in simulation under the discrete-time update rule:

$x_{t+1} = f(x_t, u_t)$

and the underlying continuous dynamics:

$M(q)\ddot{q} + C(q, \dot{q})\dot{q} + g(q) = \tau + J^T(q) \lambda$

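with complementarity constraints $0 \leq \lambda_i \perp \phi_i(q) \geq 0$ for contact enforcement (Pan et al., 12 Nov 2025). Here $M(q)$ is the mass matrix, $C(q, \dot{q})\dot{q}$ collects Coriolis and centrifugal terms, $g(q)$ is the gravity term, $\tau$ is the actuation, $J(q)$ is the contact Jacobian, $\lambda$ is the vector of contact forces, and $\phi_i(q)$ is the signed gap of contact $i$. The complementarity condition states that a contact force can be nonzero only when its gap is closed.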
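The complementarity condition is easiest to see in one dimension. The sketch below is an illustrative assumption, not the simulator used in the cited work: a point mass falls onto the ground at $q = 0$, so that $M = m$, $g(q) = m g$, $\tau = u$, $J = 1$, and $\phi(q) = q$. The function names, the step size `h`, and the inelastic time-stepping rule are all choices made for this example.

```python
def step(q, v, u, m=1.0, g=9.81, h=0.01):
    """One semi-implicit time step of a point mass above the ground at q = 0.

    Scalar instance of M(q) q'' + g(q) = tau + J^T lambda with M = m,
    g(q) = m * g, tau = u, J = 1, and gap function phi(q) = q.
    """
    v_free = v + h * (u - m * g) / m   # velocity if no contact force acted
    q_free = q + h * v_free            # gap that would result without contact
    # Complementarity: lambda = 0 if the gap stays open; otherwise use the
    # smallest force that brings the end-of-step gap back to zero.
    lam = 0.0 if q_free >= 0.0 else -m * q_free / (h * h)
    v_next = v_free + h * lam / m
    q_next = q + h * v_next
    return q_next, v_next, lam

def simulate(q0=1.0, v0=0.0, steps=200):
    """Drop the mass from height q0 with zero actuation and record (q, v, lambda)."""
    q, v, history = q0, v0, []
    for _ in range(steps):
        q, v, lam = step(q, v, u=0.0)
        history.append((q, v, lam))
    return history

history = simulate()
q_end, v_end, lam_end = history[-1]
print("final contact force:", round(lam_end, 3))
```

During free fall the force is zero and the gap is positive. After impact the gap is held at zero (up to rounding) and the force settles at the weight $m g$, so the product $\lambda\,\phi(q)$ vanishes at every step.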

Graph-Based Rewards for Multi-Agent Interactions: Multi-character interactions are captured by constructing an Interaction Graph over semantic markers, and edge errors (both position and velocity) are penalized via normalized, weighted distances, producing physically and semantically robust imitation rewards (Zhang et al., 2023).
Physical Consistency in Learning: Self-supervised losses impose body–cloth contact and silhouette matching in clothes retargeting, preventing physically implausible deformations even when ground-truth data is unavailable (Yoon et al., 2021).
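The interaction-graph idea above can be sketched as a reward over edges between markers. This is a hedged illustration only: the edge list is supplied by hand (the cited work builds it from the marker layout), and the size normalization `scale`, the distance-based edge weights, the gains `k_pos` and `k_vel`, and the exponential reward shape are assumptions introduced here rather than the published formulation.

```python
import math

def edge_features(points, edges, scale):
    """Edge vectors between marker pairs, divided by a character-size scale."""
    return [tuple((points[j][k] - points[i][k]) / scale for k in range(3))
            for i, j in edges]

def weighted_edge_error(ref_feats, sim_feats, weights):
    """Weighted sum of squared differences between matching edge vectors."""
    return sum(w * sum((a - b) ** 2 for a, b in zip(r, s))
               for w, r, s in zip(weights, ref_feats, sim_feats))

def interaction_graph_reward(ref_pos, ref_vel, sim_pos, sim_vel, edges,
                             ref_scale=1.0, sim_scale=1.0,
                             k_pos=10.0, k_vel=1.0, falloff=1.0):
    ref_p = edge_features(ref_pos, edges, ref_scale)
    sim_p = edge_features(sim_pos, edges, sim_scale)
    ref_v = edge_features(ref_vel, edges, ref_scale)   # relative marker velocities
    sim_v = edge_features(sim_vel, edges, sim_scale)
    # Assumed weighting: short reference edges (markers close together) count most.
    raw = [math.exp(-falloff * math.sqrt(sum(c * c for c in e))) for e in ref_p]
    weights = [w / sum(raw) for w in raw]
    pos_err = weighted_edge_error(ref_p, sim_p, weights)
    vel_err = weighted_edge_error(ref_v, sim_v, weights)
    return math.exp(-k_pos * pos_err) * math.exp(-k_vel * vel_err)

# Two markers per character; edges connect markers across the two characters.
ref_pos = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (1.0, 0.0, 1.0), (0.6, 0.0, 1.0)]
ref_vel = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.1, 0.0, 0.0)]
edges = [(0, 2), (1, 3), (0, 3), (1, 2)]

# A character pair twice as large, performing the same interaction.
big_pos = [tuple(2.0 * c for c in p) for p in ref_pos]
big_vel = [tuple(2.0 * c for c in v) for v in ref_vel]
print(interaction_graph_reward(ref_pos, ref_vel, big_pos, big_vel, edges,
                               ref_scale=1.0, sim_scale=2.0))
```

Because the edges are normalized by character size, the doubled pair scores the maximum reward of 1.0. Moving one marker so that a hand-to-hand edge changes lowers the reward, even if every joint angle is unchanged.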

3. Core Architectures and Learning Frameworks

Several architectures instantiate physics-driven neural retargeting, tailored to the particular domain:

Domain	Architecture Example	Simulation/Physics Layer
Cloth retargeting	CRNet (feature encoder, decoder)	Physics-based synthetic data; contact–silhouette self-supervision
Dexterous hands/humanoids	Parallel control sampling (SPIDER)	Embodied physics engine (MuJoCo/Isaac Gym) with curriculum virtual contacts
Sparse-input avatars	PPO-based NN policies (asymmetric actor–critic)	Physics-based simulator (PhysX/Isaac Gym)
Multi-character scenes	Encoder-decoder RL with IG-based reward	Rigid-body simulation, PD servo actuation, Delaunay-interaction graph

Convolutional and VGG-like architectures are effective for image-to-mesh deformation inference in clothes retargeting (Yoon et al., 2021).
Large-scale parallel sampling with annealed noise and curriculum virtual springs enables sample-efficient, contact-faithful robotic retargeting (Pan et al., 12 Nov 2025).
Actor–critic neural policies trained in simulation track sparse inputs, filling in unobserved degrees of freedom through proprioceptive and physics priors (Reda et al., 2023).
Encoder–decoder RL policies learn to coordinate interacting characters or agents, exploiting IG-based rewards for morphology-invariant interaction fidelity (Zhang et al., 2023).
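The sampling-based entry above can be illustrated with a toy optimizer. The sketch below is not SPIDER: it tracks a ramp with a 1D double integrator, perturbs a nominal control sequence with Gaussian noise whose standard deviation decays each iteration, and keeps the best candidate. The dynamics, cost, constants, and keep-the-best update are all assumptions for illustration, and the curriculum of virtual contacts is omitted entirely.

```python
import random

H = 0.1        # time step (assumed)
HORIZON = 20   # number of control steps (assumed)
REFERENCE = [(t + 1) / HORIZON for t in range(HORIZON)]  # toy ramp to track

def rollout_cost(controls):
    """Simulate x_{t+1} = f(x_t, u_t) for a double integrator and sum the tracking error."""
    pos, vel, cost = 0.0, 0.0, 0.0
    for u, ref in zip(controls, REFERENCE):
        vel += H * u
        pos += H * vel
        cost += (pos - ref) ** 2 + 1e-4 * u * u
    return cost

def annealed_sampling(iterations=30, samples=64, sigma0=1.0, decay=0.9, seed=0):
    """Random-search trajectory optimization with an annealed noise scale."""
    rng = random.Random(seed)
    nominal = [0.0] * HORIZON
    best_cost = rollout_cost(nominal)
    costs = [best_cost]
    sigma = sigma0
    for _ in range(iterations):
        # Independent rollouts; a GPU simulator would evaluate these in parallel.
        candidates = [[u + rng.gauss(0.0, sigma) for u in nominal]
                      for _ in range(samples)]
        scored = [(rollout_cost(c), c) for c in candidates]
        cost, winner = min(scored, key=lambda pair: pair[0])
        if cost < best_cost:           # keep the nominal unless a sample beats it
            best_cost, nominal = cost, winner
        costs.append(best_cost)
        sigma *= decay                 # anneal: explore widely first, refine later
    return nominal, costs

controls, costs = annealed_sampling()
print("initial cost:", round(costs[0], 3))
print("improved:", costs[-1] < costs[0])
```

Large early noise lets the search escape the all-zero initial guess, while the shrinking noise concentrates later samples around the current best sequence. Since every candidate is scored by a full rollout through the dynamics, any sequence that is kept is dynamically feasible by construction.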

4. Physical Supervision and Self-Consistency

Physical supervision is enforced both during synthetic data generation and through loss functions and rewards:

Synthetic Data with Ground-Truth Physics: Simulation yields paired data $(s_i, \Delta M_i, C_i)$, with exact deformation/pose/contact for each synthetic body configuration (Yoon et al., 2021, Pan et al., 12 Nov 2025).
Semi-Supervised Physical-Consistency Loss: On real images without ground-truth 3D mesh, body–cloth contact loss and silhouette Chamfer loss enforce physically plausible garment shapes (Yoon et al., 2021).
Contact and Imitation Rewards: Motion controllers incur rewards for foot–contact synchronization, pose/velocity imitation, and action smoothness/energy, which ensure physics-consistent avatar motion even from sparse input (Reda et al., 2023).
Interaction-graph Rewards: Multi-agent interactions are imitated by maximizing the alignment of key semantic landmarks and their velocities, rather than joint angles alone, which preserves meaningful physical interactions even under large morphological variation (Zhang et al., 2023).
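As one concrete instance of the losses listed above, a silhouette Chamfer loss compares two sets of 2D boundary points without needing point-to-point correspondences. The sketch below uses one common symmetric definition (the sum of the two directed mean nearest-neighbor distances). The exact variant used in the cited work is not specified here, and the function names are hypothetical.

```python
import math

def directed_chamfer(src, dst):
    """Mean distance from each point in src to its nearest neighbor in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets."""
    return directed_chamfer(a, b) + directed_chamfer(b, a)

# Toy silhouettes: boundary samples of a rendered garment and of an image mask.
rendered = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
observed = [(1.0, 0.0), (11.0, 0.0), (11.0, 5.0), (1.0, 5.0)]

print(chamfer_distance(rendered, rendered))  # identical outlines
print(chamfer_distance(rendered, observed))  # outline shifted by one unit
```

Identical outlines give 0.0, and a one-unit shift gives 2.0 (a mean distance of 1.0 in each direction). In a training setting the same quantity would be computed with a differentiable tensor library so that its gradient can update the predicted mesh.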

Physical self-consistency is critical for producing results that are robust to domain gaps and generalize well to real-world deployment, as shown by empirical sim-to-real transfer (Pan et al., 12 Nov 2025).

5. Empirical Performance and Generalization

Physics-driven neural retargeting methods demonstrate strong empirical performance across several axes:

Contact accuracy and balance: Interaction-graph rewards yield landmark errors of $2$–$3$ cm (vs. $8$–$10$ cm for joint rewards) and >95% grasp success rate in multi-agent retargeting (Zhang et al., 2023).
Morphology-robust tracking: Avatars with significantly different skeletons (mouse, dinosaur) maintain head/foot alignment and contacts despite only having sparse sensor input (Reda et al., 2023).
Retargeting success rates: In dexterous robot experiments, SPIDER achieves up to 88% success on the GigaHands dataset, outperforming kinematic and conventional sampling methods by 18% while running at $10\times$ the speed of RL (Pan et al., 12 Nov 2025).
Domain transfer: CRNet’s garment retargeting generalizes to real-world photos, with online refinement further improving physical realism (Yoon et al., 2021).
Robustness: Controllers remain stable under sensor sparsity, domain randomization, and adversarial perturbations (Reda et al., 2023).

6. Extensions, Limitations, and Future Directions

Physics-driven neural retargeting extends to a wide range of domains but has known limitations:

Scalability: Methods like SPIDER leverage parallel GPU-based rollouts for scalability, enabling large-scale dataset synthesis for downstream neural policy training (Pan et al., 12 Nov 2025).
Residual and Hybrid Architectures: Residual policies refine physics-based open-loop retargeting with neural feedback, improving online robustness (Pan et al., 12 Nov 2025).
Generalization: Morphological randomization and normalization (by T-pose, interaction graph) support transfer to previously unseen skeletons, but extreme deviations (e.g., $>3\times$ scale change, radical topology) may degrade performance (Zhang et al., 2023).
Reward and Loss Design: Removal of physically motivated reward terms leads to loss of tracking fidelity or physically implausible artifacts (drift, foot-slip, unnatural body postures) (Reda et al., 2023, Zhang et al., 2023).
Specialization: Policies tend to be clip- and morphology-specific; generalizing across highly distinct interactions or inventing new behaviors remains challenging (Zhang et al., 2023).
Future Architectures: Research directions include diffusion-based trajectory optimizers conditioned on physically feasible seeds, graph-neural network contact predictors, and meta-learned sampling strategies that tune exploration/exploitation under physics constraints (Pan et al., 12 Nov 2025).

Physics-driven neural retargeting now supports generalizable, robust motion transfer in human–robot interaction, character animation, and physically plausible digital twins by combining learned neural representations with explicit physical constraints.