Motion Retargeting Techniques
- Motion retargeting is the process of transferring motion from a source to a target with differing kinematics, preserving semantics, style, and dynamics.
- Techniques range from classical kinematics and geometry-aware methods to deep learning frameworks, enabling realistic animations and effective robotics control.
- Recent approaches integrate physics-based dynamics and reinforcement learning to ensure contact fidelity and dynamic plausibility in diverse applications.
Motion retargeting is the process of transferring movement from a source character or agent to a target character or system with different kinematics, morphology, or constraints, while preserving the relevant characteristics of the motion such as semantics, style, contacts, and dynamics. Sophisticated motion retargeting techniques are crucial in computer animation, character-driven games, robotics, VR/AR teleoperation, imitation learning, and biomechanics. The technical landscape of motion retargeting spans classical kinematics, optimal control, inverse kinematics, deep learning, non-rigid shape correspondence, latent-space translation, and contact- or geometry-aware formulations. This article provides a comprehensive overview of state-of-the-art motion retargeting techniques, structured across methodology, representation, contact/geometry handling, learning frameworks, evaluation, and practical implications.
1. Kinematic and Frame-Based Retargeting Paradigms
Fundamental to classical motion retargeting is the kinematic mapping of joint-space configurations or end-effector trajectories from source to target systems, accounting for differences in link lengths, degrees of freedom (DOF), or topology.
- Separation of Translation and Rotation: Advanced strategies for manipulation and teleoperation, such as in "Disentangling Coordinate Frames for Task Specific Motion Retargeting in Teleoperation using Shared Control and VR Controllers," employ two independent coordinate trees—one for the input device, one for the manipulator. By independently calibrating and retargeting translation and rotation displacements via homogeneous transforms, these techniques allow flexible switching between relative and absolute reference frames and mitigate the operator's cognitive load in aligning device and end-effector frames (Grobbel et al., 19 May 2025). Real-time optimal control planners further smooth and constrain the trajectories by minimizing tracking and smoothness costs over joint space. A minimal sketch of this relative-frame mapping appears after this list.
- Inverse Kinematics with Contact Constraints: Kinematic retargeting often incorporates inverse kinematics (IK) solvers subject to joint limits and, increasingly, contact or penetration constraints. For instance, in hand-object manipulation, dense contact regions are transferred between dissimilar hand meshes using non-isometric shape matching atlases, and the target hand configuration is solved to match both marker and contact correspondences in the object frame via energy minimization under geometric and anatomical constraints (Lakshmipathy et al., 7 Feb 2024). A toy contact-penalized IK sketch also follows this list.
- Leader-Follower and Graph-Based Optimization: In manipulation and robotics, temporal coordination between multiple slave DOFs is retained via leader-follower schemes (e.g., for dual-arm sign language, where wrists and elbows are parameterized with Dynamic Movement Primitives—DMPs—and optimized over feasible robot trajectories). Graph-based optimization frameworks, such as in (Liang et al., 2020), enable rigorous handling of biomechanical and joint feasibility during smoothing and retargeting.
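To make the relative-frame idea concrete, the following is a minimal sketch (not the published implementation) of applying a controller displacement since calibration to a calibrated end-effector pose, with translation and rotation handled independently; the function and variable names are illustrative assumptions.

```python
import numpy as np

def relative_retarget(T_dev_cal, T_dev_now, T_ee_cal, scale=1.0):
    """Map a VR-controller displacement onto a manipulator end-effector,
    treating translation and rotation independently (relative-frame mode).

    All poses are 4x4 homogeneous transforms expressed in their own base
    frames; T_dev_cal / T_dev_now are the controller poses at calibration
    and at the current frame, and T_ee_cal is the end-effector pose
    captured at calibration.
    """
    # Translational displacement of the device since calibration, scaled.
    dp = scale * (T_dev_now[:3, 3] - T_dev_cal[:3, 3])

    # Rotational displacement of the device since calibration.
    dR = T_dev_now[:3, :3] @ T_dev_cal[:3, :3].T

    # Apply the displacement on top of the calibrated end-effector pose,
    # keeping the device and manipulator coordinate trees separate.
    T_ee_target = np.eye(4)
    T_ee_target[:3, :3] = dR @ T_ee_cal[:3, :3]
    T_ee_target[:3, 3] = T_ee_cal[:3, 3] + dp
    return T_ee_target
```

Switching to an absolute reference frame would amount to replacing the displacement terms with the device pose itself; in a full system the resulting target pose would be handed to an optimal-control tracker rather than commanded directly.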
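Similarly, the contact-constrained IK idea can be illustrated with a toy planar two-link chain, where marker tracking and a soft penetration penalty are combined into one energy and minimized numerically; the link lengths, weights, and floor placement below are assumptions for illustration only, not the formulation of any cited paper.

```python
import numpy as np
from scipy.optimize import minimize

L1, L2 = 0.3, 0.25          # link lengths of a toy planar 2-link chain

def fk(theta):
    """Forward kinematics: elbow and tip positions for joint angles (t1, t2)."""
    t1, t2 = theta
    p1 = np.array([L1 * np.cos(t1), L1 * np.sin(t1)])
    p2 = p1 + np.array([L2 * np.cos(t1 + t2), L2 * np.sin(t1 + t2)])
    return p1, p2

def retarget_ik(target_tip, floor_y=0.0, w_contact=50.0):
    """Solve for joint angles that track a retargeted tip marker while
    softly penalizing penetration below a contact surface at y = floor_y."""
    def energy(theta):
        p1, p2 = fk(theta)
        track = np.sum((p2 - target_tip) ** 2)           # marker correspondence
        pen = sum(max(0.0, floor_y - p[1]) ** 2           # penetration penalty
                  for p in (p1, p2))
        return track + w_contact * pen
    res = minimize(energy, x0=np.array([0.5, 0.5]), method="Nelder-Mead")
    return res.x
```

Production pipelines replace this toy objective with dense contact correspondences, joint-limit terms, and anatomical priors, but the overall structure (a tracking term plus weighted contact penalties) is the same.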
These paradigms emphasize explicit geometric and frame correspondences and excel in contexts where low-level kinematics and control are paramount, but are limited by their reliance on explicit mappings and susceptibility to semantic drift and unrealistic contacts in the presence of large morphological disparities.
2. Geometry- and Contact-Aware Motion Retargeting
Modern retargeting methods increasingly go beyond joint-space matching, addressing skinned body or mesh deformations and the preservation of contact events.
- Dense Geometric Interaction Modeling: MeshRet (Ye et al., 28 Oct 2024) introduces Semantically Consistent Sensors (SCS) to establish dense correspondence fields across arbitrary mesh topologies, and Dense Mesh Interaction (DMI) fields to encode all body–body contacts and spatial relationships. By directly aligning DMI fields during optimization, MeshRet enforces not just joint accuracy, but contact fidelity and strong interpenetration avoidance, outperforming skeleton-only and sequential geometry-correction baselines in joint error, contact mismatch, and penetration rate.
- Key-Vertex and Contact Descriptor Embeddings: ReConForM (Cheynel et al., 28 Feb 2025) abstracts mesh shape and pose using a sparse set of key vertices transferred across morphologies using optimal transport; time-varying descriptors such as inter-vertex distances, penetrations, and velocities encode contact and floor semantics. An adaptive masking mechanism selects only relevant contact constraints frame-by-frame, enabling real-time optimization without manually pre-specified correspondences (a simplified contact-labeling sketch follows this list). Evaluation on challenging animation datasets demonstrates superior preservation of foot-ground and self-contacts, minimal penetration, and smooth trajectories compared to both classical and deep learning baselines.
- Contact Pair Extraction and Geometry-Aware RNNs: Direct detection and transfer of self-contact vertex pairs, along with surface-aware RNNs and latent space optimization, allow for skeleton-agnostic contact preservation and minimal interpenetration (Villegas et al., 2021). These methods are especially effective for high-dimensional character animation, where surface detail cannot be neglected.
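As a concrete, simplified illustration of the frame-wise contact reasoning these methods rely on, the sketch below labels ground contacts for a single foot key-vertex from height and velocity thresholds; the thresholds and the assumption of a flat floor at y = 0 are illustrative and not taken from the cited papers.

```python
import numpy as np

def detect_foot_contacts(foot_pos, dt, height_thresh=0.02, vel_thresh=0.15):
    """Label frames where a foot key-vertex is in ground contact.

    foot_pos : (T, 3) trajectory of one foot vertex/joint in world space,
               with the ground plane assumed at y = 0.
    Returns a boolean array of length T (True = in contact).
    """
    # Finite-difference velocity magnitude per frame.
    vel = np.zeros(len(foot_pos))
    vel[1:] = np.linalg.norm(np.diff(foot_pos, axis=0), axis=1) / dt

    # A frame counts as contact when the vertex is near the floor
    # and nearly stationary (low sliding velocity).
    near_floor = foot_pos[:, 1] < height_thresh
    stationary = vel < vel_thresh
    return near_floor & stationary
```

Descriptors of this kind are what adaptive masking schemes weight frame-by-frame when deciding which contact constraints to enforce during optimization.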
These techniques are characterized by direct modeling of surface and volumetric constraints, enabling plausible retargeting even in highly contact-rich or morphologically diverse settings.
3. Learning-Based and Latent-Space Retargeting Frameworks
Deep learning has led to the emergence of data-driven motion retargeting, where learned representations replace hand-designed correspondence maps, and embedding methods facilitate transfer across domains, morphologies, or styles.
- Latent Alignment and Shared Embedding Spaces: Methods such as S3LE construct shared latent spaces between human poses and robot configurations, learning projection-invariant mappings via paired/unpaired data bootstrapping and contrastive learning (Choi et al., 2021). Nonparametric nearest-neighbor regression in latent space ensures that retargeted poses are both expressive and satisfy safety constraints (collision-free, within joint limits). A minimal shared-latent-space sketch follows this list.
- Topology-Agnostic Transformers and Variable Morphology: HuMoT autoencodes motion sequences with transformers conditioned on explicit skeleton templates, supporting variable and even previously unseen topologies. By stochastically subsampling joints and enforcing bone-length invariance, the learned latent code generalizes to cross-topology retargeting, joint upsampling, and denoising, without requiring hand-crafted mappings or retraining for new skeletons (Mourot et al., 2023).
- Flow-Matching in Latent Space: MoReFlow (Kim et al., 29 Sep 2025) uses VQ-VAE tokenization of source and target motions, followed by unsupervised Schrödinger bridge flow-matching between embedding spaces. Conditional coupling enables reversible, controllable retargeting across arbitrary character morphologies, supporting domain-specific objectives such as style fidelity or task-space accuracy.
- Canonicalization and Disentanglement: MoCaNet and TransMoMo (Zhu et al., 2021, Yang et al., 2020) disentangle skeleton motion into motion, structure, and view-angle components, enabling unsupervised or weakly supervised lifting from 2D videos (even in-the-wild) to canonicalized 3D representations, which can then be recombined and decoded for retargeting.
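The following is a minimal sketch of the shared-latent-space idea behind several of these methods, with one encoder/decoder pair per character and a common embedding; the layer sizes, losses, and class name are placeholders rather than any published architecture.

```python
import torch
import torch.nn as nn

class LatentRetargeter(nn.Module):
    """Minimal shared-latent-space retargeting model: one encoder per
    character maps poses into a common embedding, and one decoder per
    character maps the embedding back to that character's pose space."""

    def __init__(self, src_dim, tgt_dim, latent_dim=64):
        super().__init__()
        self.enc_src = nn.Sequential(nn.Linear(src_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.enc_tgt = nn.Sequential(nn.Linear(tgt_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.dec_src = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, src_dim))
        self.dec_tgt = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, tgt_dim))

    def retarget(self, src_pose):
        # Encode a source pose into the shared space, decode as the target.
        return self.dec_tgt(self.enc_src(src_pose))

    def reconstruction_loss(self, src_pose, tgt_pose):
        # Self-reconstruction keeps each encoder/decoder pair faithful;
        # the shared latent space is what enables cross-character transfer.
        loss = nn.functional.mse_loss(self.dec_src(self.enc_src(src_pose)), src_pose)
        loss += nn.functional.mse_loss(self.dec_tgt(self.enc_tgt(tgt_pose)), tgt_pose)
        return loss
```

In practice such models add adversarial, cycle-consistency, contrastive, or flow-matching objectives on the latent space so that unpaired source and target motions still align.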
Latent space methods enable flexible, generalizable motion retargeting, support unpaired or synthetic data, and provide modularity for disentangling semantic, stylistic, and morphological factors. However, they typically require carefully designed loss functions and regularization to avoid degenerate solutions or loss of fidelity in extreme morphology differences.
4. Physics-Based and Dynamically Feasible Retargeting
Ensuring that retargeted motions are not just kinematically plausible but also dynamically feasible is critical for applications in robotics and physically simulated avatars.
- Physics-Based RL Controllers from Sparse Inputs: Learning-based approaches have demonstrated direct retargeting from sparsely sensed (e.g., VR headset/controllers) human motion to animal, human, or fantastical morphologies via reinforcement learning. Kinematic "rough" retargets provide imitation targets for a policy acting in a physics simulator, with reward functions enforcing contact, pose, and effort objectives (Reda et al., 2023); an illustrative reward sketch follows this list. Policies trained in this manner generalize across characters and unseen users and can produce plausible full-body animation even with minimal sensory input.
- Kinodynamic Retargeting for Humanoid Imitation: IKMR (Chen et al., 18 Sep 2025) integrates topology-aware autoencoder mappings (for kinematic fidelity) with imitation learning-based refinement (for dynamic feasibility) to produce robust, scalable retargeting in robotic systems. The network is first pretrained for kinematic mapping, then further fine-tuned as a downstream RL policy tracks the retargeted trajectory in simulation, ensuring physical plausibility.
- Legged Robot Motion Retargeting: For quadrupeds or other non-humanoid morphologies, pipelines such as STMR (Yoon et al., 17 Apr 2024) decompose retargeting into spatial (kinematic whole-body match with foot anchoring) and temporal (dynamic time-warping under optimal control) modules, with integrated imitation-RL to fine-tune control for real robot deployment. This approach preserves key contact phases and ground contacts across species, supporting robust real-world transfer.
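To illustrate how such policies are typically rewarded, the sketch below combines pose tracking, contact matching, and an effort penalty into a single per-step reward; the weights and exponential scales are illustrative assumptions, not the reward from any specific paper.

```python
import numpy as np

def imitation_reward(sim_joints, ref_joints, sim_contacts, ref_contacts,
                     joint_torques, w_pose=0.6, w_contact=0.3, w_effort=0.1):
    """Illustrative per-step reward for a policy tracking a kinematically
    retargeted reference motion in a physics simulator.

    sim_joints / ref_joints     : (J, 3) simulated vs. reference joint positions.
    sim_contacts / ref_contacts : boolean numpy arrays of per-foot contact states.
    joint_torques               : actuator torques applied this step.
    """
    # Exponentiated tracking error keeps the pose term in (0, 1].
    pose_err = np.mean(np.sum((sim_joints - ref_joints) ** 2, axis=1))
    r_pose = np.exp(-5.0 * pose_err)

    # Fraction of feet whose contact state matches the reference.
    r_contact = np.mean(sim_contacts == ref_contacts)

    # Penalize effort to discourage jerky, high-torque solutions.
    r_effort = np.exp(-0.01 * np.sum(np.square(joint_torques)))

    return w_pose * r_pose + w_contact * r_contact + w_effort * r_effort
```

Imitation-RL pipelines optimize a reward of this shape in simulation while the kinematic retarget supplies the per-frame reference joints and contact states.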
Physics-based approaches can accommodate actuator limits, complex contacts, and dynamic balance, and have demonstrated effectiveness for real-time, real-user, and in-the-wild driving of both avatars and robots.
5. Evaluation Metrics, Comparative Analysis, and Practical Implications
Rigorous quantitative and qualitative evaluation is essential for benchmarking motion retargeting systems, and multiple standardized metrics have emerged:
| Metric | Measurement Domain | Reported Usage |
|---|---|---|
| Mean Per-Joint Position Error (MPJPE) | Skeletal space | Normalized by height, Mixamo/AMASS/ScanRet |
| Contact error | Vertex pairs/distances | Dense mesh/vertex, hand-object/body, key vertices |
| Interpenetration rate | Mesh or limb volume | Percentage of arm/hand vertices in body (Ye et al., 28 Oct 2024; Cheynel et al., 28 Feb 2025) |
| Penetration/Floor violation | Mesh | Volume below ground, negative SDF |
| Temporal smoothness (Curvature/Jerk) | Trajectory curvature | Average joint acceleration, jerk (Cheynel et al., 28 Feb 2025; Yang et al., 9 Apr 2025) |
| Foot sliding/Grounding F1 | Kinematic contacts | Framewise velocity/position criteria |
| Perceptual/Preference study | Human raters | User study, A/B, expert/non-expert (Cheynel et al., 28 Feb 2025; Villegas et al., 2021) |
Recent works have demonstrated that direct geometry- and contact-aware supervision yields lower interpenetration and contact error, and higher subjective quality, than skeleton-only or sequential geometry-repair pipelines (Ye et al., 28 Oct 2024; Cheynel et al., 28 Feb 2025; Villegas et al., 2021). Notably, adaptive descriptor weighting and dense mesh interaction modeling have largely supplanted hand-crafted or post hoc corrections for contact preservation.
Physics-based and RL approaches are essential for real-world robotics and avatars, where dynamic feasibility is non-negotiable (Chen et al., 18 Sep 2025; Reda et al., 2023; Yoon et al., 17 Apr 2024). Topology-agnostic (transformer-based) and unsupervised learning methods are especially valuable for animation and content pipeline generalization, supporting cross-morphology and unseen skeletons without retraining (Mourot et al., 2023; Kim et al., 29 Sep 2025). Contact- and geometry-aware pipelines are increasingly essential for applications demanding high realism or complex environmental interactions.
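For reference, a minimal sketch of how two of the tabulated metrics (MPJPE and foot sliding) are commonly computed is given below; the array shapes, flat floor at y = 0, and contact threshold are assumptions for illustration rather than a standardized benchmark implementation.

```python
import numpy as np

def mpjpe(pred_joints, gt_joints, height=1.0):
    """Mean per-joint position error over a sequence, optionally
    normalized by character height.  Shapes: (T, J, 3)."""
    return np.mean(np.linalg.norm(pred_joints - gt_joints, axis=-1)) / height

def foot_sliding(foot_pos, height_thresh=0.02):
    """Average horizontal foot displacement per contact frame: the foot
    should not translate while it is flagged as touching the ground.
    foot_pos: (T, 3) with the ground plane at y = 0."""
    horiz = np.linalg.norm(np.diff(foot_pos[:, [0, 2]], axis=0), axis=1)
    in_contact = foot_pos[1:, 1] < height_thresh
    if not np.any(in_contact):
        return 0.0
    return float(np.mean(horiz[in_contact]))
```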
6. Limitations and Future Research Directions
Current motion retargeting systems have identifiable limitations. Contact- and geometry-aware models cannot handle highly noisy or non-canonical source data without robust pre-processing (Ye et al., 28 Oct 2024). Most pipelines require a consistent one-to-one mapping or reference template for mesh correspondence; adaptation to topologically novel skeletons (additional or missing limbs) remains non-trivial (Cheynel et al., 28 Feb 2025; Ye et al., 28 Oct 2024). While some transformer and latent space methods support variable topology, they may underperform in fine-grained morphological detail or in situations where contact semantics are critical (Mourot et al., 2023). Physics-based RL solutions, while robust to dynamic constraints, are sample-inefficient and require extensive simulation and reward shaping.
Emergent research directions include:
- User studies and systematic task completion evaluations across teleoperation and collaborative settings (Grobbel et al., 19 May 2025),
- Unified many-to-many flow networks for large-scale character sets and on-the-fly online retargeting (Kim et al., 29 Sep 2025),
- Integration of haptic or force feedback, especially for telemanipulation and shared-control modalities,
- Incorporation of dynamic descriptors and joint multi-modal contact modeling for environment-aware retargeting,
- Integration of physics-based simulation with geometry- and contact-based losses for end-to-end differentiable dynamic retargeting pipelines,
- Robustification of mesh and key-vertex extraction on noisy, occluded, or heavily clothed inputs (Ye et al., 28 Oct 2024).
These innovations are expected to further narrow the gap between direct human performance and its real-world robotic or animated realization, while pushing toward high-fidelity, real-time retargeting.
Key Citations:
- (Grobbel et al., 19 May 2025) Disentangling Coordinate Frames for Task Specific Motion Retargeting in Teleoperation using Shared Control and VR Controllers
- (Lakshmipathy et al., 7 Feb 2024) Kinematic Motion Retargeting for Contact-Rich Anthropomorphic Manipulations
- (Ye et al., 28 Oct 2024) Skinned Motion Retargeting with Dense Geometric Interaction Perception
- (Cheynel et al., 28 Feb 2025) ReConForM : Real-time Contact-aware Motion Retargeting for more Diverse Character Morphologies
- (Ayhan et al., 2023) Implicit Kinodynamic Motion Retargeting for Human-to-humanoid Imitation Learning
- (Reda et al., 2023) Physics-based Motion Retargeting from Sparse Inputs
- (Kim et al., 29 Sep 2025) MoReFlow: Motion Retargeting Learning through Unsupervised Flow Matching
- (Mourot et al., 2023) HuMoT: Human Motion Representation using Topology-Agnostic Transformers for Character Animation Retargeting
- (Yang et al., 9 Apr 2025) STaR: Seamless Spatial-Temporal Aware Motion Retargeting with Penetration and Consistency Constraints
- (Villegas et al., 2021) Contact-Aware Retargeting of Skinned Motion
- (Choi et al., 2021) Self-Supervised Motion Retargeting with Safety Guarantee