Optimization-Based Motion Retargeting

Updated 3 February 2026

The paper presents an optimization-based framework that formulates motion retargeting as a constrained numerical problem balancing pose, task, and smoothness objectives.
It uses SQP, gradient-based solvers, and hybrid IK-plus-heuristic methods to handle kinematic differences, contact fidelity, and semantic preservation.
Reported results show high task success and contact preservation rates at real-time speeds; remaining challenges include initialization sensitivity and computational complexity.

An optimization-based motion retargeting framework is a computational architecture that transforms motion data from a source domain (e.g., a human hand or body, animal, or virtual character) to a target embodiment (robot, character, or different morphology) by explicitly solving constrained numerical optimization problems. Unlike template-based or direct mapping methods, these frameworks formulate the retargeting process as an optimization with various task, feasibility, and semantic-preservation objectives, leveraging the flexibility and rigor of constrained optimization to handle disparate kinematics, semantics, and physical requirements.

1. Mathematical Formulation and Problem Statement

Optimization-based retargeting frameworks generate target trajectories by minimizing an explicit objective under hard and soft constraints derived from kinematics, physics, semantics (semantic contacts and interactions), and system limits.

General Formulation:

Decision variables: Target configuration trajectory, often joint angles $\mathbf{q}_{1:T}$, poses, or other kinematic variables, sometimes augmented with root transformations, interaction points, or force variables.
Objective function: Weighted sum of terms incorporating pose approximation ($E_{pose}$), semantic/task objectives ($E_{task}$), smoothness regularization, contact preservation, and sometimes dynamic feasibility or interaction energies.
Constraints: Joint/velocity/torque limits, contact, non-penetration, foot sticking, task-dependent equality/inequality relations.
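
Written compactly, the three ingredients above might be assembled as follows. This is a generic sketch rather than any one paper's formulation: the constraint functions $g$ and $h$ and the term names $E_{smooth}$ and $E_{contact}$ are notation assumed here for illustration.

```latex
\min_{\mathbf{q}_{1:T}} \;
  \omega_{pose} E_{pose}(\mathbf{q}_{1:T})
  + \omega_{task} E_{task}(\mathbf{q}_{1:T})
  + \omega_{smooth} E_{smooth}(\mathbf{q}_{1:T})
  + \omega_{contact} E_{contact}(\mathbf{q}_{1:T})
\quad \text{s.t.} \quad
  g(\mathbf{q}_{1:T}) \le 0, \;\; h(\mathbf{q}_{1:T}) = 0
```

In this reading, $g$ would collect inequality constraints such as joint, velocity, and torque limits or non-penetration, and $h$ would collect equality constraints such as foot sticking.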

For example, in hand retargeting for dexterous manipulation, the problem is

$\underset{p}{\text{minimize}}~E(x, y, \text{object}) = \omega_{pose}E_{pose}(x, y) + \omega_{task}E_{task}(y, \text{object})$

where $p$ denotes the 29-DOF model parameters, $x$ the output of the hand pose estimator, and $y = f_{sim}(p)$ the simulation output (Antotsiou et al., 2018).
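
A minimal Python sketch of how such a weighted energy could be evaluated is given here. The concrete forms of the two terms are assumptions made for illustration (mean squared keypoint distance for the pose term, squared gap to a spherical object for the task term), not the definitions used by Antotsiou et al.; the simulator $f_{sim}$ is not modelled, so the simulated keypoints `y` are passed in directly.

```python
import math

def pose_energy(x, y):
    """Mean squared distance between estimated keypoints x and simulated keypoints y."""
    return sum(math.dist(a, b) ** 2 for a, b in zip(x, y)) / len(x)

def task_energy(y, object_center, object_radius):
    """Mean squared gap between each simulated keypoint and a sphere's surface.

    The spherical object is a hypothetical stand-in for a real object model.
    """
    gaps = [max(0.0, math.dist(p, object_center) - object_radius) for p in y]
    return sum(g * g for g in gaps) / len(y)

def total_energy(x, y, object_center, object_radius, w_pose=1.0, w_task=1.0):
    """Weighted sum w_pose * E_pose(x, y) + w_task * E_task(y, object)."""
    return (w_pose * pose_energy(x, y)
            + w_task * task_energy(y, object_center, object_radius))
```

Raising `w_task` relative to `w_pose` pulls the solution toward object contact at the cost of fidelity to the estimated pose, which is the trade-off the weights $\omega_{pose}$ and $\omega_{task}$ control.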

In whole-body, interaction-preserving retargeting, the decision variables are robot configuration $q_t$ and the objective is

$E(q_t) = \sum_{i=1}^N \|L(p_{t,i}^{source}) - L(p_{t,i}^{target}(q_t))\|^2 + (q_t - q_{t-1})^TQ(q_t - q_{t-1})$

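where $L(\cdot)$ gives the Laplacian coordinates of the interaction-mesh points, so the first term penalizes Laplacian mesh deformation, and the second term, weighted by the matrix $Q$, enforces temporal smoothness (Yang et al., 30 Sep 2025).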
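
The per-frame objective can be sketched in a few lines of Python. Several simplifications are assumed here and are not taken from the cited work: Laplacian coordinates use uniform neighbour weights, $Q$ is diagonal (passed as `q_weights`), and the forward kinematics $p^{target}(q_t)$ is not modelled, so target points are supplied directly.

```python
def laplacian_coords(points, neighbors):
    """Uniform-weight Laplacian coordinates: each vertex minus the mean of its neighbours."""
    coords = []
    for i, p in enumerate(points):
        nbrs = [points[j] for j in neighbors[i]]
        mean = [sum(c) / len(nbrs) for c in zip(*nbrs)]
        coords.append([a - b for a, b in zip(p, mean)])
    return coords

def frame_energy(source_pts, target_pts, neighbors, q, q_prev, q_weights):
    """Laplacian deformation term plus a diagonal-Q temporal smoothness term."""
    src = laplacian_coords(source_pts, neighbors)
    tgt = laplacian_coords(target_pts, neighbors)
    deform = sum((a - b) ** 2 for s, t in zip(src, tgt) for a, b in zip(s, t))
    smooth = sum(w * (a - b) ** 2 for w, a, b in zip(q_weights, q, q_prev))
    return deform + smooth
```

Because Laplacian coordinates are differences between a vertex and its neighbours, a rigid translation of the whole mesh leaves the deformation term at zero; only changes in relative layout are penalized.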

2. Core Optimization Methodologies

Optimization-based frameworks deploy a variety of numerical optimization algorithms tailored to the retargeting problem’s structure:

Sequential Quadratic Programming (SQP): Used for nonconvex, multi-constraint trajectory optimization, e.g., OmniRetarget’s per-frame stick/contact-preserving solve (Yang et al., 30 Sep 2025) and the multi-contact QP for humanoid/legged robots (Rouxel et al., 2022).
Hybrid Methods (IK + Swarm/Heuristics): Task-oriented hand retargeting leverages an IK initialization followed by local Particle Swarm Optimization (PSO) to escape poor local minima and achieve task contacts (Antotsiou et al., 2018).
Gradient-based Solvers: Real-time, contact-aware character morph retargeting uses batch gradient-based optimization (Adam, MMA), exploiting low-dimensional embeddings and sparse semantics; batch solutions facilitate temporal coherence (Cheynel et al., 28 Feb 2025, Lakshmipathy et al., 2024).
Encoder/Decoder Latent Optimization: Neural latent spaces are constructed for kinematic mapping; online inference alternates between fast encoder initialization and gradient descent or RL-based optimization in a latent manifold (Zhang et al., 2021, Kim et al., 2019).

Constraints are enforced either natively in the optimizer (QP/SQP) or via soft/penalty terms in unconstrained solvers.
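
The difference between the two constraint-handling styles can be illustrated with a toy joint-limit problem. The sketch below is an assumption-laden example, not any cited solver: it tracks a reference `q_ref` with a quadratic tracking cost, and compares a quadratic penalty inside plain gradient descent against exact enforcement, which for simple box limits reduces to clamping.

```python
def penalty_gradient(q, q_ref, q_min, q_max, weight):
    """Gradient of sum (q - q_ref)^2 + weight * (limit violation)^2, per joint."""
    grad = []
    for qi, ri, lo, hi in zip(q, q_ref, q_min, q_max):
        over = max(0.0, qi - hi)    # amount above the upper limit
        under = max(0.0, lo - qi)   # amount below the lower limit
        grad.append(2.0 * (qi - ri) + 2.0 * weight * (over - under))
    return grad

def solve_with_penalty(q_ref, q_min, q_max, weight=100.0, steps=5000):
    """Unconstrained gradient descent; limits enter only as a soft penalty."""
    lr = 1.0 / (2.0 * (1.0 + weight))  # 1 / (Lipschitz bound of the gradient)
    q = [0.5 * (lo + hi) for lo, hi in zip(q_min, q_max)]  # start mid-range
    for _ in range(steps):
        grad = penalty_gradient(q, q_ref, q_min, q_max, weight)
        q = [qi - lr * gi for qi, gi in zip(q, grad)]
    return q

def solve_with_projection(q_ref, q_min, q_max):
    """Hard enforcement: for box limits the constrained minimizer is a clamp."""
    return [min(max(ri, lo), hi) for ri, lo, hi in zip(q_ref, q_min, q_max)]
```

With a reference of 2.0 and an upper limit of 1.0, the penalty solution settles at (2 + 100·1)/(1 + 100) ≈ 1.0099, slightly outside the limit, whereas clamping returns exactly 1.0. The residual violation shrinks as `weight` grows, at the price of a stiffer problem and a smaller stable step size.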

3. Semantic, Contact, and Interaction Modeling

Preservation of high-level semantic properties distinguishes advanced frameworks:

Interaction/Contact Meshes: Interaction-preserving methods construct explicit meshes connecting agent, object, and environment, with Laplacian deformation minimization to ensure plausible contacts and task-appropriate transfer (Yang et al., 30 Sep 2025).
Semantic Feature Embeddings: Contact-aware frameworks extract descriptors for distances, directions, height, and penetration among keypoints or mesh elements, focusing the optimization on entries with contact proximity, governed by adaptive masks (Cheynel et al., 28 Feb 2025, Villegas et al., 2021).
Task/Energy-based Objective Terms: Grasping and manipulation objectives use spatial penalties that drive hand points into object contact, regularizing against pose estimator noise (Antotsiou et al., 2018); multi-contact frameworks enforce sequential force and kinematic feasibility (Rouxel et al., 2022).

Common principles include prioritizing contacts most relevant for the task (via adaptive weightings) and utilizing relative, rather than absolute, spatial relationships for transfer robustness.
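
Both principles can be sketched together in Python. The weighting rule below (exponential falloff with a hard cutoff acting as the mask) and the parameters `sigma` and `cutoff` are hypothetical choices for illustration; the cited frameworks define their own descriptors and masks.

```python
import math

def contact_weights(src_a, src_b, sigma=0.05, cutoff=0.2):
    """Weight each keypoint pair by source-motion proximity; mask out distant pairs."""
    weights = {}
    for i, a in enumerate(src_a):
        for j, b in enumerate(src_b):
            d = math.dist(a, b)
            if d <= cutoff:  # adaptive mask: only near-contact pairs are kept
                weights[(i, j)] = math.exp(-d / sigma)
    return weights

def relative_distance_loss(src_a, src_b, tgt_a, tgt_b, weights):
    """Penalize changes in pairwise distance, a relative rather than absolute quantity."""
    loss = 0.0
    for (i, j), w in weights.items():
        d_src = math.dist(src_a[i], src_b[j])
        d_tgt = math.dist(tgt_a[i], tgt_b[j])
        loss += w * (d_tgt - d_src) ** 2
    return loss
```

Since the loss depends only on pairwise distances, moving both target point sets rigidly together costs nothing, while separating a pair that was nearly touching in the source is penalized most heavily.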

4. Pipeline Architectures and Algorithmic Workflows

A typical pipeline includes source motion capture, low-dimensional embedding or descriptor extraction, mapping or correspondences, the optimization loop, and postprocessing:

Correspondence and Descriptor Initialization: Establishing source-target mappings via mesh landmarks, bone correspondences, or interaction charts; virtual marker schemes and atlas-based non-isometric mapping for shape-robustness (Lakshmipathy et al., 2024).
Optimization Loops: For each frame or trajectory, initialize (often with a naive copy or IK), optionally run in a low-dimensional or feature-embedded space, and iteratively solve for the best fit with contacts and semantic loss terms (Cheynel et al., 28 Feb 2025, Antotsiou et al., 2018).
Constraint Handling: Hard constraints (joint/contact limits) incorporated as equalities/inequalities, or via high-penalty weights; soft constraints (smoothness, regularization) via dedicated terms.
Post-hoc Refinement: Procedures for ensuring temporal continuity, post-solve acceleration filtering, and spline fitting for smooth playback (Lakshmipathy et al., 2024, Cheynel et al., 28 Feb 2025).
Multi-stage Solvers and Data Augmentation: Warm-starting frames, hierarchical updating, and strategies for domain randomization or data augmentation to support RL or system generalization (Yang et al., 30 Sep 2025).
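
The optimization loop, warm-starting, and post-hoc refinement stages can be sketched as below. This is a deliberately reduced illustration under stated assumptions: the per-frame cost is a plain tracking term plus a smoothness term toward the previous solution, contacts and semantic terms are omitted, and a moving average stands in for the acceleration filtering and spline fitting mentioned above.

```python
def retarget_sequence(reference, smooth_weight=1.0, steps=50, lr=0.1):
    """Per-frame gradient descent, warm-started from the previous frame's solution."""
    trajectory = []
    q_prev = list(reference[0])  # naive-copy initialization for the first frame
    for ref in reference:
        q = list(q_prev)  # warm start from the last solved frame
        for _ in range(steps):
            # Gradient of (q - ref)^2 + smooth_weight * (q - q_prev)^2, per joint.
            grad = [2.0 * (qi - ri) + 2.0 * smooth_weight * (qi - pi)
                    for qi, ri, pi in zip(q, ref, q_prev)]
            q = [qi - lr * gi for qi, gi in zip(q, grad)]
        trajectory.append(q)
        q_prev = q
    return trajectory

def moving_average(trajectory, radius=1):
    """Post-hoc smoothing: average each frame with its neighbours within `radius`."""
    smoothed = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - radius):t + radius + 1]
        smoothed.append([sum(col) / len(window) for col in zip(*window)])
    return smoothed
```

With `smooth_weight=1.0` each frame converges to the midpoint between its reference and the previous solution, so a step in the reference is tracked with a lag; this shows in miniature how the smoothness weight trades tracking accuracy against temporal continuity.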

5. Empirical Validation and Benchmarking

Experimental results validate these frameworks on several axes:

Task Success and Contact Preservation: For dexterous grasping, hybrid PSO+IK yields a lifting ratio of ~60% vs. ~10% for pure IK; trajectory-level success rates approach 80% with full task objective (Antotsiou et al., 2018).
Kinematic/Interaction Accuracy: OmniRetarget achieves near-zero penetration and zero foot skating while maintaining ≥96% contact preservation. RL policies trained on such references reach 82–95% task success in downstream evaluation (Yang et al., 30 Sep 2025).
Motion Smoothness/Jerk: Sparse contact-weighted frameworks reach jerk close to the original motion, with ~35–50% lower self-penetration and ~70% reduction in floor penetration compared to prior methods (Cheynel et al., 28 Feb 2025).
Contact-Faithfulness (F1/AUC): Foot/ground contact F1 ≈ 0.925 and AUC ≈ 0.90, comparable to commercial inverse-kinematics tools (Cheynel et al., 28 Feb 2025).
Real-Time Performance: Batch optimization with Adam or custom QP solvers, leveraging sparsity and GPU/CPU parallelism, achieves 60+ fps or control cycle times of <1 ms per step (Cheynel et al., 28 Feb 2025, Rouxel et al., 2022).
Cross-morphology Robustness: Validated by retargeting hand grasps to highly dissimilar hands and whole-body motions to morphologically divergent robots and characters (e.g., three-fingered, prosthetic, or alien hands; humanoid/animal robots) (Lakshmipathy et al., 2024, Cheynel et al., 28 Feb 2025).
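
Two of the metrics above can be computed from per-frame data with a few lines of Python. The definitions here are common conventions assumed for illustration and may differ from those used in the cited evaluations: contact F1 is computed over binary per-frame contact labels, and foot skating is measured as mean horizontal displacement between consecutive in-contact frames, with the z axis taken as up.

```python
import math

def contact_f1(predicted, reference):
    """F1 score of per-frame binary contact labels against reference labels."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def foot_skate(foot_positions, in_contact):
    """Mean horizontal slide per frame while the foot stays in contact (z is up)."""
    slides = []
    for t in range(1, len(foot_positions)):
        if in_contact[t - 1] and in_contact[t]:
            (x0, y0, _), (x1, y1, _) = foot_positions[t - 1], foot_positions[t]
            slides.append(math.hypot(x1 - x0, y1 - y0))
    return sum(slides) / len(slides) if slides else 0.0
```

A foot that is lifted contributes nothing to the skating measure, so only motion during contact counts against the retargeted result.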

6. Limitations, Practical Considerations, and Extensions

Despite robust performance, several limitations are endemic to the optimization-based approach:

Local Minima / Initialization Sensitivity: Methods based on local or swarm search may be misled by poor pose estimation or large embodiment gaps, necessitating careful choice of initializations and optimization hyperparameters (Antotsiou et al., 2018, Cheynel et al., 28 Feb 2025).
Contact and Constraint Modeling: Hard contacts are approximated by high weights or local smooth penalties; explicit dynamics, force closure, or compliance may be missing. Some meta-heuristics offer only local (not global) refinement (Antotsiou et al., 2018).
Computational Complexity: Full-batch optimizations or high-frequency online solvers require careful parallelism, feature reduction, or constraint exploitation for real-time operation (Rouxel et al., 2022, Yang et al., 30 Sep 2025).
Semantic Loss Tradeoffs: Naively tuned weights may cause finger or limb misorientation or over-regularization; adaptive strategies and interactive weighting sliders can resolve such conflicts but increase the complexity of deployment (Cheynel et al., 28 Feb 2025).
Extensibility: Most frameworks are readily extended to new morphologies by user-specified correspondences (landmarks, marker sets, or axial curves), and contact descriptors can be upgraded for dynamic scenes, multi-character interactions, or complex terrains (Lakshmipathy et al., 2024, Cheynel et al., 28 Feb 2025).
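
Initialization sensitivity is easy to reproduce on a toy problem. The one-dimensional energy below is a hypothetical stand-in with two basins, not a retargeting objective from the cited papers; it shows local descent settling in whichever basin the starting point lies in, and a simple multi-start as one possible mitigation.

```python
def energy(q):
    """Toy nonconvex energy with a local minimum near q = 0.96 and a lower one near q = -1.04."""
    return (q * q - 1.0) ** 2 + 0.3 * q

def energy_grad(q):
    """Analytic derivative of the toy energy."""
    return 4.0 * q * (q * q - 1.0) + 0.3

def descend(q, lr=0.01, steps=500):
    """Plain gradient descent from the initial guess q."""
    for _ in range(steps):
        q -= lr * energy_grad(q)
    return q

def multi_start(inits):
    """Run descent from several initial guesses and keep the lowest-energy result."""
    return min((descend(q0) for q0 in inits), key=energy)
```

Starting from 1.5 the solver stops in the shallower basin, while starting from -1.5 it reaches the deeper one. `multi_start([1.5, -1.5])` returns the latter, at the cost of one full solve per initial guess.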

Optimization-based retargeting architectures have become the foundation for robust, artifact-free transfer of complex motion and interaction patterns, enabling not only realistic humanoid and hand teleoperation but also general-purpose data generation for sample-efficient robot learning and digital animation. Their continuing evolution is oriented toward tighter integration of physical simulation, perceptual metrics, and adaptive, task-driven semantics.