Task-Preserving Retargeting Across Embodiments

Updated 11 June 2026

Recent work introduces unified latent-space and graph-based approaches that map source motions onto diverse embodiments while preserving task-level semantics and dynamic validity.
These methods combine optimization and RL-guided mapping with perceptual domain adaptation to reconcile morphological mismatches with physical constraints.
Reported benchmarks and comparative studies show gains in transfer fidelity, interaction consistency, and robustness across heterogeneous kinematic structures.

Task-preserving retargeting across embodiments refers to mapping motion, control trajectories, or policies from one physical embodiment (an agent with a specific kinematic and dynamic structure—human, robot, simulated character, or animal) onto another, with the explicit goal of preserving task-level semantics and/or underlying intent. Central to this problem is the challenge of morphological mismatch—differences in numbers, types, and topology of joints, link proportions, dynamic constraints, sensor modalities, and actuation capabilities—which must be reconciled while maintaining both feasibility and the preservation of original high-level behaviors.

1. Formal Problem Definition and Core Challenges

Mathematically, task-preserving retargeting seeks a mapping $f$ from a source motion or policy, $X^{src}$ (e.g., a human motion sequence or control trajectory), onto a target embodiment's control trajectory $\hat Y^{tgt}$ such that both adherence to the high-level task specification and dynamic fidelity are preserved. Formally, given

$X^{src} = \{ x^{src}_t \}_{t=1}^T,$

and a morphological descriptor $M^{tgt}$ (e.g., link lengths, joint limits, topology) for the target, the objective is to learn $f$ such that

$\hat Y^{tgt} = f(X^{src}, M^{tgt})$

retains the intended goal, dynamic intent, and physical feasibility (Zhang et al., 12 Jan 2026).

This mapping is nontrivial due to:

Severe articulation and topology mismatch (heterogeneous degree-of-freedom (DoF) structures, non-correspondence of joints)
Kinematic constraints (joint ranges, reachable workspace)
Physical plausibility (dynamics, contacts, stability, collision-avoidance)
Maintaining task-level semantics (the semantic equivalents of “reaching,” “grasping,” or “walking”) across disparate morphological capabilities
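To make the signature $\hat Y^{tgt} = f(X^{src}, M^{tgt})$ concrete, the following is a minimal sketch of a deliberately naive baseline, not a method from any cited work. It assumes joint-angle trajectories on both sides, a hand-written joint correspondence, and a hypothetical `Morphology` container standing in for $M^{tgt}$. It handles only the kinematic-constraint item in the list above (joint ranges) and ignores dynamics, contacts, and task semantics.

```python
from dataclasses import dataclass
from typing import Dict

import numpy as np


@dataclass
class Morphology:
    """Hypothetical stand-in for the target descriptor M^tgt."""
    joint_lower: np.ndarray   # (D_tgt,) lower joint limits, radians
    joint_upper: np.ndarray   # (D_tgt,) upper joint limits, radians
    link_lengths: np.ndarray  # (D_tgt,) link lengths, metres (unused by this baseline)


def naive_retarget(x_src: np.ndarray, m_tgt: Morphology,
                   joint_map: Dict[int, int]) -> np.ndarray:
    """Baseline f(X^src, M^tgt): copy mapped joints, then clip to target limits.

    x_src:     (T, D_src) source joint angles.
    joint_map: target joint index -> source joint index (an assumption;
               target joints without a source counterpart start at zero).
    """
    n_frames = x_src.shape[0]
    d_tgt = m_tgt.joint_lower.shape[0]
    y_hat = np.zeros((n_frames, d_tgt))
    for tgt_j, src_j in joint_map.items():
        y_hat[:, tgt_j] = x_src[:, src_j]
    # Enforce the joint ranges encoded in the morphology descriptor.
    return np.clip(y_hat, m_tgt.joint_lower, m_tgt.joint_upper)


# Usage: a 3-DoF source mapped onto a 2-DoF target with tight limits.
x_src = np.array([[0.0, 2.0, -1.0],
                  [0.5, 2.5, -1.5]])
m_tgt = Morphology(joint_lower=np.array([-1.0, -1.0]),
                   joint_upper=np.array([1.0, 1.0]),
                   link_lengths=np.array([0.3, 0.3]))
y_hat = naive_retarget(x_src, m_tgt, joint_map={0: 1, 1: 2})
print(y_hat)  # every frame saturates at the limits [1, -1]
```

The saturation in the usage example shows why such a baseline is insufficient: once the limits clip the motion, the source's variation over time is lost, which is the kind of failure the methods in the next section are designed to avoid.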

2. Representative Methodological Paradigms

Several methodological categories have emerged for task-preserving retargeting, each reflecting different strategies for encoding, aligning, and adapting motion or control across embodiments.

2.1. Unified Latent/Intent Spaces

Frameworks such as AdaMorph (Zhang et al., 12 Jan 2026), “One-Policy-Fits-All” (OPFA) (Mu et al., 15 Mar 2026), and “Learning a Unified Latent Space” (Yan et al., 21 Jan 2026) define a shared latent or intent space $Z$ that abstracts away embodiment-specific details. The critical design is decoupling semantic intent from kinematic realization:

Encoding: $X^{src}$ is mapped to morphology-agnostic $Z$ (via Transformer, point cloud, or multi-part MLPs).
Decoding: $Z$ is conditioned or projected into embodiment-specific action or trajectory space via per-robot embeddings, adaptive normalization (e.g., AdaLN), or lightweight retargeting decoders.
Training: Typically supervised by reconstruction/objective losses on reference retargeted motions, augmented with physical/dynamic consistency losses.

This approach enables one model to handle diverse robot morphologies, facilitates zero-shot or few-shot transfer, and empirically preserves both task semantics and dynamic attributes (Zhang et al., 12 Jan 2026, Mu et al., 15 Mar 2026, Yan et al., 21 Jan 2026).
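The encode/decode split described above can be sketched in a few lines of NumPy. This is an illustrative skeleton under stated assumptions, not the architecture of AdaMorph, OPFA, or any other cited system: the class `UnifiedRetargeter`, the layer sizes, the single linear encoder, and the random untrained weights are all hypothetical. It shows only how one shared latent $Z$ can be modulated by a per-robot embedding through an adaptive layer norm and then projected to robots with different DoF counts.

```python
import numpy as np

D_SRC, D_Z, D_EMB = 12, 8, 4  # illustrative sizes, not from any cited paper


def layer_norm(h, eps=1e-5):
    """Normalize each latent vector to zero mean and unit variance."""
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)


class UnifiedRetargeter:
    """Shared encoder plus AdaLN-style decoding conditioned on a robot embedding."""

    def __init__(self, robot_dofs, seed=0):
        rng = np.random.default_rng(seed)
        # Shared, morphology-agnostic parameters (random here; trained in practice).
        self.W_enc = rng.normal(scale=0.1, size=(D_SRC, D_Z))
        self.W_scale = rng.normal(scale=0.1, size=(D_EMB, D_Z))
        self.W_shift = rng.normal(scale=0.1, size=(D_EMB, D_Z))
        # Per-robot parameters: an embedding and an output head sized to its DoFs.
        self.embed = {name: rng.normal(size=D_EMB) for name in robot_dofs}
        self.W_out = {name: rng.normal(scale=0.1, size=(D_Z, dof))
                      for name, dof in robot_dofs.items()}

    def encode(self, x_src):
        """(T, D_SRC) source features -> (T, D_Z) shared latent Z."""
        return np.tanh(x_src @ self.W_enc)

    def decode(self, z, robot):
        """(T, D_Z) latent -> (T, dof) commands for the named robot."""
        e = self.embed[robot]
        scale = 1.0 + e @ self.W_scale   # embedding-dependent gain
        shift = e @ self.W_shift         # embedding-dependent bias
        h = layer_norm(z) * scale + shift
        return h @ self.W_out[robot]


model = UnifiedRetargeter({"arm_7dof": 7, "quadruped_12dof": 12})
x_src = np.random.default_rng(1).normal(size=(5, D_SRC))
z = model.encode(x_src)                  # computed once, reused for every robot
for name in model.embed:
    print(name, model.decode(z, name).shape)
```

The point of the design is that `encode` never sees the robot identity, so all embodiment-specific information enters through the embedding and the output head.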

2.2. Graph-, Mesh- and Anchor-Based Correspondence

Methods such as G-DReaM (Cao et al., 27 May 2025), OmniRetarget (Yang et al., 30 Sep 2025), and spatially-adaptive anchor strategies (Choi et al., 19 May 2026) exploit explicit spatial or structural encodings:

Graph representations: The embodiment's kinematic tree or mesh is encoded as a graph, with nodes as joints and edges parameterized by physical features (axis, link length), and attention or diffusion models leverage this structure.
Anchors and interaction meshes: Surface-based or contact-proximate anchors (adaptively sampled) are dynamically mapped across bodies; interaction meshes encode agent, object, and environment relations, enabling robust preservation of spatial/interaction semantics even as proportions or topologies change.
Guidance losses: Energy-based, Laplacian deformation, and anchor proximity/direction losses enforce preservation of both global pose and local interaction relationships.

Such approaches excel in complex interaction retention, contact fidelity, and generalization to highly non-homeomorphic morphologies (Cao et al., 27 May 2025, Choi et al., 19 May 2026, Yang et al., 30 Sep 2025).
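As a rough illustration of the Laplacian deformation idea mentioned among the guidance losses, the sketch below computes uniform-weight Laplacian coordinates on a small interaction mesh and an energy that penalizes changes in them. It is a simplified stand-in under stated assumptions: the neighbor lists are given by hand rather than built by a tetrahedralization, the weights are uniform, and the function names are hypothetical rather than taken from OmniRetarget or any other cited method.

```python
import numpy as np


def laplacian_coords(points, neighbors):
    """Each vertex minus the mean of its neighbors (uniform weights).

    points:    (N, 3) vertex positions (body keypoints, object, environment).
    neighbors: list of N index lists defining the mesh connectivity.
    """
    return np.stack([points[i] - points[nbrs].mean(axis=0)
                     for i, nbrs in enumerate(neighbors)])


def laplacian_energy(src_points, tgt_points, neighbors):
    """Sum of squared differences between source and target Laplacian coordinates."""
    diff = laplacian_coords(src_points, neighbors) - laplacian_coords(tgt_points, neighbors)
    return float((diff ** 2).sum())


# A fully connected 4-vertex mesh (e.g., hand, elbow, object, ground point).
src = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
nbrs = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]

shifted = src + np.array([0.5, -0.2, 0.1])      # rigid translation
print(laplacian_energy(src, shifted, nbrs))     # ~0: relative layout unchanged
print(laplacian_energy(src, 2.0 * src, nbrs))   # > 0: relative layout stretched
```

Because the coordinates are differences between a vertex and its neighbors, the energy ignores global translation but reacts to changes in the relative arrangement of agent, object, and environment points, which is the property that makes it useful for preserving interactions.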

2.3. Optimization and RL-Guided Mapping

Optimization-based solvers (e.g., GMR (Araujo et al., 2 Oct 2025), OmniRetarget (Yang et al., 30 Sep 2025), CEI (Wu et al., 14 Jan 2026)) iteratively solve for robot poses that best satisfy a set of spatial, orientation, and physical constraints with respect to the source trajectory. RL-centric methods (e.g., ReActor (Müller et al., 7 May 2026), NMR (Zhao et al., 23 Mar 2026)) embed retargeting as an auxiliary objective in a bilevel optimization or expert-track-and-repair loop, with lower-level RL policies enforcing physical feasibility and high-level parameterizations ensuring task/loss preservation. These approaches are crucial for ensuring physically valid, artifact-free motion and successful downstream policy learning (Müller et al., 7 May 2026, Zhao et al., 23 Mar 2026).
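The per-frame optimization pattern can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the formulation used by GMR, OmniRetarget, or CEI: it retargets a source wrist path onto a planar two-link arm by scaling the path to the arm's reach and solving damped least-squares inverse kinematics at each frame, with joint limits enforced by clipping and each frame warm-started from the previous solution. All function names and numeric values are hypothetical.

```python
import numpy as np


def fk(q, l1, l2):
    """End-effector position of a planar two-link arm."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])


def jacobian(q, l1, l2):
    """Analytic Jacobian of fk with respect to the two joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [l1 * c1 + l2 * c12, l2 * c12]])


def solve_frame(target, q_init, l1, l2, limits, iters=100, damping=1e-2):
    """Damped least-squares IK for one frame, clipped to joint limits."""
    q = q_init.copy()
    for _ in range(iters):
        err = target - fk(q, l1, l2)
        J = jacobian(q, l1, l2)
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), err)
        q = np.clip(q + dq, limits[:, 0], limits[:, 1])
    return q


def retarget(src_wrist, src_reach, l1, l2, limits):
    """Scale the source path to the target's reach, then solve frame by frame."""
    scale = (l1 + l2) / src_reach
    q = np.array([0.3, 0.6])              # arbitrary non-singular start pose
    out = []
    for p in src_wrist:
        q = solve_frame(scale * p, q, l1, l2, limits)  # warm start for smoothness
        out.append(q)
    return np.array(out), scale


src_wrist = np.array([[0.40, 0.20], [0.30, 0.35], [0.20, 0.45]])  # metres
limits = np.array([[-np.pi, np.pi], [0.0, 2.6]])                  # radians
q_traj, scale = retarget(src_wrist, src_reach=0.7, l1=0.25, l2=0.25, limits=limits)
errors = [np.linalg.norm(fk(q, 0.25, 0.25) - scale * p)
          for q, p in zip(q_traj, src_wrist)]
print(q_traj.shape, max(errors))
```

Real solvers add many more terms (orientation, contacts, collision avoidance), but the structure is the same: a cost defined relative to the source frame, minimized subject to the target's constraints.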

2.4. Mask and Perceptual Domain Adaptation

Data-driven techniques such as Shadow (Lepert et al., 2 Mar 2025) use composite segmentation masks to harmonize the appearance of source and target robots in visual data, aligning policy input distributions across embodiments and achieving robust zero-shot policy transfer in the image domain.
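One simplified reading of the composite-mask idea is sketched below. It assumes boolean segmentation masks for both robots are already available in the camera frame and replaces their union with a constant fill, so that an image of either robot looks the same to the policy inside the masked region. The function name, the constant fill, and the synthetic images are assumptions for illustration; how Shadow actually obtains and renders its masks is not specified here.

```python
import numpy as np


def composite_overlay(image, src_mask, tgt_mask, fill=0):
    """Paint the union of both robots' masks with a constant value.

    image:    (H, W, 3) uint8 frame.
    src_mask: (H, W) bool mask of the source robot's pixels.
    tgt_mask: (H, W) bool mask of the target robot's pixels.
    """
    out = image.copy()
    out[src_mask | tgt_mask] = fill
    return out


rng = np.random.default_rng(0)
background = rng.integers(0, 256, size=(6, 6, 3), dtype=np.uint8)

src_mask = np.zeros((6, 6), dtype=bool)
src_mask[1:3, 1:4] = True                 # where the source robot appears
tgt_mask = np.zeros((6, 6), dtype=bool)
tgt_mask[2:5, 2:5] = True                 # where the target robot would appear

train_frame = background.copy()
train_frame[src_mask] = 200               # source robot pixels (training data)
test_frame = background.copy()
test_frame[tgt_mask] = 50                 # target robot pixels (deployment)

a = composite_overlay(train_frame, src_mask, tgt_mask)
b = composite_overlay(test_frame, src_mask, tgt_mask)
print(np.array_equal(a, b))               # the two policy inputs now coincide
```

In this toy setting the two composited frames are identical because the only differences between them lie inside the masked union; in real images, lighting and occlusion effects would leave residual differences.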

3. Task and Physical Preservation Objectives

Rigorous preservation of task semantics and physical validity is enforced through a suite of objectives:

Task intent and dynamic similarity: Metrics such as Pearson correlation of velocity profiles, orientation geodesic deviations, and trajectory positional errors are common as training losses and evaluation metrics (Zhang et al., 12 Jan 2026, Araujo et al., 2 Oct 2025).
Contact and interaction fidelity: Preserved via Laplacian deformation, hard constraint enforcement (e.g., stance-foot sticking, object-hand contacts), and explicit anchor/contact consistency (Yang et al., 30 Sep 2025, Choi et al., 19 May 2026).
Physical feasibility: Enforced via curriculum-based training (progressively increasing dynamic consistency terms), RL-based projection into feasible manifolds, and physics-aware losses (ground penetration, joint jumps, self-collision, etc.) (Zhang et al., 12 Jan 2026, Zhao et al., 23 Mar 2026, Müller et al., 7 May 2026, Huang et al., 2 Jun 2026).
Morphology invariance: Scale-normalized losses (e.g., end-effector consistency (Huang et al., 2 Jun 2026)), adaptive normalization, and per-embodiment embeddings ensure correspondence despite severe differences in structure, link proportion, and DoFs.
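Two of the metrics named in the first item above have standard closed forms, sketched here. The Pearson correlation is computed on speed profiles obtained by finite differences, and the orientation deviation is the geodesic angle on SO(3), $\theta = \arccos\big((\mathrm{tr}(R_a^\top R_b) - 1)/2\big)$. The function names and the finite-difference velocity estimate are assumptions; individual papers may define their variants differently.

```python
import numpy as np


def velocity_pcc(src_pos, tgt_pos, dt):
    """Pearson correlation between source and target speed profiles.

    src_pos, tgt_pos: (T, 3) positions of a tracked point (e.g., the root).
    dt:               time step in seconds.
    """
    v_src = np.linalg.norm(np.diff(src_pos, axis=0), axis=1) / dt
    v_tgt = np.linalg.norm(np.diff(tgt_pos, axis=0), axis=1) / dt
    return float(np.corrcoef(v_src, v_tgt)[0, 1])


def geodesic_angle(R_a, R_b):
    """Rotation angle (radians) of the relative rotation between two 3x3 matrices."""
    cos_theta = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def rot_z(angle):
    """Rotation about the z axis, used only to build the example."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


t = np.linspace(0.0, 1.0, 11)
src_root = np.stack([t ** 2, np.zeros_like(t), np.zeros_like(t)], axis=1)
tgt_root = 0.6 * src_root                 # same motion on a smaller body

print(velocity_pcc(src_root, tgt_root, dt=0.1))   # ~1.0: timing preserved
print(geodesic_angle(np.eye(3), rot_z(0.5)))      # ~0.5 rad
```

The correlation is insensitive to uniform scaling of the motion, which is why it is a natural choice when source and target differ in size: it measures whether the timing and speed profile survive retargeting rather than whether absolute displacements match.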

4. Comparative Performance and Empirical Insights

Empirical benchmarks demonstrate the importance of unified and physically grounded retargeting methods.

Method/Domain	Quantitative Highlights	Generalization and Fidelity
AdaMorph (Zhang et al., 12 Jan 2026)	PCC(root velocity) > 0.9; traj. dev. <5 cm; $X^{src}$ 1 rad	Strong zero-shot to out-of-distribution behaviors
OmniRetarget (Yang et al., 30 Sep 2025)	Penetration: 0; Contact-preserve: >0.96; RL success: ~82–100%	Manifold augmentation across morph., object, terrain
GMR (Araujo et al., 2 Oct 2025)	E_{g-mpbpe} mean: 104 mm (close to closed-source); RL succ: ≥99.5%	Outperforms other open-source retargeters
ReActor (Müller et al., 7 May 2026)	Zero penetration/self-collision; RL tracking SR: ≥95%	Handles human→quadruped large gap
Human2Humanoid (Huang et al., 2 Jun 2026)	SR: 88.5%, TE: 0.12, GP: 0.05 cm	Outperforms both optimization and learned baselines
OPFA (Mu et al., 15 Mar 2026)	Zero/few-shot real-world: 90–100% on 7 manipulation tasks	Retains downstream task semantics by design
NMR (Zhao et al., 23 Mar 2026)	Zero joint jumps/collisions; accelerates RL convergence	Smooths out kinematic noise, robust to motif variety
Shadow (Lepert et al., 2 Mar 2025)	94% zero-shot transfer in simulation (no target data)	Efficient: only single-robot demos

A key insight is that physically validated and interaction- or intent-preserving retargeting directly translates into improved downstream policy learning rates, robustness to hardware noise, and qualitative fidelity of execution (Zhang et al., 12 Jan 2026, Araujo et al., 2 Oct 2025, Zhao et al., 23 Mar 2026).

5. Advanced Topics and Extensions

Interaction and contact generalization: “Spatially adaptive” schemes dynamically reposition anchors or correspondences to reachable regions, addressing failures of static mapping when body proportions are exaggerated (Choi et al., 19 May 2026).
Latent goal-conditioned control: Some frameworks (e.g., Yan et al., 21 Jan 2026) operate entirely in decoupled latent spaces, supporting teleoperation and few-shot adaptation by modification of intent vectors.
Vision-language-action unification: Models like Qwen-VLA (Wang et al., 28 May 2026) unify retargeting, policy, and environment contextualization via multi-modal conditioning and textual prompts, supporting simultaneous generalization across tasks and morphologies.
Scaling and co-training: Embodiment-agnostic world models trained with both simulated and real human/robot data can generalize to unseen hardware by operating on intermediate particle or point-cloud spaces, leveraging the invariance of task-level dynamics rather than kinematic commands (He et al., 3 Nov 2025).
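The anchor-repositioning idea in the first item of this list can be illustrated with the simplest possible rule: if a mapped anchor lies outside a limb's reachable region, move it to the nearest reachable point. The sketch below assumes the reachable region is a ball around the limb's base joint, which is a strong simplification, and the function name is hypothetical; it is not the scheme of Choi et al.

```python
import numpy as np


def project_to_reachable(anchor, base, reach):
    """Return the closest point to `anchor` within `reach` of `base`.

    anchor: (3,) desired anchor position on or near the target body.
    base:   (3,) position of the limb's base joint (e.g., the shoulder).
    reach:  maximum limb extension in metres.
    """
    offset = anchor - base
    dist = np.linalg.norm(offset)
    if dist <= reach:
        return anchor.copy()              # already reachable: keep as is
    return base + offset * (reach / dist) # pull back onto the reach sphere


shoulder = np.zeros(3)
print(project_to_reachable(np.array([2.0, 0.0, 0.0]), shoulder, reach=0.5))
print(project_to_reachable(np.array([0.1, 0.2, 0.0]), shoulder, reach=0.5))
```

A static mapping would leave the first anchor at an unreachable location and force the solver to fail or distort the pose; repositioning keeps the direction of the interaction while respecting the target's proportions.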

6. Limitations and Outstanding Challenges

Per-morphology data overhead: Some architectures (e.g., NMR (Zhao et al., 23 Mar 2026)) require expert RL repair pipelines per new robot, whereas others (AdaMorph (Zhang et al., 12 Jan 2026)) scale to arbitrary robots through unified structural conditioning.
Topology mismatches: Handling non-homeomorphic skeletons and missing/extra joints remains a challenge, though graph- and energy-based methods (G-DReaM (Cao et al., 27 May 2025), Human2Humanoid (Huang et al., 2 Jun 2026)) offer tractable solutions.
Interaction under strong underactuation: Preserving finely detailed contact tasks (e.g., in-cloth manipulation) across severe embodiment gaps requires further research into mesh- or object-centric retargeting constraints beyond joint space.
Objective trade-offs: Strong physical/dynamic preservation can sometimes marginally degrade geometric fidelity on edge cases, and vice versa—a fundamental trade-off in complex morphologies (Zhao et al., 23 Mar 2026, Zhang et al., 12 Jan 2026).

7. Future Directions

Likely developments include:

End-to-end unified task/interaction retargeters integrating perception, intent inference, and control policy.
Curriculum-driven and motif-aware training loops to handle long-horizon or multi-task motion retargeting.
Foundation models with prompt-based morphological and semantic adaptation, leveraging massive cross-domain pretraining (Wang et al., 28 May 2026).
Deeper integration of on-the-fly adaptation, real-time teleoperation, and unpaired data alignment for deployment at large scale.

In summary, task-preserving retargeting across embodiments has advanced from per-robot heuristic mapping to principled, data-driven methods that exploit structured latent spaces, graph-conditioned encodings, adaptive normalization, and physical interaction models. These innovations have produced robust, scalable frameworks that enable cross-morphology control and imitation while preserving both high-level behavior and physical feasibility, subject to the trade-offs noted in Section 6 (Zhang et al., 12 Jan 2026, Yang et al., 30 Sep 2025, Cao et al., 27 May 2025, Araujo et al., 2 Oct 2025, Zhao et al., 23 Mar 2026, Huang et al., 2 Jun 2026, Wang et al., 28 May 2026).