
Embodiment-Aware Diffusion Policy (EADP)

Updated 8 October 2025
  • The paper introduces EADP, which integrates low-level controller feedback into high-level diffusion sampling to generate trajectories that meet strict dynamic and kinematic constraints.
  • It employs a dual-level system that combines a diffusion policy trained on human demonstrations with embodiment-specific controllers such as inverse kinematics (IK) or model predictive control (MPC) for real-world task adaptation.
  • Experimental results show improved success rates and robustness in manipulation tasks, particularly under disturbance, across simulated and real robotic platforms.

Embodiment-Aware Diffusion Policy (EADP) is a computational framework for adapting high-level visuomotor policies—learned from embodiment-agnostic human demonstration data—for deployment on robotic embodiments with strict dynamic and kinematic constraints, such as aerial manipulators. The EADP approach enables plug-and-play adaptation by integrating feedback from embodiment-specific low-level controllers directly into the diffusion-based policy sampling loop at inference time. This operation steers trajectory generation towards modes that are dynamically feasible given the robot’s unique hardware, thereby improving execution robustness, efficiency, and success in manipulation tasks, particularly under disturbance and in previously unseen real-world environments (Gupta et al., 2 Oct 2025).

1. Framework Architecture and Key Design

EADP is structured as a two-level system:

  • High-Level Diffusion Policy: Trained on diverse, unconstrained human demonstrations using the Universal Manipulation Interface (UMI), the policy outputs manipulation trajectories as sequences of end-effector (EE) waypoints. These trajectories represent generic, task-level intentions with no inherent knowledge of target embodiment limitations.
  • Low-Level Embodiment-Specific Controller: Implements either inverse kinematics (IK) with constraints (e.g., velocity, joint limits) or a full model predictive controller (MPC), specialized for the target robot. The controller evaluates the high-level trajectory for feasibility, quantifying the tracking cost incurred when attempting to execute the trajectory on real hardware.

The central technical innovation lies in a feedback loop where, at each step of the diffusion denoising process, the gradient of the embodiment-specific controller’s tracking cost is computed with respect to the current trajectory sample and used to guide the diffusion sampling:

$$\tilde{a}^k = a^k - \lambda\,\overline{\omega}_k\,\nabla_{a^k} L_{\text{track}}(a^k)$$

Here, $a^k$ is the noisy trajectory at denoising step $k$, $\lambda$ the guidance scale, and $\overline{\omega}_k$ a schedule parameter, often matched to the cumulative noise schedule. The guided trajectory $\tilde{a}^k$ is then passed through the denoising network, yielding:

$$a^{(k-1)} = \tilde{a}^k + \psi_k\left(\pi_\theta(\tilde{a}^k, k \mid o)\right)$$

This mechanism transforms otherwise embodiment-agnostic trajectory proposals into dynamically and kinematically feasible trajectories for the robot at test time, without retraining or additional data collection.
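
The following sketch illustrates how one such guided denoising iteration could look, assuming a PyTorch-style differentiable tracking cost; `denoiser`, `tracking_cost`, `psi_k`, and `omega_bar_k` are hypothetical placeholders for $\pi_\theta$, $L_{\text{track}}$, $\psi_k$, and $\overline{\omega}_k$, not an implementation released with the paper.

```python
import torch

def guided_denoising_step(a_k, k, obs, denoiser, tracking_cost,
                          guidance_scale, omega_bar_k, psi_k):
    """One guided denoising iteration (illustrative sketch, not the paper's code)."""
    # Gradient of the low-level controller's tracking cost w.r.t. the current sample.
    a_req = a_k.detach().requires_grad_(True)
    grad = torch.autograd.grad(tracking_cost(a_req), a_req)[0]

    # Guidance: a_tilde^k = a^k - lambda * omega_bar_k * grad_a L_track(a^k)
    a_tilde = a_k - guidance_scale * omega_bar_k * grad

    # Denoising update: a^(k-1) = a_tilde^k + psi_k(pi_theta(a_tilde^k, k | o))
    with torch.no_grad():
        return a_tilde + psi_k(denoiser(a_tilde, k, obs))
```

Iterating this step from the terminal noise level down to $k = 1$ yields a trajectory that balances the learned task prior against the controller's feasibility signal.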

2. Technical Implementation and Tracking Cost Integration

EADP relies fundamentally on the computation of a tracking cost $L_{\text{track}}$, whose form depends on the controller design:

  • IK Controller:

$$L_{\text{track}}(a) = \sum_{t=1}^{H} \left\| f_{FK}(q_t) - a_t \right\|^2$$

Here, $f_{FK}$ is the robot forward kinematics mapping from joint configuration $q_t$ to EE pose, and $a_t$ is the target EE waypoint.

  • MPC Controller:

$$L_{\text{track}}(a) = \sum_{t=1}^{H} \left( e_{p,t}^{T} Q_p\, e_{p,t} + e_{R,t}^{T} Q_R\, e_{R,t} \right)$$

Here $e_{p,t}$ and $e_{R,t}$ are the position and orientation tracking errors, and $Q_p$, $Q_R$ are positive semi-definite weight matrices.

At each denoising iteration, differentiating $L_{\text{track}}$ yields gradients that indicate directions in trajectory space associated with improved trackability. By "nudging" the policy sample along these gradients (scaled via $\lambda$ and $\overline{\omega}_k$), EADP dynamically pulls trajectory generation into regimes where controller constraints (such as stability, velocity, and aerodynamic limitations) are respected.

The procedure is agnostic to the particular tracking controller and applies equally to analytical or learned controllers, provided gradients can be computed (or approximated).
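
As a concrete illustration of the IK case, the sketch below evaluates $L_{\text{track}}$ so that its gradient with respect to the requested waypoints is available to the guidance step; `solve_constrained_ik` and `forward_kinematics` are assumed interfaces standing in for the robot's constrained IK solver and $f_{FK}$, not APIs specified by the paper.

```python
import torch

def ik_tracking_cost(waypoints, forward_kinematics, solve_constrained_ik):
    """IK-style tracking cost L_track(a) over an H-step EE waypoint trajectory.

    waypoints:             requested EE poses a_1..a_H, a tensor with requires_grad=True
    forward_kinematics:    f_FK, maps joint configurations to EE poses
    solve_constrained_ik:  returns joint configs q_1..q_H that best track the
                           waypoints under joint-limit and velocity constraints
    """
    # The controller response is treated as fixed, so gradients flow only
    # through the requested waypoints a_t.
    q = solve_constrained_ik(waypoints.detach())
    reached = forward_kinematics(q).detach()
    return ((reached - waypoints) ** 2).sum()
```

With $q_t$ held fixed, the gradient with respect to each waypoint is $2(a_t - f_{FK}(q_t))$, so the guidance step pulls each waypoint toward the pose the constrained controller can actually reach; a closure such as `lambda a: ik_tracking_cost(a, fk, ik)` could supply this cost to the guided step sketched in Section 1.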

3. Addressing the Embodiment Gap: Challenges and Solutions

Transferring UMI-derived policies to robots with strict or distinctive embodiment constraints presents key difficulties:

  • Action Space Mismatch: Human demonstrations via UMI assume unconstrained motion, leading to trajectory proposals that may be out-of-distribution for constrained robots, for example, aerial manipulators with limited EE workspace or dynamic constraints.
  • Feasibility and Safety: Unconstrained trajectory execution may result in poor controller performance, actuation saturation, or unsafe behaviors (e.g., instability during aerial manipulation).

EADP operationally mitigates these issues by incorporating low-level controller feedback at inference, so trajectories are iteratively refined to conform to embodiment-specific constraints. This plug-and-play guidance enables adaptation for previously unseen robots without retraining or extensive embodiment-specific data, facilitating practical deployment scalability.

4. Experimental Validation and Empirical Performance

Experiments on both simulated and real-world robotic manipulation tasks showcase significant improvements:

  • Simulation: Compared to unguided diffusion policies (DP), EADP closes the embodiment gap in challenging tasks such as open-and-retrieve, peg-in-hole, rotate-valve, and pick-and-place, across Oracle (ideal controller), fixed-base (UR10e), and aerial manipulator (UAM) embodiments. In the UAM case, EADP improves the mean success rate by over 9% without disturbances and by more than 20% under disturbance conditions.
  • Real-World Deployment: On tasks such as peg-in-hole insertion, lemon harvesting, and long-horizon lightbulb installation, EADP also outperforms unguided baselines, demonstrating increased robustness and efficiency, especially in the presence of controller saturation, dynamic constraints, and disturbances.

Performance remains strong when EADP adapts policies using UMI demonstrations collected “in the wild,” illustrating scalable, data-efficient transfer.

5. Practical Implications and Limitations

EADP furnishes a pathway to scale general manipulation skills across heterogeneous embodiments, including those with demanding dynamic constraints. Benefits include:

  • Robust adaptation to embodiment-specific physical feasibility at test time,
  • Safe and efficient task execution by steering away from infeasible trajectories,
  • Data-efficient transfer by eliminating the need for extensive retraining or new demonstrations.

A current limitation is the lower inference rate of EADP (1–2 Hz) relative to robot control frequencies (up to 50 Hz). This temporal gap has practical consequences for real-time deployment, suggesting that further development (e.g., streaming diffusion, adaptive guidance) may be necessary.

6. Future Directions

Potential research directions for EADP include:

  • Streaming and Continuous Guidance: Closed-loop, continuous incorporation of controller gradients to address temporal mismatches and improve real-time adaptability.
  • Integration with Learned Controllers: Extension to RL-based or neural controllers that model dynamics or kinematics, expanding EADP’s range across diverse robot types.
  • Adaptive Guidance Scaling: Refinement of schedule and scaling mechanisms for trajectory guidance, balancing optimality, constraint satisfaction, and task performance.
  • Complex Multi-Modal Manipulation: Application of EADP in domains that combine manipulation with navigation, require coordination across multiple agents, or operate in unstructured, cluttered environments.

The demonstrated empirical robustness and data-efficient adaptation position EADP as a leading approach for embodiment-aware trajectory generation and policy deployment in diverse and highly constrained robotic platforms (Gupta et al., 2 Oct 2025).
