- The paper introduces Hierarchical Diffusion Policy (HDP), a novel imitation learning method that uses a two-layer diffusion model (Guider and Actor) to improve performance and controllability, particularly in contact-rich robot manipulation tasks.
- HDP features technical innovations like snapshot gradient optimization, 3D conditioning, and prompt guidance, leading to superior interpretability and allowing human intervention via specified contact objectives.
- Evaluations demonstrate HDP significantly outperforms Diffusion Policy by an average of 20.8% across various simulated and real-world tasks, showcasing its robustness with different objects and operations.
This paper introduces Hierarchical Diffusion Policy (HDP), a novel imitation learning method for generating robot manipulation trajectories, especially in contact-rich tasks. HDP addresses the limitations of end-to-end policies such as Diffusion Policy, which struggle with contact-rich tasks and offer limited controllability.
HDP employs a hierarchical approach, dividing the policy into two layers: a high-level policy (Guider) and a low-level policy (Actor). The Guider predicts the contact position for the robot's next object manipulation based on 3D information, while the Actor predicts the action sequence required to reach the predicted contact, using observations and contact information as input. Both policies are modeled as conditional denoising diffusion processes. The Actor is optimized using a combination of behavioral cloning and Q-learning, enabling it to accurately guide actions towards the desired contact while learning the conditional probability distribution of actions.
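Below is a minimal, runnable sketch of this two-layer structure at inference time, assuming a plain DDPM-style sampler. The toy modules, dimensions, and the `contact_override` hook for prompt guidance are illustrative stand-ins under those assumptions, not the paper's actual architecture or implementation.

```python
import torch
import torch.nn as nn

T, HORIZON, ACT_DIM, FEAT_DIM = 50, 16, 6, 64

class NoisePredictor(nn.Module):
    """Toy stand-in for a conditional denoiser: predicts the noise added to its input."""
    def __init__(self, x_dim, cond_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim + cond_dim + 1, 128),
                                 nn.ReLU(), nn.Linear(128, x_dim))
    def forward(self, x, cond, t):
        t_feat = torch.full((x.shape[0], 1), t / T)
        return self.net(torch.cat([x, cond, t_feat], dim=-1))

betas = torch.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

def denoise_step(x, eps, t):
    """One simplified DDPM reverse step (noise term dropped at t == 0)."""
    mean = (x - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean

@torch.no_grad()
def hdp_step(guider, actor, scene_feat, contact_override=None):
    """One decision step: Guider proposes a contact, Actor denoises actions toward it."""
    if contact_override is not None:
        contact = contact_override                   # prompt guidance: human-specified contact
    else:
        contact = torch.randn(1, 3)                  # Guider denoises a 3D contact position
        for t in reversed(range(T)):
            contact = denoise_step(contact, guider(contact, scene_feat, t), t)
    actions = torch.randn(1, HORIZON * ACT_DIM)      # Actor denoises a flattened action sequence
    cond = torch.cat([scene_feat, contact], dim=-1)  # same scene features + (predicted) contact
    for t in reversed(range(T)):
        actions = denoise_step(actions, actor(actions, cond, t), t)
    return contact, actions.view(1, HORIZON, ACT_DIM)

guider = NoisePredictor(3, FEAT_DIM)
actor = NoisePredictor(HORIZON * ACT_DIM, FEAT_DIM + 3)
scene_feat = torch.randn(1, FEAT_DIM)                # stand-in for point-cloud features, encoded once
contact, actions = hdp_step(guider, actor, scene_feat)
```

Passing a `contact_override` skips the Guider entirely, which is how the sketch mirrors human intervention through specified contacts; the Actor's conditioning is otherwise unchanged.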
The paper highlights three key advantages of HDP: superior performance from breaking the task down, greater interpretability thanks to the hierarchical structure, and stronger controllability, allowing human intervention by specifying objective contact positions.
To further improve HDP, the paper presents three technical contributions:
- Snapshot gradient optimization: A faster training method that computes the gradient at only one timestep of the diffusion process per iteration, in contrast to prior work, which computes it at every timestep (see the sketch after this list).
- 3D conditioning: Improves the policy's perception by using 3D point clouds to represent the state. The point cloud representation is extracted once and used throughout the denoising iterations.
- Prompt guidance: Allows manual specification of objective contacts to generate customized operation trajectories, improving human-machine interaction.
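The following is a minimal, self-contained sketch of the single-timestep idea behind snapshot gradient optimization, assuming a standard DDPM noise-prediction objective: each iteration draws one random timestep per sample, forms a one-step estimate of the clean action there, and backpropagates both a behavioral-cloning term and a critic-style guidance term through that single denoiser call instead of unrolling every timestep. The `TinyDenoiser`, `critic`, shapes, and loss weight are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 50
betas = torch.linspace(1e-4, 2e-2, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class TinyDenoiser(nn.Module):
    """Toy epsilon-predictor conditioned on observation features and timestep."""
    def __init__(self, x_dim, cond_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim + cond_dim + 1, 128),
                                 nn.ReLU(), nn.Linear(128, x_dim))
    def forward(self, x, cond, t):
        return self.net(torch.cat([x, cond, (t.float() / T).unsqueeze(-1)], dim=-1))

def snapshot_loss(denoiser, critic, clean_actions, cond, q_weight=0.1):
    """Diffusion BC loss plus a critic term, both evaluated at one sampled timestep."""
    b = clean_actions.shape[0]
    t = torch.randint(0, T, (b,))                          # the single "snapshot" timestep
    ab = alphas_bar[t].unsqueeze(-1)
    noise = torch.randn_like(clean_actions)
    noisy = torch.sqrt(ab) * clean_actions + torch.sqrt(1.0 - ab) * noise
    eps = denoiser(noisy, cond, t)
    bc = F.mse_loss(eps, noise)                            # behavioral-cloning term
    x0_hat = (noisy - torch.sqrt(1.0 - ab) * eps) / torch.sqrt(ab)   # one-step clean estimate
    guidance = -critic(torch.cat([x0_hat, cond], dim=-1)).mean()     # Q-style guidance term
    return bc + q_weight * guidance

# Usage with toy data: features from a 3D encoder would be computed once and passed as `cond`.
denoiser = TinyDenoiser(x_dim=6, cond_dim=64)
critic = nn.Sequential(nn.Linear(6 + 64, 128), nn.ReLU(), nn.Linear(128, 1))
loss = snapshot_loss(denoiser, critic, torch.randn(32, 6), torch.randn(32, 64))
loss.backward()
```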
HDP was evaluated across six different tasks and showed significant performance improvements over Diffusion Policy, with an average improvement of 20.8%. Real-world experiments demonstrate that HDP can handle both rigid and deformable objects. The authors also provide a detailed analysis of the algorithm's characteristics and design decisions.
The evaluation includes comparisons with LSTM-GMM, IBC, and BET, in addition to Diffusion Policy. HDP was tested in both simulated and real-world environments, with action-space dimensions ranging from 2-DOF to 6-DOF, prehensile and non-prehensile operations, and rigid as well as deformable objects. Training demonstrations were collected both by single users and by multiple users. The results indicate that HDP outperforms existing methods in success rate and robustness.
The paper also investigates the importance of phased objective contacts, a 3D vision encoder, and Q-learning action steps. Additional benefits of HDP include its ability to decompose multi-modal action distributions and its framework for prompt guidance.