Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (2303.04137v4)

Published 7 Mar 2023 in cs.RO

Abstract: This paper introduces Diffusion Policy, a new way of generating robot behavior by representing a robot's visuomotor policy as a conditional denoising diffusion process. We benchmark Diffusion Policy across 12 different tasks from 4 different robot manipulation benchmarks and find that it consistently outperforms existing state-of-the-art robot learning methods with an average improvement of 46.9%. Diffusion Policy learns the gradient of the action-distribution score function and iteratively optimizes with respect to this gradient field during inference via a series of stochastic Langevin dynamics steps. We find that the diffusion formulation yields powerful advantages when used for robot policies, including gracefully handling multimodal action distributions, being suitable for high-dimensional action spaces, and exhibiting impressive training stability. To fully unlock the potential of diffusion models for visuomotor policy learning on physical robots, this paper presents a set of key technical contributions including the incorporation of receding horizon control, visual conditioning, and the time-series diffusion transformer. We hope this work will help motivate a new generation of policy learning techniques that are able to leverage the powerful generative modeling capabilities of diffusion models. Code, data, and training details will be publicly available.

Citations (649)

Summary

  • The paper introduces the Diffusion Policy method, which iteratively refines actions using denoising diffusion to tackle multimodal distribution challenges.
  • It achieves an average 46.9% improvement over prior state-of-the-art methods across 12 diverse tasks, demonstrating effectiveness in both simulation and the real world.
  • The approach integrates transformer-based diffusion, receding horizon control, and visual conditioning to ensure temporal consistency and robust high-dimensional control.

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

The paper presents a novel approach to robot visuomotor policy generation through a technique termed Diffusion Policy. This method leverages the capabilities of denoising diffusion processes, typically used in generative modeling, to enhance robot action learning. By framing a robot's policy as a conditional denoising diffusion process, the authors provide a robust solution to some persistent challenges in the field, particularly modeling multimodal action distributions and maintaining stability in high-dimensional action spaces.
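
To make the framing concrete, the following is a minimal PyTorch sketch of the conditional denoising training objective: a demonstrated action sequence is corrupted with Gaussian noise at a random diffusion step, and a network is trained to predict that noise conditioned on the observation. The toy `NoisePredictionNet` MLP, the dimensions, and the variance schedule are illustrative assumptions; the paper's actual denoisers are a 1D convolutional U-Net and a transformer.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not the paper's values).
ACTION_DIM, OBS_DIM, HORIZON, N_DIFFUSION_STEPS = 7, 64, 16, 100

class NoisePredictionNet(nn.Module):
    """Toy MLP noise predictor; the paper uses a 1D conv U-Net or a transformer."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HORIZON * ACTION_DIM + OBS_DIM + 1, 256),
            nn.Mish(),
            nn.Linear(256, HORIZON * ACTION_DIM),
        )

    def forward(self, noisy_actions, obs, k_norm):
        # Condition on the observation features and the diffusion step index.
        x = torch.cat([noisy_actions.flatten(1), obs, k_norm], dim=-1)
        return self.net(x).view(-1, HORIZON, ACTION_DIM)

def training_step(net, actions, obs, alphas_cumprod):
    """Corrupt a demonstrated action sequence with noise, then regress the noise."""
    B = actions.shape[0]
    k = torch.randint(0, N_DIFFUSION_STEPS, (B,))      # random diffusion step
    eps = torch.randn_like(actions)                    # injected noise
    a_bar = alphas_cumprod[k].view(B, 1, 1)
    noisy_actions = a_bar.sqrt() * actions + (1 - a_bar).sqrt() * eps
    k_norm = k.float().unsqueeze(-1) / N_DIFFUSION_STEPS
    eps_hat = net(noisy_actions, obs, k_norm)
    return nn.functional.mse_loss(eps_hat, eps)        # denoising loss

# Usage: a standard variance schedule and one gradient step on dummy data.
betas = torch.linspace(1e-4, 0.02, N_DIFFUSION_STEPS)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
net = NoisePredictionNet()
loss = training_step(net, torch.randn(8, HORIZON, ACTION_DIM),
                     torch.randn(8, OBS_DIM), alphas_cumprod)
loss.backward()
```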

Key Contributions and Methodology

Diffusion Policy is evaluated across a broad spectrum of tasks, yielding an average improvement of 46.9% in performance over current state-of-the-art methods. The methodology centers on three core innovations:

  1. Action Diffusion Framework: Instead of directly predicting actions, the model iteratively refines noise into actions using learned gradients of an underlying score function, applied over multiple diffusion iterations (see the sketch after this list). This iterative refinement leverages stochastic Langevin dynamics and allows the policy to cover a wider and more complex action space than traditional regression-based methods.
  2. Handling of Multimodal Distributions: By learning the gradient of an action distribution’s score function, Diffusion Policy can seamlessly model complex, multimodal action distributions—a common challenge in imitation learning due to the nuanced and varied nature of human demonstrations.
  3. High-Dimensional Action Sequences: Unlike conventional policies that output single-step actions, Diffusion Policy predicts whole sequences of actions, enhancing temporal consistency and flexibility. This makes the method scalable and well suited to environments that demand precise long-term planning alongside real-time reaction.
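
A hedged sketch of the inference-time refinement from item 1, reusing the network and constants from the training sketch above: sampling starts from pure Gaussian noise and repeatedly applies the learned denoiser, each step moving against the predicted noise direction with a small amount of fresh noise injected, in the spirit of stochastic Langevin dynamics. The update rule below is the standard DDPM one and is an assumption about scheduler details.

```python
import torch

@torch.no_grad()
def sample_actions(net, obs, betas):
    """Iteratively denoise pure noise into an action sequence, conditioned on obs."""
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    a_k = torch.randn(obs.shape[0], HORIZON, ACTION_DIM)   # a^K ~ N(0, I)
    for k in reversed(range(N_DIFFUSION_STEPS)):
        k_norm = torch.full((obs.shape[0], 1), k / N_DIFFUSION_STEPS)
        eps_hat = net(a_k, obs, k_norm)
        # Step along the learned score: remove the predicted noise component...
        a_k = (a_k - betas[k] / (1 - alphas_cumprod[k]).sqrt() * eps_hat) \
              / alphas[k].sqrt()
        if k > 0:
            # ...then re-inject a small amount of noise (Langevin-style update).
            a_k = a_k + betas[k].sqrt() * torch.randn_like(a_k)
    return a_k  # the denoised action sequence a^0
```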

Technical Additions

The paper introduces several technical contributions to maximize the applicability and effectiveness of diffusion models in robotic policy learning:

  • Receding Horizon Control and Visual Conditioning: By combining short prediction horizons with actions conditioned on visual observations, the model supports dynamic replanning and reduces inference latency, both crucial for deployment on physical robots (a minimal execution loop is sketched after this list).
  • Time-Series Diffusion Transformer: A transformer-based network mitigates the over-smoothing common in convolutional architectures, enabling high-frequency action changes and finer control in tasks requiring precise velocity adjustments.
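
The receding-horizon scheme from the first bullet can be sketched as follows, again reusing `sample_actions` from above: predict a full action sequence, execute only its first few steps, then replan from the new observation. The Gym-style `env`, the `encode_obs` helper producing the visual conditioning vector, and the horizon values are all hypothetical placeholders, not the paper's exact interfaces.

```python
T_PRED, T_ACT = HORIZON, 8  # predict 16 steps, execute 8, then replan (illustrative)

def run_episode(env, net, betas, encode_obs, max_steps=200):
    obs = env.reset()
    steps = 0
    while steps < max_steps:
        obs_feat = encode_obs(obs).unsqueeze(0)         # (1, OBS_DIM) conditioning
        plan = sample_actions(net, obs_feat, betas)[0]  # (T_PRED, ACTION_DIM)
        for action in plan[:T_ACT]:                     # commit only the early actions
            obs, reward, done, info = env.step(action.numpy())
            steps += 1
            if done or steps >= max_steps:
                return
        # Loop back: replan from the latest observation (receding horizon).
```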

Evaluation and Performance

The empirical evaluation covers 12 tasks across multiple benchmarks, spanning both simulation and real-world scenarios. These tasks vary in complexity and dimensionality and demand different degrees of precision and object manipulation. Strong performance across these varied conditions underscores the versatility and robustness of Diffusion Policy.

Theoretical and Practical Implications

Theoretically, this paper bridges the gap between diffusion-based generative models and real-world robot learning, presenting new opportunities for integrating these techniques. Practically, the enhancements in handling multimodal distributions and high-dimensional actions offer more reliable and adaptable robotic policies, which can be transformational for tasks requiring intricate manipulation.

Future Prospects

The success of Diffusion Policy opens several avenues for future research, both in its immediate domain of imitation learning and broader applications in reinforcement learning and autonomous planning systems. Integrating diffusion models with reinforcement learning could further exploit suboptimal data and address challenges where exhaustive demonstration data isn't feasible.

In summary, Diffusion Policy as outlined in this paper represents a significant methodological advance in robot visuomotor control. By adapting principles from generative modeling to address key challenges in robotic policy learning, this approach sets a new standard for how robots can learn complex behaviors from high-dimensional data. Its proven effectiveness across rigorous benchmarks makes it a compelling candidate for future research and application.
