One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation (2410.21257v1)

Published 28 Oct 2024 in cs.RO and cs.LG

Abstract: Diffusion models, praised for their success in generative tasks, are increasingly being applied to robotics, demonstrating exceptional performance in behavior cloning. However, their slow generation process stemming from iterative denoising steps poses a challenge for real-time applications in resource-constrained robotics setups and dynamically changing environments. In this paper, we introduce the One-Step Diffusion Policy (OneDP), a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator, significantly accelerating response times for robotic control tasks. We ensure the distilled generator closely aligns with the original policy distribution by minimizing the Kullback-Leibler (KL) divergence along the diffusion chain, requiring only $2\%$-$10\%$ additional pre-training cost for convergence. We evaluated OneDP on 6 challenging simulation tasks as well as 4 self-designed real-world tasks using the Franka robot. The results demonstrate that OneDP not only achieves state-of-the-art success rates but also delivers an order-of-magnitude improvement in inference speed, boosting action prediction frequency from 1.5 Hz to 62 Hz, establishing its potential for dynamic and computationally constrained robotic applications. We share the project page at https://research.nvidia.com/labs/dir/onedp/.

Summary

  • The paper introduces OneDP, a one-step diffusion policy that accelerates robotic control by replacing iterative denoising with a single generator forward pass.
  • It distills a pre-trained diffusion policy by minimizing the KL divergence along the diffusion chain, preserving the original policy distribution at only 2%-10% additional pre-training cost.
  • Experiments on six simulation tasks and four real-world Franka robot tasks show state-of-the-art success rates and an inference-speed increase from 1.5 Hz to 62 Hz, enabling real-time operation.

Overview of "One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation"

The paper "One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation" presents an innovative approach to accelerating the application of diffusion models in robotic control tasks. The primary contribution, termed One-Step Diffusion Policy (OneDP), addresses the inherent speed limitations of diffusion models when applied to real-time robot control by drastically reducing the computational burden associated with generating actions.

Core Contributions

Diffusion models have shown remarkable success in generative AI but face practical challenges in robotics due to their slow inference, which stems from the many iterative denoising steps needed to traverse the diffusion chain. The authors propose OneDP, which distills a pre-trained diffusion policy into a one-step action generator, enabling rapid action selection in dynamic and resource-constrained environments.

The paper details a novel distillation process that transfers learned behavior from a traditional multi-step diffusion model to a single-step model. This is achieved by minimizing the KL divergence along the diffusion chain, ensuring that the distilled model retains the original policy's distribution characteristics with minimal additional training cost.
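
In the style of distribution-matching distillation methods such as Diff-Instruct and VSD, an objective of this kind can be written schematically as an integral KL divergence over the noised marginals of the diffusion chain; the weighting $w(t)$ and the exact parameterization below are illustrative rather than the paper's precise formulation:

$\mathcal{L}(\theta) = \int_0^T w(t)\, D_{\mathrm{KL}}\big(p_t^{G_\theta} \,\|\, p_t^{\pi}\big)\, dt$

$\nabla_\theta \mathcal{L} \approx \mathbb{E}_{t,\mathbf{z},\boldsymbol{\epsilon}}\Big[ w(t)\,\big(s_\phi(\mathbf{a}_t, t) - s_\pi(\mathbf{a}_t, t)\big)\, \tfrac{\partial \mathbf{a}_t}{\partial \theta} \Big], \qquad \mathbf{a}_t = \alpha_t\, G_\theta(\mathbf{z}, \mathbf{o}) + \sigma_t\, \boldsymbol{\epsilon}$

Here $G_\theta$ maps noise $\mathbf{z}$ and observation $\mathbf{o}$ to an action, $s_\pi$ is the score of the frozen pre-trained policy, and $s_\phi$ is an auxiliary score network fit to the generator's own samples; neither marginal score is available in closed form, so both are approximated by denoising networks.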

Methodology

The authors employ a stochastic policy-matching distillation method, inspired by advances in text-to-3D generation such as SDS (Score Distillation Sampling) and VSD (Variational Score Distillation). The distillation trains a one-step action generator alongside a generator score network, ensuring fidelity to the original diffusion policy. The distilled policy is computationally efficient and converges with only 2%-10% of the original pre-training cost, a modest overhead given the real-time demands of robotic systems.
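
A minimal sketch of this alternating scheme is given below, assuming a frozen teacher epsilon-predictor, a one-step generator, and an auxiliary generator score network. All dimensions, network sizes, schedules, and names are placeholders rather than the authors' implementation:

    # Sketch of VSD/Diff-Instruct-style alternating updates; NOT the authors'
    # code. The randomly initialized `teacher` stands in for a frozen,
    # pre-trained diffusion policy.
    import torch
    import torch.nn as nn

    ACT_DIM, OBS_DIM, T = 7, 32, 100            # hypothetical sizes
    betas = torch.linspace(1e-4, 2e-2, T)
    alphas_bar = torch.cumprod(1.0 - betas, 0)  # \bar{alpha}_t of the forward chain

    def eps_net():
        # epsilon-predictor conditioned on (noised action, observation, timestep)
        return nn.Sequential(nn.Linear(ACT_DIM + OBS_DIM + 1, 256), nn.SiLU(),
                             nn.Linear(256, ACT_DIM))

    teacher = eps_net()                          # frozen pre-trained policy (stand-in)
    for p in teacher.parameters():
        p.requires_grad_(False)
    gen_score = eps_net()                        # score net tracking the generator
    gen = nn.Sequential(nn.Linear(ACT_DIM + OBS_DIM, 256), nn.SiLU(),
                        nn.Linear(256, ACT_DIM))  # one-step action generator G_theta

    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
    opt_s = torch.optim.Adam(gen_score.parameters(), lr=1e-4)

    def add_noise(a0, t, eps):
        ab = alphas_bar[t].unsqueeze(-1)
        return ab.sqrt() * a0 + (1.0 - ab).sqrt() * eps

    for step in range(1000):
        obs = torch.randn(64, OBS_DIM)           # stand-in for visual features
        z = torch.randn(64, ACT_DIM)
        t = torch.randint(0, T, (64,))
        tf = t.float().unsqueeze(-1) / T

        # (1) Refit gen_score to the current generator distribution
        #     (standard denoising loss on detached generator samples).
        a0 = gen(torch.cat([z, obs], -1)).detach()
        eps = torch.randn_like(a0)
        inp = torch.cat([add_noise(a0, t, eps), obs, tf], -1)
        loss_s = ((gen_score(inp) - eps) ** 2).mean()
        opt_s.zero_grad(); loss_s.backward(); opt_s.step()

        # (2) Update the generator with the score-difference gradient.
        a0 = gen(torch.cat([z, obs], -1))
        eps = torch.randn_like(a0)
        a_t = add_noise(a0, t, eps)
        inp = torch.cat([a_t, obs, tf], -1)
        with torch.no_grad():
            # eps_teacher - eps_gen equals sigma_t * (s_gen - s_teacher):
            # the KL descent direction, with the weighting w(t) folded in.
            grad = teacher(inp) - gen_score(inp)
        loss_g = (grad * a_t).sum(-1).mean()     # surrogate; its gradient is grad * da_t/dtheta
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Step (1) keeps the auxiliary score network tracking the moving generator distribution; step (2) nudges the generator so that, score-wise, its noised samples become indistinguishable from the teacher's.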

Experimental Results

OneDP was evaluated on a suite of six challenging simulation tasks and four self-designed real-world tasks on a Franka robot. The results show that OneDP matches or exceeds the success rates of existing diffusion policies while raising action-prediction frequency from 1.5 Hz to 62 Hz, an order-of-magnitude speed-up that is pivotal for real-time applications.
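
The source of the speed-up is simply the number of network evaluations per action: a K-step diffusion policy pays K forward passes where the distilled generator pays one. The toy comparison below illustrates this; the network and the "denoising" update are placeholders, not a real DDPM/DDIM sampler:

    # Toy timing: K-step denoising loop vs. a single generator forward pass.
    import time
    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(64, 1024), nn.SiLU(),
                        nn.Linear(1024, 1024), nn.SiLU(),
                        nn.Linear(1024, 64))
    x = torch.randn(1, 64)

    def multi_step(k=100):
        a = torch.randn_like(x)
        for _ in range(k):          # one forward pass per denoising step
            a = a - 0.01 * net(a)   # placeholder update, not a real sampler
        return a

    def one_step():
        return net(x)               # distilled generator: a single call

    with torch.no_grad():
        t0 = time.perf_counter(); multi_step(); t1 = time.perf_counter()
        one_step(); t2 = time.perf_counter()
    print(f"100-step: {t1 - t0:.4f}s   one-step: {t2 - t1:.4f}s")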

Implications and Future Directions

The development of OneDP has substantial implications for robotic control. By reducing the computational overhead of traditional diffusion policies, the approach makes it practical to deploy expressive generative models in real-world scenarios that demand quick adaptation and responsiveness. It also motivates further work on distillation techniques and efficient diffusion strategies, with potential benefits beyond robotics in areas such as interactive AI and autonomous systems.

Future work could explore integrating discriminative learning frameworks to strengthen the alignment between the distilled and original models, as well as extending OneDP to more complex, long-horizon robotic tasks. OneDP's ability to handle the variability typical of real-world environments also opens opportunities for broader applications.

In summary, this paper takes a significant step toward reconciling the expressiveness of diffusion models with the stringent latency demands of robotic control, streamlining inference to enable real-time operation.
