One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation (2410.21257v1)

Published 28 Oct 2024 in cs.RO and cs.LG

Abstract: Diffusion models, praised for their success in generative tasks, are increasingly being applied to robotics, demonstrating exceptional performance in behavior cloning. However, their slow generation process stemming from iterative denoising steps poses a challenge for real-time applications in resource-constrained robotics setups and dynamically changing environments. In this paper, we introduce the One-Step Diffusion Policy (OneDP), a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator, significantly accelerating response times for robotic control tasks. We ensure the distilled generator closely aligns with the original policy distribution by minimizing the Kullback-Leibler (KL) divergence along the diffusion chain, requiring only $2\%$-$10\%$ additional pre-training cost for convergence. We evaluated OneDP on 6 challenging simulation tasks as well as 4 self-designed real-world tasks using the Franka robot. The results demonstrate that OneDP not only achieves state-of-the-art success rates but also delivers an order-of-magnitude improvement in inference speed, boosting action prediction frequency from 1.5 Hz to 62 Hz, establishing its potential for dynamic and computationally constrained robotic applications. We share the project page at https://research.nvidia.com/labs/dir/onedp/.

Summary

  • The paper introduces OneDP, a one-step diffusion policy that accelerates robotic control by replacing iterative denoising with a single generator forward pass.
  • It distills a pre-trained diffusion policy by minimizing the KL divergence along the diffusion chain, preserving the original policy distribution at only 2%-10% additional pre-training cost.
  • Experiments on six simulation tasks and four real-world Franka robot tasks show state-of-the-art success rates and an inference-speed increase from 1.5 Hz to 62 Hz, enabling real-time operation.

Overview of "One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation"

The paper "One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation" presents an innovative approach to accelerating the application of diffusion models in robotic control tasks. The primary contribution, termed One-Step Diffusion Policy (OneDP), addresses the inherent speed limitations of diffusion models when applied to real-time robot control by drastically reducing the computational burden associated with generating actions.

Core Contributions

Diffusion models have shown remarkable success in generative AI but face practical challenges in robotics due to their slow inference, which stems from the many iterative denoising steps needed to traverse the diffusion chain. The authors propose OneDP, which distills a pre-trained diffusion policy into a one-step action generator, enabling rapid action selection in dynamic and resource-constrained environments.

The paper details a novel distillation process that transfers learned behavior from a traditional multi-step diffusion model to a single-step model. This is achieved by minimizing the KL divergence along the diffusion chain, ensuring that the distilled model retains the original policy's distribution characteristics with minimal additional training cost.
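
In the style of distribution-matching distillation methods such as Diff-Instruct and VSD, an objective of this kind can be written schematically as an integral KL divergence over the noised marginals of the diffusion chain; the weighting $w(t)$ and the exact parameterization below are illustrative rather than the paper's precise formulation:

$\mathcal{L}(\theta) = \int_0^T w(t)\, D_{\mathrm{KL}}\big(p_t^{G_\theta} \,\|\, p_t^{\pi}\big)\, dt$

$\nabla_\theta \mathcal{L} \approx \mathbb{E}_{t,\mathbf{z},\boldsymbol{\epsilon}}\Big[ w(t)\,\big(s_\phi(\mathbf{a}_t, t) - s_\pi(\mathbf{a}_t, t)\big)\, \tfrac{\partial \mathbf{a}_t}{\partial \theta} \Big], \qquad \mathbf{a}_t = \alpha_t\, G_\theta(\mathbf{z}, \mathbf{o}) + \sigma_t\, \boldsymbol{\epsilon}$

Here $G_\theta$ maps noise $\mathbf{z}$ and observation $\mathbf{o}$ to an action, $s_\pi$ is the score of the frozen pre-trained policy, and $s_\phi$ is an auxiliary score network fit to the generator's own samples; neither marginal score is available in closed form, so both are approximated by denoising networks.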

Methodology

The authors employ a stochastic policy-matching distillation method, inspired by advances in text-to-3D generation such as SDS (Score Distillation Sampling) and VSD (Variational Score Distillation). The distillation trains a one-step action generator alongside a generator score network, ensuring fidelity to the original diffusion policy. The distilled policy is computationally efficient and converges with only 2%-10% of the original pre-training cost, a modest overhead given the real-time demands of robotic systems.
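
A minimal sketch of this alternating scheme is given below, assuming a frozen teacher epsilon-predictor, a one-step generator, and an auxiliary generator score network. All dimensions, network sizes, schedules, and names are placeholders rather than the authors' implementation:

    # Sketch of VSD/Diff-Instruct-style alternating updates; NOT the authors'
    # code. The randomly initialized `teacher` stands in for a frozen,
    # pre-trained diffusion policy.
    import torch
    import torch.nn as nn

    ACT_DIM, OBS_DIM, T = 7, 32, 100            # hypothetical sizes
    betas = torch.linspace(1e-4, 2e-2, T)
    alphas_bar = torch.cumprod(1.0 - betas, 0)  # \bar{alpha}_t of the forward chain

    def eps_net():
        # epsilon-predictor conditioned on (noised action, observation, timestep)
        return nn.Sequential(nn.Linear(ACT_DIM + OBS_DIM + 1, 256), nn.SiLU(),
                             nn.Linear(256, ACT_DIM))

    teacher = eps_net()                          # frozen pre-trained policy (stand-in)
    for p in teacher.parameters():
        p.requires_grad_(False)
    gen_score = eps_net()                        # score net tracking the generator
    gen = nn.Sequential(nn.Linear(ACT_DIM + OBS_DIM, 256), nn.SiLU(),
                        nn.Linear(256, ACT_DIM))  # one-step action generator G_theta

    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
    opt_s = torch.optim.Adam(gen_score.parameters(), lr=1e-4)

    def add_noise(a0, t, eps):
        ab = alphas_bar[t].unsqueeze(-1)
        return ab.sqrt() * a0 + (1.0 - ab).sqrt() * eps

    for step in range(1000):
        obs = torch.randn(64, OBS_DIM)           # stand-in for visual features
        z = torch.randn(64, ACT_DIM)
        t = torch.randint(0, T, (64,))
        tf = t.float().unsqueeze(-1) / T

        # (1) Refit gen_score to the current generator distribution
        #     (standard denoising loss on detached generator samples).
        a0 = gen(torch.cat([z, obs], -1)).detach()
        eps = torch.randn_like(a0)
        inp = torch.cat([add_noise(a0, t, eps), obs, tf], -1)
        loss_s = ((gen_score(inp) - eps) ** 2).mean()
        opt_s.zero_grad(); loss_s.backward(); opt_s.step()

        # (2) Update the generator with the score-difference gradient.
        a0 = gen(torch.cat([z, obs], -1))
        eps = torch.randn_like(a0)
        a_t = add_noise(a0, t, eps)
        inp = torch.cat([a_t, obs, tf], -1)
        with torch.no_grad():
            # eps_teacher - eps_gen equals sigma_t * (s_gen - s_teacher):
            # the KL descent direction, with the weighting w(t) folded in.
            grad = teacher(inp) - gen_score(inp)
        loss_g = (grad * a_t).sum(-1).mean()     # surrogate; its gradient is grad * da_t/dtheta
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Step (1) keeps the auxiliary score network tracking the moving generator distribution; step (2) nudges the generator so that, score-wise, its noised samples become indistinguishable from the teacher's.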

Experimental Results

OneDP was evaluated on a suite of six challenging simulation tasks and four self-designed real-world tasks on a Franka robot. The results show that OneDP matches or exceeds the success rates of existing diffusion policies while raising action-prediction frequency from 1.5 Hz to 62 Hz, an order-of-magnitude speed-up that is pivotal for real-time applications.
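
The source of the speed-up is simply the number of network evaluations per action: a K-step diffusion policy pays K forward passes where the distilled generator pays one. The toy comparison below illustrates this; the network and the "denoising" update are placeholders, not a real DDPM/DDIM sampler:

    # Toy timing: K-step denoising loop vs. a single generator forward pass.
    import time
    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(64, 1024), nn.SiLU(),
                        nn.Linear(1024, 1024), nn.SiLU(),
                        nn.Linear(1024, 64))
    x = torch.randn(1, 64)

    def multi_step(k=100):
        a = torch.randn_like(x)
        for _ in range(k):          # one forward pass per denoising step
            a = a - 0.01 * net(a)   # placeholder update, not a real sampler
        return a

    def one_step():
        return net(x)               # distilled generator: a single call

    with torch.no_grad():
        t0 = time.perf_counter(); multi_step(); t1 = time.perf_counter()
        one_step(); t2 = time.perf_counter()
    print(f"100-step: {t1 - t0:.4f}s   one-step: {t2 - t1:.4f}s")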

Implications and Future Directions

The development of OneDP has substantial implications for robotic control. By reducing the computational overhead of traditional diffusion policies, the approach makes it practical to deploy expressive generative models in real-world scenarios that demand quick adaptation and responsiveness. It also motivates further work on distillation techniques and efficient diffusion strategies, with potential benefits beyond robotics in areas such as interactive AI and autonomous systems.

Future work could explore integrating discriminative learning frameworks to strengthen the alignment between the distilled and original models, as well as extending OneDP to more complex, long-horizon robotic tasks. OneDP's ability to handle the variability typical of real-world environments also opens opportunities for broader applications.

In summary, this paper takes a significant step toward reconciling the expressiveness of diffusion models with the stringent latency demands of robotic control, streamlining inference to enable real-time operation.
