Diffusion Models for Reinforcement Learning: A Survey (2311.01223v4)

Published 2 Nov 2023 in cs.LG and cs.AI

Abstract: Diffusion models surpass previous generative models in sample quality and training stability. Recent works have shown the advantages of diffusion models in improving reinforcement learning (RL) solutions. This survey aims to provide an overview of this emerging field and hopes to inspire new avenues of research. First, we examine several challenges encountered by RL algorithms. Then, we present a taxonomy of existing methods based on the roles of diffusion models in RL and explore how the preceding challenges are addressed. We further outline successful applications of diffusion models in various RL-related tasks. Finally, we conclude the survey and offer insights into future research directions. We are actively maintaining a GitHub repository for papers and other related resources in utilizing diffusion models in RL: https://github.com/apexrl/Diff4RLSurvey.


Summary

  • The paper demonstrates that diffusion models improve policy expressiveness and trajectory planning in reinforcement learning.
  • It categorizes diffusion models into planners, policies, and data synthesizers to tackle offline RL challenges like data scarcity and distribution shifts.
  • The surveyed methods report notable performance gains in offline, multi-task, and multi-agent RL settings.

Diffusion Models for Reinforcement Learning: A Survey

The paper "Diffusion Models for Reinforcement Learning: A Survey" provides an analytical overview of the recent integration and application of diffusion models within reinforcement learning (RL). This area of paper examines how these models, known for their high-quality generative capabilities, contribute to addressing longstanding challenges in RL, such as restricted policy expressiveness, data scarcity, and compounding errors in model-based planning.

Challenges in Reinforcement Learning

The survey begins by examining the inherent challenges of RL, particularly in offline settings. Traditional RL algorithms often suffer from low sample efficiency, compounded by limitations in policy expressiveness. Offline RL, which learns from fixed datasets without real-time interaction, faces further constraints from the mismatch between the dataset's distribution and the states the learned policy will actually encounter. This distributional shift calls for policies that are more expressive than the unimodal Gaussian parameterizations common in actor-critic methods, as the toy example below illustrates.
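To make the expressiveness issue concrete, here is a minimal sketch (the data and numbers are contrived for illustration and do not come from the paper): fitting a single Gaussian by maximum likelihood to a bimodal set of demonstrated actions places the policy's mean between the two modes, on an action that neither demonstrator ever took.

```python
# Toy illustration (contrived data, not from the paper) of why a unimodal Gaussian
# policy fits multi-modal behavior data poorly.
import numpy as np

rng = np.random.default_rng(0)
# Offline demonstrations for a single state: half steer left (-1), half steer right (+1).
actions = np.concatenate([rng.normal(-1.0, 0.05, 500), rng.normal(+1.0, 0.05, 500)])

mu, sigma = actions.mean(), actions.std()       # maximum-likelihood Gaussian fit
print(f"Gaussian policy: mean ~ {mu:.3f}, std ~ {sigma:.3f}")
# The fitted mean sits near 0, between the two modes, so the most likely action under
# the Gaussian policy is one that never appears in the data; a diffusion policy can
# instead represent both modes directly.
```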

Additionally, model-based RL methods suffer from compounding errors, in which small one-step prediction inaccuracies accumulate over long rollouts, and conventional multi-task RL approaches struggle to generalize across varied task settings. These are precisely the places where the distribution-modeling strengths of diffusion models can offer effective solutions.
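The compounding-error problem can likewise be seen in a deliberately tiny sketch (the linear dynamics and the misestimated coefficient below are assumptions made purely for illustration): when a learned one-step model consumes its own predictions autoregressively, even a small per-step bias grows with the rollout horizon, which is part of the motivation for diffusion planners that denoise whole trajectory segments jointly.

```python
# Toy illustration of compounding model error in autoregressive rollouts
# (the dynamics s' = a * s and the 1% coefficient error are illustrative assumptions).
true_a, model_a = 0.99, 0.98
s_true, s_model = 1.0, 1.0
for t in range(1, 51):
    s_true *= true_a
    s_model *= model_a                       # the model consumes its own previous prediction
    if t in (1, 10, 50):
        print(f"t={t:2d}  relative error = {abs(s_model - s_true) / abs(s_true):.3f}")
# The relative error equals 1 - (model_a / true_a) ** t, so it grows with horizon;
# a trajectory-level generator is not forced to feed its own one-step errors back in.
```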

Roles and Frameworks for Diffusion Models

The paper categorizes the roles of diffusion models into three primary functions:

  1. Planner: Diffusion models generate multistep plans by modeling full trajectory segments, which sidesteps the temporal-consistency issues that arise when planning step by step with learned dynamics models. Notably, diffusion-based planners can use classifier-guided or classifier-free sampling to steer trajectory generation, which is especially useful when only restrictive offline datasets are available (see the sampling sketch after this list).
  2. Policy: Here, diffusion models replace traditional policy parameterizations, directly modeling more expressive action distributions in environments with complex dynamics. These diffusion policies are integrated with Q-learning-based frameworks, where the model's expressiveness can particularly benefit offline RL techniques like weighted regression.
  3. Data Synthesizer: Diffusion models enhance dataset diversity by generating high-quality synthetic data to augment training sets. This alleviates data scarcity by producing samples consistent with the environment dynamics and expanding coverage beyond what limited offline datasets contain.
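For the planner role above, the following is a hedged sketch of how sampling might look; the function names, the linear noise schedule, the return-conditioning interface, and the DDIM-style update are illustrative assumptions, not the implementation of any particular method in the survey.

```python
import torch

def ddim_sample_plan(denoiser, horizon, dim, n_steps=50, guidance_w=1.5, target_return=1.0):
    """Denoise a [horizon, dim] trajectory with classifier-free guidance (DDIM, eta = 0)."""
    # Linear beta schedule; a real implementation would reuse the schedule used at training time.
    betas = torch.linspace(1e-4, 2e-2, n_steps)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    x = torch.randn(horizon, dim)                                 # start from pure noise
    for t in reversed(range(n_steps)):
        eps_cond = denoiser(x, t, cond=target_return)             # return-conditioned noise prediction
        eps_uncond = denoiser(x, t, cond=None)                    # unconditioned noise prediction
        eps = eps_uncond + guidance_w * (eps_cond - eps_uncond)   # classifier-free guidance

        x0_pred = (x - (1 - alpha_bars[t]).sqrt() * eps) / alpha_bars[t].sqrt()
        ab_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
        x = ab_prev.sqrt() * x0_pred + (1 - ab_prev).sqrt() * eps  # deterministic DDIM update
    return x                                                      # denoised (state, action) plan
```

In Diffuser-style planners, typically only the first action of the denoised plan is executed before replanning, giving a receding-horizon control loop.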

Applications and Strong Numerical Results

The survey highlights that diffusion models achieve significant performance improvements across several applications, including standard, multi-task, and multi-agent offline RL. Incorporating them into imitation learning provides further validation, markedly improving the ability to imitate multi-modal behaviors in complex real-world datasets. Diffusion models also extend to trajectory-generation tasks beyond RL, yielding strong results in human pose and robotic motion synthesis.
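As a rough sketch of how a diffusion model can be trained to imitate such behaviors from an offline dataset (the network interface, tensor shapes, and beta schedule below are illustrative assumptions rather than code from any surveyed paper), the standard noise-prediction objective conditions the denoiser on the state and regresses the noise injected into the action:

```python
import torch

def diffusion_bc_loss(policy_net, states, actions, n_steps=100):
    """Epsilon-prediction loss for a state-conditioned diffusion policy (behavior cloning)."""
    betas = torch.linspace(1e-4, 2e-2, n_steps)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    t = torch.randint(0, n_steps, (actions.shape[0],))               # random diffusion step per sample
    noise = torch.randn_like(actions)
    ab = alpha_bars[t].unsqueeze(-1)
    noisy_actions = ab.sqrt() * actions + (1 - ab).sqrt() * noise    # forward (noising) process

    pred_noise = policy_net(noisy_actions, t, states)                # predict the injected noise
    return torch.nn.functional.mse_loss(pred_noise, noise)
```

Because the model learns the full conditional action distribution rather than a single Gaussian mode, it can reproduce several distinct behaviors for the same state, which is what the multi-modal imitation results discussed in the survey rely on.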

Speculative Futures and Implications

The survey concludes with potential directions for future research: exploring generative simulation to create diverse, contextually rich interaction environments, integrating safety constraints into diffusion-guided decision frameworks, and leveraging retrieval-augmented diffusion models to improve generation quality on long-tailed data distributions.

In sum, this survey advances the understanding of diffusion models in RL, emphasizing their versatility and effectiveness in addressing core challenges. As research progresses, this integration may further evolve to exploit the full breadth of diffusion models in complex decision-making domains.