Generating Behaviorally Diverse Policies with Latent Diffusion Models (2305.18738v2)

Published 30 May 2023 in cs.LG, cs.AI, and cs.RO

Abstract: Recent progress in Quality Diversity Reinforcement Learning (QD-RL) has enabled learning a collection of behaviorally diverse, high-performing policies. However, these methods typically involve storing thousands of policies, which results in high space complexity and poor scaling to additional behaviors. Condensing the archive into a single model while retaining the performance and coverage of the original collection of policies has proved challenging. In this work, we propose using diffusion models to distill the archive into a single generative model over policy parameters. We show that our method achieves a compression ratio of 13x while recovering 98% of the original rewards and 89% of the original coverage. Further, the conditioning mechanism of diffusion models allows for flexibly selecting and sequencing behaviors, including using language. Project website: https://sites.google.com/view/policydiffusion/home

