Generating Behaviorally Diverse Policies with Latent Diffusion Models (2305.18738v2)
Abstract: Recent progress in Quality Diversity Reinforcement Learning (QD-RL) has enabled learning a collection of behaviorally diverse, high-performing policies. However, these methods typically involve storing thousands of policies, which results in high space complexity and poor scaling to additional behaviors. Condensing the archive into a single model while retaining the performance and coverage of the original collection of policies has proved challenging. In this work, we propose using diffusion models to distill the archive into a single generative model over policy parameters. We show that our method achieves a compression ratio of 13x while recovering 98% of the original rewards and 89% of the original coverage. Further, the conditioning mechanism of diffusion models allows for flexibly selecting and sequencing behaviors, including using language. Project website: https://sites.google.com/view/policydiffusion/home
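To make the core idea concrete, below is a minimal sketch of training a conditional denoising diffusion model directly over flattened policy-parameter vectors, with behavior descriptors as conditioning. This is not the authors' implementation: the paper trains a *latent* diffusion model (an autoencoder over parameters plus diffusion in the latent space) and also supports language conditioning, both of which are omitted here. All names (`ParamDenoiser`, `train_step`), dimensions, and hyperparameters are illustrative assumptions.

```python
# Sketch only: conditional DDPM-style training over policy parameters.
import torch
import torch.nn as nn

class ParamDenoiser(nn.Module):
    """Predicts the noise added to a flattened policy-parameter vector,
    conditioned on a diffusion timestep and a behavior descriptor."""
    def __init__(self, param_dim, descriptor_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(param_dim + descriptor_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, param_dim),
        )

    def forward(self, noisy_params, t, descriptor):
        # Concatenate noisy parameters, (unnormalized) timestep, and descriptor.
        x = torch.cat([noisy_params, t[:, None].float(), descriptor], dim=-1)
        return self.net(x)

def train_step(model, optimizer, params, descriptors, alphas_cumprod):
    """One DDPM training step: noise the parameters at a random timestep
    and regress the added noise (Ho et al., 2020)."""
    t = torch.randint(0, len(alphas_cumprod), (params.shape[0],))
    noise = torch.randn_like(params)
    a_bar = alphas_cumprod[t][:, None]
    noisy = a_bar.sqrt() * params + (1 - a_bar).sqrt() * noise
    loss = nn.functional.mse_loss(model(noisy, t, descriptors), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: an "archive" of 1024 policies with 2-D behavior descriptors.
param_dim, descriptor_dim, T = 4096, 2, 1000
archive_params = torch.randn(1024, param_dim)        # flattened policy weights
archive_descriptors = torch.rand(1024, descriptor_dim)
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

model = ParamDenoiser(param_dim, descriptor_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
print(train_step(model, optimizer, archive_params, archive_descriptors, alphas_cumprod))
```

At sampling time, one would run the reverse diffusion process conditioned on a desired behavior descriptor and reshape the generated vector back into the policy network's weights; the archive is thereby replaced by a single generative model.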