Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning (2401.08632v2)
Abstract: A hallmark of intelligence is the ability to exhibit a wide range of effective behaviors. Inspired by this principle, Quality-Diversity algorithms, such as MAP-Elites, are evolutionary methods designed to generate a set of diverse and high-fitness solutions. However, as a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces, limiting its scalability to more complex domains, such as learning to control agents directly from high-dimensional inputs. To address this limitation, methods like PGA-MAP-Elites and DCG-MAP-Elites combine actor-critic techniques from Reinforcement Learning with MAP-Elites, significantly improving the performance and efficiency of Quality-Diversity algorithms on complex, high-dimensional tasks. While these methods successfully leverage the trained critic to guide more effective mutations, the trained actor remains underutilized for improving both the quality and the diversity of the evolved population. In this work, we introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that uses the descriptor-conditioned actor as a generative model to produce diverse solutions, which are injected into the offspring batch at each generation. We also present an empirical analysis of the fitness and descriptor reproducibility of the solutions discovered by each algorithm. Finally, a second empirical analysis sheds light on the synergies between the different variation operators and explains the performance improvement from PGA-MAP-Elites to DCRL-MAP-Elites.
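To make the algorithmic setting concrete, the sketch below shows a minimal MAP-Elites loop on a toy problem: an archive of cells indexed by a discretized behavior descriptor, where each offspring (produced here by the basic random-mutation operator that the abstract describes as inefficient in high dimensions) competes only against the current elite of its own cell. The problem, descriptor, and all constants are illustrative assumptions, not the paper's setup; DCRL-MAP-Elites would additionally inject offspring produced by a descriptor-conditioned actor into each batch, which is omitted here.

```python
import random

# Toy problem (illustrative): genotype is a list of floats; fitness is the
# negative squared norm (maximized at the origin); the descriptor is the
# first two genes, which determine the archive cell the solution occupies.
GRID = 10    # cells per descriptor dimension
GENES = 5    # genotype length
SIGMA = 0.1  # Gaussian mutation scale

def fitness(x):
    return -sum(v * v for v in x)

def descriptor(x):
    return (x[0], x[1])  # assumed 2-D descriptor, roughly in [-1, 1]^2

def cell(desc):
    # Map a continuous descriptor to a discrete archive cell (clamped).
    return tuple(min(GRID - 1, max(0, int((d + 1) / 2 * GRID))) for d in desc)

def map_elites(generations=200, batch=32, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, genotype)

    def try_insert(x):
        # Keep the solution only if its cell is empty or it beats the elite.
        c, f = cell(descriptor(x)), fitness(x)
        if c not in archive or f > archive[c][0]:
            archive[c] = (f, x)

    # Random initialization of the archive.
    for _ in range(batch):
        try_insert([rng.uniform(-1, 1) for _ in range(GENES)])

    # Main loop: select elites uniformly, mutate, and attempt re-insertion.
    for _ in range(generations):
        for _ in range(batch):
            parent = rng.choice(list(archive.values()))[1]
            child = [g + rng.gauss(0, SIGMA) for g in parent]
            try_insert(child)
    return archive
```

Because insertion is per-cell, the archive simultaneously improves fitness within each cell and fills new cells, yielding the diverse, high-fitness collection that Quality-Diversity methods target.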
- Hindsight Experience Replay. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/hash/453fadbd8a1a3af50a9df4df899537b5-Abstract.html
- Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery. https://openreview.net/forum?id=6BHlZgyPOZY
- QDax: A Library for Quality-Diversity and Population-based Algorithms with Hardware Acceleration. arXiv:2308.03665 [cs.AI]
- Assessing Quality-Diversity Neuro-Evolution Algorithms Performance in Hard Exploration Problems. https://doi.org/10.48550/arXiv.2211.13742 arXiv:2211.13742 [cs].
- Quality-Diversity Optimization: A Novel Branch of Stochastic Optimization. In Black Box Optimization, Machine Learning, and No-Free Lunch Theorems, Panos M. Pardalos, Varvara Rasskazova, and Michael N. Vrahatis (Eds.). Springer International Publishing, Cham, 109–135. https://doi.org/10.1007/978-3-030-66515-9_4
- Reset-free Trial-and-Error Learning for Robot Damage Recovery. Robotics and Autonomous Systems 100 (Feb. 2018), 236–250. https://doi.org/10.1016/j.robot.2017.11.010
- Scaling MAP-Elites to deep neuroevolution. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference (GECCO ’20). Association for Computing Machinery, New York, NY, USA, 67–75. https://doi.org/10.1145/3377930.3390217
- Robots that can adapt like animals. Nature 521, 7553 (May 2015), 503–507. https://doi.org/10.1038/nature14422
- Antoine Cully and Yiannis Demiris. 2018. Quality and Diversity Optimization: A Unifying Modular Framework. IEEE Transactions on Evolutionary Computation 22, 2 (2018), 245–259. https://doi.org/10.1109/TEVC.2017.2704781
- First return, then explore. Nature 590, 7847 (Feb. 2021), 580–586. https://doi.org/10.1038/s41586-020-03157-9
- Diversity is All You Need: Learning Skills without a Reward Function. https://doi.org/10.48550/arXiv.1802.06070 arXiv:1802.06070 [cs].
- MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy. In Proceedings of the Genetic and Evolutionary Computation Conference (Lisbon, Portugal) (GECCO ’23). Association for Computing Machinery, New York, NY, USA, 138–146. https://doi.org/10.1145/3583131.3590503
- Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains. ACM Transactions on Evolutionary Learning and Optimization 3, 1 (March 2023), 1:1–1:32. https://doi.org/10.1145/3577203
- Manon Flageat and Antoine Cully. 2023. Uncertain Quality-Diversity: Evaluation methodology and new methods for Quality-Diversity in Uncertain Domains. IEEE Transactions on Evolutionary Computation (2023).
- Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement Learning. https://doi.org/10.48550/arXiv.2211.02193 arXiv:2211.02193 [cs].
- Matthew Fontaine and Stefanos Nikolaidis. 2021. Differentiable Quality Diversity. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 10040–10052. https://proceedings.neurips.cc/paper/2021/hash/532923f11ac97d3e7cb0130315b067dc-Abstract.html
- Matthew Fontaine and Stefanos Nikolaidis. 2023. Covariance Matrix Adaptation MAP-Annealing. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’23). Association for Computing Machinery, New York, NY, USA, 456–465. https://doi.org/10.1145/3583131.3590389
- Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation. http://github.com/google/brax
- Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning. PMLR, 1587–1596. https://proceedings.mlr.press/v80/fujimoto18a.html
- Variational Intrinsic Control. https://doi.org/10.48550/arXiv.1611.07507 arXiv:1611.07507 [cs].
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning. PMLR, 1861–1870. https://proceedings.mlr.press/v80/haarnoja18b.html
- Nikolaus Hansen. 2023. The CMA Evolution Strategy: A Tutorial. https://doi.org/10.48550/arXiv.1604.00772 arXiv:1604.00772 [cs, stat].
- Emergence of Locomotion Behaviours in Rich Environments. arXiv:1707.02286 [cs] (July 2017).
- Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359–366. https://doi.org/10.1016/0893-6080(89)90020-8
- Population Based Training of Neural Networks. https://doi.org/10.48550/arXiv.1711.09846 arXiv:1711.09846 [cs].
- Behavioural Repertoire via Generative Adversarial Policy Networks. In 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob). https://doi.org/10.1109/ICDL-EpiRob44920.2019 arXiv:1811.02945 [cs, stat].
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 8198–8210. https://proceedings.neurips.cc/paper/2020/hash/5d151d1059a6281335a10732fc49620e-Abstract.html
- Continuous control with deep reinforcement learning. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1509.02971
- Human-level control through deep reinforcement learning. Nature 518, 7540 (Feb. 2015), 529–533. https://doi.org/10.1038/nature14236
- Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating search spaces by mapping elites. CoRR abs/1504.04909 (2015). arXiv:1504.04909 http://arxiv.org/abs/1504.04909
- Olle Nilsson and Antoine Cully. 2021. Policy gradient assisted MAP-Elites. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’21). Association for Computing Machinery, New York, NY, USA, 866–875. https://doi.org/10.1145/3449639.3459304
- Solving Rubik’s Cube with a Robot Hand. https://doi.org/10.48550/arXiv.1910.07113 arXiv:1910.07113 [cs, stat].
- Thomas Pierrot and Arthur Flajolet. 2023. Evolving Populations of Diverse RL Agents with MAP-Elites. https://doi.org/10.48550/arXiv.2303.12803 arXiv:2303.12803 [cs].
- Diversity policy gradient for sample efficient quality-diversity optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’22). Association for Computing Machinery, New York, NY, USA, 1075–1083. https://doi.org/10.1145/3512290.3528845
- Quality Diversity: A New Frontier for Evolutionary Computation. Frontiers in Robotics and AI 3 (2016). https://www.frontiersin.org/articles/10.3389/frobt.2016.00040
- Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017).
- Universal Value Function Approximators. In Proceedings of the 32nd International Conference on Machine Learning. PMLR, 1312–1320. https://proceedings.mlr.press/v37/schaul15.html
- Dynamics-Aware Unsupervised Discovery of Skills. https://openreview.net/forum?id=HJgLZR4KvH
- Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (Jan. 2016), 484–489. https://doi.org/10.1038/nature16961
- Deterministic Policy Gradient Algorithms. In Proceedings of the 31st International Conference on Machine Learning. PMLR, 387–395. https://proceedings.mlr.press/v32/silver14.html
- Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing. https://doi.org/10.48550/arXiv.2210.02622 arXiv:2210.02622 [cs].
- Approximating gradients for differentiable quality diversity in reinforcement learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’22). Association for Computing Machinery, New York, NY, USA, 1102–1111. https://doi.org/10.1145/3512290.3528705
- Using Centroidal Voronoi Tessellations to Scale Up the Multidimensional Archive of Phenotypic Elites Algorithm. IEEE Transactions on Evolutionary Computation 22, 4 (2018), 623–630. https://doi.org/10.1109/TEVC.2017.2735550
- Vassilis Vassiliades and Jean-Baptiste Mouret. 2018. Discovering the elite hypervolume by leveraging interspecies correlation. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’18). Association for Computing Machinery, New York, NY, USA, 149–156. https://doi.org/10.1145/3205455.3205602
- Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 (Nov. 2019), 350–354. https://doi.org/10.1038/s41586-019-1724-z
Authors: Maxence Faldor, Manon Flageat, Antoine Cully, Félix Chalumeau