
Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning (2401.08632v2)

Published 10 Dec 2023 in cs.NE, cs.AI, cs.LG, and cs.RO

Abstract: A hallmark of intelligence is the ability to exhibit a wide range of effective behaviors. Inspired by this principle, Quality-Diversity algorithms, such as MAP-Elites, are evolutionary methods designed to generate a set of diverse and high-fitness solutions. However, as a genetic algorithm, MAP-Elites relies on random mutations, which can become inefficient in high-dimensional search spaces, thus limiting its scalability to more complex domains, such as learning to control agents directly from high-dimensional inputs. To address this limitation, advanced methods like PGA-MAP-Elites and DCG-MAP-Elites have been developed, which combine actor-critic techniques from Reinforcement Learning with MAP-Elites, significantly enhancing the performance and efficiency of Quality-Diversity algorithms in complex, high-dimensional tasks. While these methods have successfully leveraged the trained critic to guide more effective mutations, the potential of the trained actor remains underutilized in improving both the quality and diversity of the evolved population. In this work, we introduce DCRL-MAP-Elites, an extension of DCG-MAP-Elites that utilizes the descriptor-conditioned actor as a generative model to produce diverse solutions, which are then injected into the offspring batch at each generation. Additionally, we present an empirical analysis of the fitness and descriptor reproducibility of the solutions discovered by each algorithm. Finally, we present a second empirical analysis shedding light on the synergies between the different variation operators and explaining the performance improvement from PGA-MAP-Elites to DCRL-MAP-Elites.
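The core mechanism the abstract describes, a MAP-Elites loop in which actor-generated solutions are injected into the offspring batch alongside mutated elites, can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the fitness and descriptor functions, the 1-D descriptor grid, and the `actor` stand-in (which here simply emits a solution matching a requested descriptor, in place of a trained descriptor-conditioned policy) are all illustrative assumptions.

```python
import random

GRID = 10  # number of archive cells along the 1-D descriptor axis


def evaluate(solution):
    """Toy evaluation: fitness is the negative squared norm, the descriptor is the gene mean."""
    fitness = -sum(x * x for x in solution)
    descriptor = sum(solution) / len(solution)  # lies in [-1, 1] for bounded genes
    return fitness, descriptor


def cell_index(descriptor):
    """Map a descriptor in [-1, 1] to one of GRID archive cells."""
    return min(GRID - 1, int((descriptor + 1) / 2 * GRID))


def mutate(solution, sigma=0.1):
    """Standard genetic variation: bounded Gaussian mutation."""
    return [min(1.0, max(-1.0, x + random.gauss(0, sigma))) for x in solution]


def actor(target_descriptor, dim):
    """Stand-in for the descriptor-conditioned actor: emits a solution whose
    descriptor matches the requested target (constant genes in this toy setup)."""
    return [target_descriptor] * dim


def map_elites(iterations=200, batch=8, dim=4, seed=0):
    random.seed(seed)
    archive = {}  # cell index -> (fitness, solution)

    def try_insert(solution):
        fitness, descriptor = evaluate(solution)
        cell = cell_index(descriptor)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, solution)

    # Initialize the archive with random solutions.
    for _ in range(batch):
        try_insert([random.uniform(-1, 1) for _ in range(dim)])

    for _ in range(iterations):
        # Standard variation: mutate parents sampled from the archive.
        parents = [random.choice(list(archive.values()))[1] for _ in range(batch)]
        offspring = [mutate(p) for p in parents]
        # DCRL-style injection: add actor-generated solutions conditioned on
        # descriptors sampled uniformly over the descriptor space.
        offspring += [actor(random.uniform(-1, 1), dim) for _ in range(batch // 2)]
        for child in offspring:
            try_insert(child)
    return archive


archive = map_elites()
print(f"coverage: {len(archive)}/{GRID} cells")
```

Even in this toy setting the injection step illustrates the intended effect: because the actor can be asked for solutions at arbitrary descriptors, it fills archive cells that random mutation alone reaches slowly, improving coverage per generation.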

References (45)
  1. Hindsight Experience Replay. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/hash/453fadbd8a1a3af50a9df4df899537b5-Abstract.html
  2. Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery. https://openreview.net/forum?id=6BHlZgyPOZY
  3. QDax: A Library for Quality-Diversity and Population-based Algorithms with Hardware Acceleration. arXiv:2308.03665 [cs.AI]
  4. Assessing Quality-Diversity Neuro-Evolution Algorithms Performance in Hard Exploration Problems. https://doi.org/10.48550/arXiv.2211.13742 arXiv:2211.13742 [cs].
  5. Quality-Diversity Optimization: A Novel Branch of Stochastic Optimization. In Black Box Optimization, Machine Learning, and No-Free Lunch Theorems, Panos M. Pardalos, Varvara Rasskazova, and Michael N. Vrahatis (Eds.). Springer International Publishing, Cham, 109–135. https://doi.org/10.1007/978-3-030-66515-9_4
  6. Reset-free Trial-and-Error Learning for Robot Damage Recovery. Robotics and Autonomous Systems 100 (Feb. 2018), 236–250. https://doi.org/10.1016/j.robot.2017.11.010
  7. Scaling MAP-Elites to deep neuroevolution. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference (GECCO ’20). Association for Computing Machinery, New York, NY, USA, 67–75. https://doi.org/10.1145/3377930.3390217
  8. Robots that can adapt like animals. Nature 521, 7553 (May 2015), 503–507. https://doi.org/10.1038/nature14422 Number: 7553 Publisher: Nature Publishing Group.
  9. Antoine Cully and Yiannis Demiris. 2018. Quality and Diversity Optimization: A Unifying Modular Framework. IEEE Transactions on Evolutionary Computation 22, 2 (2018), 245–259. https://doi.org/10.1109/TEVC.2017.2704781
  10. First return, then explore. Nature 590, 7847 (Feb. 2021), 580–586. https://doi.org/10.1038/s41586-020-03157-9 Number: 7847 Publisher: Nature Publishing Group.
  11. Diversity is All You Need: Learning Skills without a Reward Function. https://doi.org/10.48550/arXiv.1802.06070 arXiv:1802.06070 [cs].
  12. MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy. In Proceedings of the Genetic and Evolutionary Computation Conference (Lisbon, Portugal) (GECCO ’23). Association for Computing Machinery, New York, NY, USA, 138–146. https://doi.org/10.1145/3583131.3590503
  13. Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains. ACM Transactions on Evolutionary Learning and Optimization 3, 1 (March 2023), 1:1–1:32. https://doi.org/10.1145/3577203
  14. Manon Flageat and Antoine Cully. 2023. Uncertain Quality-Diversity: Evaluation methodology and new methods for Quality-Diversity in Uncertain Domains. IEEE Transactions on Evolutionary Computation (2023).
  15. Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement Learning. https://doi.org/10.48550/arXiv.2211.02193 arXiv:2211.02193 [cs].
  16. Matthew Fontaine and Stefanos Nikolaidis. 2021. Differentiable Quality Diversity. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 10040–10052. https://proceedings.neurips.cc/paper/2021/hash/532923f11ac97d3e7cb0130315b067dc-Abstract.html
  17. Matthew Fontaine and Stefanos Nikolaidis. 2023. Covariance Matrix Adaptation MAP-Annealing. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’23). Association for Computing Machinery, New York, NY, USA, 456–465. https://doi.org/10.1145/3583131.3590389
  18. Brax - A Differentiable Physics Engine for Large Scale Rigid Body Simulation. http://github.com/google/brax
  19. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning. PMLR, 1587–1596. https://proceedings.mlr.press/v80/fujimoto18a.html ISSN: 2640-3498.
  20. Variational Intrinsic Control. https://doi.org/10.48550/arXiv.1611.07507 arXiv:1611.07507 [cs].
  21. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning. PMLR, 1861–1870. https://proceedings.mlr.press/v80/haarnoja18b.html ISSN: 2640-3498.
  22. Nikolaus Hansen. 2023. The CMA Evolution Strategy: A Tutorial. https://doi.org/10.48550/arXiv.1604.00772 arXiv:1604.00772 [cs, stat].
  23. Emergence of Locomotion Behaviours in Rich Environments. (July 2017).
  24. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359–366. https://doi.org/10.1016/0893-6080(89)90020-8
  25. Population Based Training of Neural Networks. https://doi.org/10.48550/arXiv.1711.09846 arXiv:1711.09846 [cs].
  26. Behavioural Repertoire via Generative Adversarial Policy Networks. In 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob). https://doi.org/10.1109/ICDL-EpiRob44920.2019 arXiv:1811.02945 [cs, stat].
  27. One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 8198–8210. https://proceedings.neurips.cc/paper/2020/hash/5d151d1059a6281335a10732fc49620e-Abstract.html
  28. Continuous control with deep reinforcement learning. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1509.02971
  29. Human-level control through deep reinforcement learning. Nature 518, 7540 (Feb. 2015), 529–533. https://doi.org/10.1038/nature14236 Number: 7540 Publisher: Nature Publishing Group.
  30. Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating search spaces by mapping elites. CoRR abs/1504.04909 (2015). arXiv:1504.04909 http://arxiv.org/abs/1504.04909
  31. Olle Nilsson and Antoine Cully. 2021. Policy gradient assisted MAP-Elites. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’21). Association for Computing Machinery, New York, NY, USA, 866–875. https://doi.org/10.1145/3449639.3459304
  32. Solving Rubik’s Cube with a Robot Hand. https://doi.org/10.48550/arXiv.1910.07113 arXiv:1910.07113 [cs, stat].
  33. Thomas Pierrot and Arthur Flajolet. 2023. Evolving Populations of Diverse RL Agents with MAP-Elites. https://doi.org/10.48550/arXiv.2303.12803 arXiv:2303.12803 [cs].
  34. Diversity policy gradient for sample efficient quality-diversity optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’22). Association for Computing Machinery, New York, NY, USA, 1075–1083. https://doi.org/10.1145/3512290.3528845
  35. Quality Diversity: A New Frontier for Evolutionary Computation. Frontiers in Robotics and AI 3 (2016). https://www.frontiersin.org/articles/10.3389/frobt.2016.00040
  36. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017).
  37. Universal Value Function Approximators. In Proceedings of the 32nd International Conference on Machine Learning. PMLR, 1312–1320. https://proceedings.mlr.press/v37/schaul15.html ISSN: 1938-7228.
  38. Dynamics-Aware Unsupervised Discovery of Skills. https://openreview.net/forum?id=HJgLZR4KvH
  39. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (Jan. 2016), 484–489. https://doi.org/10.1038/nature16961 Number: 7587 Publisher: Nature Publishing Group.
  40. Deterministic Policy Gradient Algorithms. In Proceedings of the 31st International Conference on Machine Learning. PMLR, 387–395. https://proceedings.mlr.press/v32/silver14.html ISSN: 1938-7228.
  41. Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing. https://doi.org/10.48550/arXiv.2210.02622 arXiv:2210.02622 [cs].
  42. Approximating gradients for differentiable quality diversity in reinforcement learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’22). Association for Computing Machinery, New York, NY, USA, 1102–1111. https://doi.org/10.1145/3512290.3528705
  43. Using Centroidal Voronoi Tessellations to Scale Up the Multidimensional Archive of Phenotypic Elites Algorithm. IEEE Transactions on Evolutionary Computation 22, 4 (2018), 623–630. https://doi.org/10.1109/TEVC.2017.2735550
  44. Vassilis Vassiliades and Jean-Baptiste Mouret. 2018. Discovering the elite hypervolume by leveraging interspecies correlation. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’18). Association for Computing Machinery, New York, NY, USA, 149–156. https://doi.org/10.1145/3205455.3205602
  45. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 (Nov. 2019), 350–354. https://doi.org/10.1038/s41586-019-1724-z Number: 7782 Publisher: Nature Publishing Group.
Authors (4)
  1. Maxence Faldor (11 papers)
  2. Manon Flageat (17 papers)
  3. Antoine Cully (68 papers)
  4. Félix Chalumeau (4 papers)
Citations (2)