
Evolving Populations of Diverse RL Agents with MAP-Elites (2303.12803v2)

Published 9 Mar 2023 in cs.NE and cs.AI

Abstract: Quality Diversity (QD) has emerged as a powerful alternative optimization paradigm that aims at generating large and diverse collections of solutions, notably through its flagship algorithm MAP-Elites (ME), which evolves solutions via mutations and crossovers. While very effective on some unstructured problems, early ME implementations relied exclusively on random search to evolve the population of solutions, making them notoriously sample-inefficient in high-dimensional settings such as neural network evolution. Follow-up works addressed these shortcomings by exploiting gradient information to guide the search, using techniques borrowed from either Black-Box Optimization (BBO) or Reinforcement Learning (RL). While mixing RL techniques with ME unlocked state-of-the-art performance on robotics control problems that require substantial exploration, it also saddled these ME variants with limitations common among RL algorithms that ME itself was free of: hyperparameter sensitivity, high stochasticity, and training instability, the last of which worsens as the population grows because recent approaches share some components across the population. Furthermore, existing approaches mixing ME with RL tend to be tied to a specific RL algorithm, which effectively prevents their use on problems where that algorithm fails. To address these shortcomings, we introduce a flexible framework that allows the use of any RL algorithm and alleviates the aforementioned limitations by evolving populations of agents (whose definition includes hyperparameters and all learnable parameters) instead of just policies. We demonstrate the benefits of our framework through extensive numerical experiments on a number of robotics control problems, some with deceptive rewards, taken from the QD-RL literature.
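
To make the paradigm concrete, here is a minimal sketch of a MAP-Elites loop in which the evolved unit is a whole agent (policy parameters plus an RL hyperparameter such as a learning rate), echoing the framework the abstract describes. Everything below is an illustrative assumption rather than the paper's implementation: the toy fitness and behavior-descriptor functions, the one-dimensional archive, and all names are hypothetical stand-ins, and plain NumPy is used only to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CELLS = 50    # 1-D archive discretizing a behavior descriptor in [0, 1]
PARAM_DIM = 8   # toy policy parameter vector

def evaluate(agent):
    """Return (fitness, behavior descriptor) for a hypothetical toy task."""
    params, lr = agent
    fitness = -np.sum(params ** 2) - abs(np.log10(lr) + 2)  # peak near lr = 1e-2
    descriptor = (np.tanh(params[0]) + 1) / 2               # squashed into (0, 1)
    return fitness, descriptor

def mutate(agent):
    """Gaussian mutation of parameters; log-space mutation of the hyperparameter."""
    params, lr = agent
    new_params = params + 0.1 * rng.standard_normal(PARAM_DIM)
    new_lr = float(np.clip(lr * 10 ** (0.2 * rng.standard_normal()), 1e-5, 1.0))
    return new_params, new_lr

# The archive keeps, per descriptor cell, the best (elite) agent seen so far.
archive, fitnesses = {}, {}

def try_insert(agent):
    fit, desc = evaluate(agent)
    cell = min(int(desc * N_CELLS), N_CELLS - 1)
    if cell not in archive or fit > fitnesses[cell]:
        archive[cell], fitnesses[cell] = agent, fit

# Seed with random agents, then iterate: pick a random elite, perturb it,
# and try to insert the offspring back into the archive.
for _ in range(100):
    try_insert((rng.standard_normal(PARAM_DIM), float(10 ** rng.uniform(-5, 0))))
for _ in range(2000):
    parent = archive[rng.choice(list(archive))]
    try_insert(mutate(parent))

print(f"filled {len(archive)}/{N_CELLS} cells, best fitness {max(fitnesses.values()):.3f}")
```

In the framework the abstract describes, the random mutation step would be complemented by updates from whichever RL algorithm is plugged in, and because each agent carries its own hyperparameters, those are selected for alongside the weights rather than tuned globally.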

Authors (2)
  1. Thomas Pierrot (21 papers)
  2. Arthur Flajolet (10 papers)
Citations (6)
