
Evolutionary Strategy Guided Reinforcement Learning via MultiBuffer Communication (2306.11535v1)

Published 20 Jun 2023 in cs.NE and cs.AI

Abstract: Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved control problems across a variety of domains. Recently, algorithms have been proposed that combine the two methods, aiming to leverage the strengths and mitigate the weaknesses of each. In this paper we introduce a new Evolutionary Reinforcement Learning model that combines Evolutionary Strategies, a particular family of Evolutionary Algorithms, with the off-policy Deep Reinforcement Learning algorithm TD3. The framework uses a multi-buffer system instead of a single shared replay buffer. The multi-buffer system allows the Evolutionary Strategy to search freely in the space of policies without the risk of overpopulating the replay buffer with poorly performing trajectories. Such trajectories would limit the number of examples of desirable policy behaviour and so reduce the potential of the Deep Reinforcement Learning component within the shared framework. The proposed algorithm is demonstrated to perform competitively with current Evolutionary Reinforcement Learning algorithms on MuJoCo control tasks, outperforming the well-known state-of-the-art CEM-RL on 3 of the 4 environments tested.
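The abstract does not specify how the buffers are organised, so the following is only a minimal sketch of the general idea of keeping separate replay buffers per data source. The class name `MultiBuffer`, the source labels `"es"` and `"rl"`, the per-buffer capacities, and the fixed sampling fractions are all hypothetical choices made for illustration and are not taken from the paper.

```python
import random
from collections import deque


class MultiBuffer:
    """Several independent FIFO replay buffers keyed by data source.

    Hypothetical illustration only; not the paper's implementation.
    """

    def __init__(self, capacities, seed=0):
        # One bounded deque per source: a full deque evicts only its own
        # oldest item, so one source cannot push out another's data.
        self.buffers = {name: deque(maxlen=cap) for name, cap in capacities.items()}
        self.rng = random.Random(seed)

    def add(self, source, transition):
        """Store a transition in the buffer belonging to `source`."""
        self.buffers[source].append(transition)

    def sample(self, batch_size, fractions):
        """Draw a batch with an assumed fixed share per source (with replacement)."""
        batch = []
        for name, frac in fractions.items():
            buf = self.buffers[name]
            k = int(round(batch_size * frac))
            if buf:  # skip sources that have no data yet
                batch.extend(self.rng.choices(list(buf), k=k))
        return batch


buffers = MultiBuffer({"es": 4, "rl": 4})
for i in range(10):
    buffers.add("es", ("es", i))  # many exploratory ES transitions
for i in range(3):
    buffers.add("rl", ("rl", i))  # fewer transitions from the RL agent

batch = buffers.sample(8, {"es": 0.25, "rl": 0.75})
```

In this sketch the ten evolutionary transitions overflow only the `"es"` buffer, while the three `"rl"` transitions all remain available for sampling. With a single shared buffer of the same total size, the same flood could evict them.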
