Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning (2404.08233v2)

Published 12 Apr 2024 in cs.LG, cs.AI, and cs.NE

Abstract: Hyperparameter optimization plays a key role in machine learning. Its significance is especially pronounced in reinforcement learning (RL), where agents continuously interact with and adapt to their environments, requiring dynamic adjustments to their learning trajectories. To accommodate this dynamic setting, Population-Based Training (PBT) was introduced, leveraging the collective intelligence of a population of agents learning simultaneously. However, PBT tends to favor high-performing agents, potentially neglecting the explorative potential of agents on the brink of significant advances. To mitigate these limitations, we present Generalized Population-Based Training (GPBT), a refined framework designed for finer granularity and greater flexibility in hyperparameter adaptation. Complementing GPBT, we further introduce Pairwise Learning (PL). Rather than focusing solely on elite agents, PL employs a pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents. By integrating GPBT and PL, our approach significantly improves upon traditional PBT in both adaptability and computational efficiency. Rigorous empirical evaluations across a range of RL benchmarks confirm that our approach consistently outperforms not only conventional PBT but also its Bayesian-optimized variant.
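
As a rough illustration of the pairwise idea sketched in the abstract, the snippet below shows one PBT-style update step in which agents are paired at random and, within each pair, the lower-scoring agent inherits the better agent's weights and a perturbed copy of its hyperparameters. This is a minimal sketch under assumptions of our own (the Agent container, the perturbation rule, and the random pairing scheme are hypothetical), not the authors' GPBT/PL implementation.

```python
# Minimal, hypothetical sketch of a pairwise exploit/explore step in the
# spirit of PBT with Pairwise Learning; not the paper's actual algorithm.
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    hyperparams: dict                              # e.g. {"lr": 3e-4, "clip": 0.2}
    score: float = float("-inf")                   # latest evaluation return
    weights: dict = field(default_factory=dict)    # policy parameters


def perturb(hps: dict, factor: float = 1.2) -> dict:
    """Explore: randomly scale each float-valued hyperparameter up or down."""
    return {k: v * random.choice([factor, 1.0 / factor]) if isinstance(v, float) else v
            for k, v in hps.items()}


def pairwise_update(population: list[Agent]) -> None:
    """Pair agents at random; in each pair, the lower-scoring agent copies the
    better agent's weights (exploit) and perturbs its hyperparameters (explore),
    instead of all agents cloning a small elite as in standard PBT."""
    shuffled = random.sample(population, len(population))
    for a, b in zip(shuffled[::2], shuffled[1::2]):
        winner, loser = (a, b) if a.score >= b.score else (b, a)
        loser.weights = copy.deepcopy(winner.weights)
        loser.hyperparams = perturb(winner.hyperparams)
```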
