Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design (2310.02782v1)

Published 4 Oct 2023 in cs.LG and cs.AI

Abstract: The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a generalization gap when these algorithms are applied to unseen environments. In this work, we examine how characteristics of the meta-training distribution impact the generalization performance of these algorithms. Motivated by this analysis and building on ideas from Unsupervised Environment Design (UED), we propose a novel approach for automatically generating curricula to maximize the regret of a meta-learned optimizer, in addition to a novel approximation of regret, which we name algorithmic regret (AR). The result is our method, General RL Optimizers Obtained Via Environment Design (GROOVE). In a series of experiments, we show that GROOVE achieves superior generalization to LPG, and evaluate AR against baseline metrics from UED, identifying it as a critical component of environment design in this setting. We believe this approach is a step towards the discovery of truly general RL algorithms, capable of solving a wide range of real-world environments.
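
To make the curriculum idea from the abstract concrete, here is a minimal, hypothetical sketch of regret-based level selection. Everything in it is an assumption for illustration: `eval_with_baseline` and `eval_with_meta_optimizer` are stand-ins for training an agent from scratch on a level with a tuned baseline RL algorithm versus with the meta-learned optimizer (LPG), and the top-k selection is a simplification, not the paper's actual implementation.

```python
"""Illustrative sketch of GROOVE-style curriculum selection via
algorithmic regret (AR). Not the authors' code: the two evaluators
below are random stand-ins for real from-scratch training runs."""
import numpy as np

rng = np.random.default_rng(0)

def eval_with_baseline(level: int) -> float:
    # Stand-in: final return of an agent trained on `level`
    # with a tuned baseline algorithm (hypothetical).
    return 1.0 - 0.1 * rng.random()

def eval_with_meta_optimizer(level: int) -> float:
    # Stand-in: final return of an agent trained on `level`
    # with the meta-learned optimizer (hypothetical).
    return rng.random()

def algorithmic_regret(level: int) -> float:
    """AR on a level: the return gap between a baseline-trained agent
    and a meta-optimizer-trained agent, given the same budget."""
    return eval_with_baseline(level) - eval_with_meta_optimizer(level)

def select_curriculum(candidate_levels: list[int], k: int = 8) -> list[int]:
    """Prioritize the k levels with the largest AR, i.e. where the
    learned update rule currently falls furthest behind the baseline."""
    scores = np.array([algorithmic_regret(lvl) for lvl in candidate_levels])
    return [candidate_levels[i] for i in np.argsort(scores)[-k:]]

if __name__ == "__main__":
    print(select_curriculum(list(range(100)), k=8))
```

The intent of the score is that levels with high AR are exactly those where the meta-learned optimizer generalizes worst, so meta-training on them should close the generalization gap described above; the paper embeds this kind of scoring in a UED-style environment-design loop rather than the simple top-k pass shown here.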
