Generalization through Diversity: Improving Unsupervised Environment Design (2301.08025v2)
Abstract: Agent decision making using Reinforcement Learning (RL) relies heavily on a model or simulator of the environment (e.g., moving in an 8x8 maze with three rooms, playing Chess on an 8x8 board). Because of this dependence, small changes in the environment (e.g., the positions of obstacles in the maze, the size of the board) can severely degrade the effectiveness of the policy learned by the agent. To address this, prior work has proposed training RL agents on an adaptive, automatically generated curriculum of environments to improve performance on out-of-distribution (OOD) test scenarios. Specifically, existing research has used the agent's potential to learn in an environment (captured via Generalized Advantage Estimation, GAE) as the key criterion for selecting the next environment(s) to train on. However, such a mechanism can select similar environments (each with a high potential to learn), making agent training redundant on all but one of them. We therefore provide a principled approach to adaptively identify diverse environments based on a novel distance measure tailored to environment design. We empirically demonstrate the versatility and effectiveness of our method against multiple leading approaches for unsupervised environment design on three distinct benchmark problems from the literature.
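The selection idea described in the abstract can be illustrated with a minimal sketch: greedily pick a batch of training environments, trading off each candidate's learning potential (e.g., its mean absolute GAE) against its distance to environments already selected, so that near-duplicate environments are not chosen together. All names and the Euclidean distance here are illustrative assumptions, not the paper's actual distance measure or algorithm.

```python
import numpy as np

def select_diverse_envs(scores, features, k, alpha=0.5):
    """Greedily choose k environment indices.

    scores   -- learning-potential score per environment
                (e.g., mean |GAE|); hypothetical proxy.
    features -- per-environment feature vectors; Euclidean distance
                between them stands in for the paper's distance measure.
    alpha    -- trade-off between potential (alpha=0) and diversity (alpha=1).
    """
    n = len(scores)
    chosen = [int(np.argmax(scores))]  # seed with the highest-potential env
    while len(chosen) < k:
        best_i, best_val = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            # diversity bonus: distance to the nearest already-chosen env
            div = min(np.linalg.norm(features[i] - features[j]) for j in chosen)
            val = (1 - alpha) * scores[i] + alpha * div
            if val > best_val:
                best_i, best_val = i, val
        chosen.append(best_i)
    return chosen

# Two near-identical high-potential envs plus one distinct env:
# the diversity term makes the distinct env win the second slot.
scores = np.array([1.0, 0.95, 0.2])
features = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
picked = select_diverse_envs(scores, features, k=2)
```

With a purely GAE-based criterion, environments 0 and 1 would both be picked despite being identical; the diversity bonus instead selects environments 0 and 2.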