Black box meta-learning intrinsic rewards for sparse-reward environments (2407.21546v2)
Abstract: Despite the progress of deep reinforcement learning over the last decade, several challenges still hinder its broader application. Fundamental aspects to improve include data efficiency, generalization capability, and the ability to learn in sparse-reward environments, which often require human-designed dense rewards. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Separately, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. The focus is on meta-learning intrinsic rewards under a framework that does not rely on meta-gradients. We analyze and compare this approach to the use of extrinsic rewards and to a meta-learned advantage function. The developed algorithms are evaluated on distributions of continuous control tasks with both parametric and non-parametric variations, with only sparse rewards accessible for the evaluation tasks.