MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning (2403.09859v1)
Abstract: Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration. Existing meta-RL algorithms are characterized by low sample efficiency, and mostly focus on low-dimensional task distributions. In parallel, model-based RL methods have been successful in solving partially observable MDPs, of which meta-RL is a special case. In this work, we leverage this success and propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods. We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to $15\times$) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.