Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations (2206.04779v3)
Abstract: Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, and the key challenges of this complex domain are poorly understood. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations. The suite is designed to better represent the data distributions present in real-world offline RL problems, and is guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and to visually identifiable changes in dynamics. Using this suite, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and to establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and empirically analyze the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.
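The abstract names the base algorithms but not the modifications themselves. As a rough illustration, the sketch below shows two common ways online vision-based agents are adapted to the offline setting: a TD3+BC-style behavior-cloning term in the actor loss of a model-free, DrQ-v2-like agent (Fujimoto & Gu, 2021), and a MOPO-style ensemble-disagreement reward penalty for a model-based, DreamerV2-like world model (Yu et al., 2020b). This is a minimal sketch under those assumptions, not the paper's implementation; the module names (`encoder`, `actor`, `critic`) and batch format are hypothetical.

```python
# Illustrative sketches only, not the paper's code: two common ways to
# adapt online vision-based RL agents to the offline setting.
import torch
import torch.nn.functional as F

def actor_loss_with_bc(encoder, actor, critic, batch, alpha=2.5):
    """Model-free route: TD3+BC-style actor loss for a DrQ-v2-like agent.

    Loss = -lambda * Q(s, pi(s)) + MSE(pi(s), a_data), where lambda
    adaptively balances the RL term against behavior cloning.
    """
    obs, data_actions = batch["obs"], batch["action"]  # offline transitions
    with torch.no_grad():
        z = encoder(obs)                    # image observation -> latent
    pi = actor(z)                           # deterministic policy action
    q = critic(z, pi)                       # critic's value estimate
    lam = alpha / q.abs().mean().detach()   # adaptive trade-off weight
    return -(lam * q).mean() + F.mse_loss(pi, data_actions)

def penalized_reward(reward, ensemble_next_latents, penalty_scale=1.0):
    """Model-based route: MOPO-style pessimism for a world-model agent.

    Subtracts a penalty proportional to the disagreement (std across an
    ensemble of latent dynamics predictions) from the predicted reward.
    """
    # ensemble_next_latents: shape [K, B, D] from K ensemble members
    disagreement = ensemble_next_latents.std(dim=0).mean(dim=-1)  # [B]
    return reward - penalty_scale * disagreement
```

Both changes are deliberately small: each touches a single loss or reward term while leaving the underlying architecture intact, consistent with the abstract's claim that simple modifications suffice.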
- An optimistic perspective on offline reinforcement learning. In International Conference on Machine Learning, 2020.
- Model-based offline planning. In International Conference on Learning Representations, 2021.
- A framework for behavioural cloning. In Machine Intelligence 15, pp. 103–129, 1995.
- Augmented world models facilitate zero-shot dynamics generalization from a single offline environment. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 619–629. PMLR, 2021.
- Behavioural cloning: Phenomena, results and problems. IFAC Proceedings Volumes, 28(21):143–149, September 1995. ISSN 1474-6670. doi: 10.1016/s1474-6670(17)46716-4.
- Understanding disentangling in β-VAE. CoRR, abs/1804.03599, 2018.
- Scaling data-driven robotics with reward sketching and batch reinforcement learning. In Proceedings of Robotics: Science and Systems, Corvallis, Oregon, USA, July 2020. doi: 10.15607/RSS.2020.XVI.076.
- Actionable models: Unsupervised offline reinforcement learning of robotic skills. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 1518–1528. PMLR, 18–24 Jul 2021.
- Decision transformer: Reinforcement learning via sequence modeling. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems, volume 34, pp. 15084–15097. Curran Associates, Inc., 2021.
- Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014.
- Tree-based batch mode reinforcement learning. Journal of Machine Learning Research (JMLR), 6(18):503–556, 2005.
- Implicit behavioral cloning. In 5th Annual Conference on Robot Learning, 2021.
- D4RL: Datasets for deep data-driven reinforcement learning, 2021.
- A minimalist approach to offline reinforcement learning. In Thirty-Fifth Conference on Neural Information Processing Systems, 2021.
- Addressing function approximation error in actor-critic methods. In Jennifer Dy and Andreas Krause (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 1587–1596. PMLR, 2018.
- Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning, pp. 2052–2062, 2019.
- RL Unplugged: Benchmarks for offline reinforcement learning, 2020.
- Soft actor-critic algorithms and applications. CoRR, abs/1812.05905, 2018.
- Learning latent dynamics for planning from pixels. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, pp. 2555–2565, 2019.
- Dream to control: Learning behaviors by latent imagination. In International Conference on Learning Representations, 2020a.
- Mastering Atari with discrete world models. arXiv preprint arXiv:2010.02193, 2020b.
- Array programming with NumPy. Nature, 585(7825):357–362, September 2020. doi: 10.1038/s41586-020-2649-2.
- Learning robust dynamics through variational sparse gating. In Deep RL Workshop NeurIPS 2021, 2021.
- Scalable deep reinforcement learning for vision-based robotic manipulation. In Aude Billard, Anca Dragan, Jan Peters, and Jun Morimoto (eds.), Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pp. 651–673. PMLR, 29–31 Oct 2018.
- Learning to drive in a day, 2018.
- MOReL: Model-based offline reinforcement learning. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 21810–21823. Curran Associates, Inc., 2020.
- Robust and efficient transfer learning with hidden parameter Markov decision processes. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
- Offline reinforcement learning with implicit Q-learning, 2021.
- Stabilizing off-policy Q-learning via bootstrapping error reduction. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
- Conservative Q-learning for offline reinforcement learning. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 1179–1191. Curran Associates, Inc., 2020.
- Should I run offline reinforcement learning or behavioral cloning? In Deep RL Workshop NeurIPS 2021, 2021.
- In defense of the unitary scalarization for deep multi-task learning, 2022.
- Simple and scalable predictive uncertainty estimation using deep ensembles. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pp. 6405–6416, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964.
- Reinforcement learning with augmented data. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 19884–19895. Curran Associates, Inc., 2020.
- State representation learning for control: An overview. Neural Networks, 108:379–392, December 2018. ISSN 0893-6080. doi: 10.1016/j.neunet.2018.07.006.
- Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020.
- Continuous control with deep reinforcement learning. In Yoshua Bengio and Yann LeCun (eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.
- Revisiting design choices in offline model-based reinforcement learning. In International Conference on Learning Representations, 2022.
- 1 year, 1000 km: The Oxford RobotCar dataset. The International Journal of Robotics Research, 36(1):3–15, November 2016. ISSN 0278-3649, 1741-3176. doi: 10.1177/0278364916679498.
- What matters in learning from offline human demonstrations for robot manipulation. In 5th Annual Conference on Robot Learning, 2021.
- Disentangling disentanglement in variational autoencoders. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 4402–4412. PMLR, 09–15 Jun 2019.
- Temporal predictive coding for model-based planning in latent space. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 8130–8139. PMLR, 18–24 Jul 2021.
- Estimating the mean and variance of the target probability distribution. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), volume 1, pp. 55–60. IEEE, 1994. doi: 10.1109/icnn.1994.374138.
- Dreaming: Model-based reinforcement learning by latent imagination without reconstruction. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 4209–4215, 2021. doi: 10.1109/ICRA48506.2021.9560734.
- To tune or not to tune? Adapting pretrained representations to diverse tasks. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pp. 7–14, Florence, Italy, August 2019. Association for Computational Linguistics. doi: 10.18653/v1/w19-4302.
- Offline reinforcement learning from images with latent space models. In Proceedings of the 3rd Conference on Learning for Dynamics and Control, volume 144 of Proceedings of Machine Learning Research, pp. 1154–1168. PMLR, 2021.
- Decoupling value and policy for generalization in reinforcement learning. In Marina Meila and Tong Zhang (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 8787–8798. PMLR, 2021.
- Efficient off-policy meta-reinforcement learning via probabilistic context variables. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97, pp. 5331–5340. PMLR, 2019.
- Improving robustness against common corruptions by covariate shift adaptation. arXiv preprint arXiv:2006.16971, 2020.
- Planning to explore via self-supervised world models. In International Conference on Machine Learning, 2020.
- ViNG: Learning Open-World Navigation with Visual Goals. In IEEE International Conference on Robotics and Automation (ICRA), 2021. URL https://arxiv.org/abs/2012.09812.
- MTEnv - environment interface for multi-task reinforcement learning. GitHub, 2021.
- Observational overfitting in reinforcement learning. In International Conference on Learning Representations, 2020.
- The distracting control suite – a challenging benchmark for reinforcement learning from pixels. arXiv preprint arXiv:2101.02722, 2021.
- Reinforcement Learning. Springer US, second edition, 1992. ISBN 9781461366089, 9781461536185. doi: 10.1007/978-1-4615-3618-5.
- dm_control: Software and tasks for continuous control, 2020.
- Behavior regularized offline reinforcement learning, 2019.
- Pre-training on grayscale ImageNet improves medical image classification. In Laura Leal-Taixé and Stefan Roth (eds.), Computer Vision – ECCV 2018 Workshops, pp. 476–484, Cham, 2019. Springer International Publishing. ISBN 978-3-030-11024-6.
- Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv preprint arXiv:2107.09645, 2021a.
- Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. In International Conference on Learning Representations, 2021b.
- BDD100K: A diverse driving dataset for heterogeneous multitask learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, June 2020a. doi: 10.1109/cvpr42600.2020.00271.
- MOPO: Model-based offline policy optimization. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 14129–14142. Curran Associates, Inc., 2020b.
- COMBO: Conservative offline model-based policy optimization, 2021.
- Learning robust state abstractions for hidden-parameter block MDPs. In International Conference on Learning Representations, 2021.
- VariBAD: Variational Bayes-adaptive deep RL via meta-learning. Journal of Machine Learning Research (JMLR), 22(289):1–39, 2021.