Imitation Bootstrapped Reinforcement Learning (2311.02198v6)
Abstract: Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency. However, it is costly to collect comprehensive expert demonstrations that enable IL to generalize to all possible scenarios, and any distribution shift would require recollecting data for finetuning. RL is therefore appealing if it can build upon IL as an efficient, autonomous self-improvement procedure. We propose imitation bootstrapped reinforcement learning (IBRL), a novel framework for sample-efficient RL with demonstrations that first trains an IL policy on the provided demonstrations and then uses it to propose alternative actions for both online exploration and bootstrapping target values. Compared to prior works that oversample the demonstrations or regularize RL with an additional imitation loss, IBRL is able to utilize high-quality actions from the IL policy from the very beginning of training, which greatly accelerates exploration and improves training efficiency. We evaluate IBRL on 6 simulation and 3 real-world tasks spanning various difficulty levels. IBRL significantly outperforms prior methods, and the improvement is particularly prominent on the harder tasks.
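The two mechanisms named in the abstract (using the IL policy to propose actions for online exploration and for bootstrapping target values) can be made concrete with a short sketch. The snippet below is a minimal, illustrative reconstruction assuming a TD3-style actor-critic backbone; the network sizes, `obs_dim`, `act_dim`, and helper names (`act`, `td_target`, `q_value`) are placeholders introduced here, not the paper's exact implementation.

```python
# Minimal sketch of IBRL's two core ideas, assuming a TD3-style actor-critic.
# All architectures and dimensions below are illustrative placeholders.
import torch
import torch.nn as nn

obs_dim, act_dim, gamma = 8, 2, 0.99

# Placeholder policies / critics (the real method uses visual encoders + MLP heads).
il_policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
rl_policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
q_target = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def q_value(q, obs, action):
    return q(torch.cat([obs, action], dim=-1))

def act(obs):
    """Exploration: IL and RL policies each propose an action; execute whichever the critic prefers."""
    with torch.no_grad():
        a_il, a_rl = il_policy(obs), rl_policy(obs)
        return a_il if q_value(q_net, obs, a_il) >= q_value(q_net, obs, a_rl) else a_rl

def td_target(reward, next_obs, done):
    """Bootstrapping: the TD target also takes the max over the IL and RL proposals."""
    with torch.no_grad():
        a_il, a_rl = il_policy(next_obs), rl_policy(next_obs)
        q_next = torch.maximum(q_value(q_target, next_obs, a_il),
                               q_value(q_target, next_obs, a_rl))
        return reward + gamma * (1.0 - done) * q_next

# Tiny usage example with random tensors.
obs = torch.randn(1, obs_dim)
action = act(obs)
y = td_target(torch.tensor([[1.0]]), torch.randn(1, obs_dim), torch.tensor([[0.0]]))
print(action.shape, y.shape)
```

Because the IL policy is trained before RL begins, both the exploration rule and the target computation can exploit its high-quality action proposals from the first environment step, which is what distinguishes this scheme from demonstration oversampling or imitation-loss regularization.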
Authors: Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh