Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials (2210.05178v3)
Abstract: Progress in deep learning highlights the tremendous potential of utilizing diverse datasets for attaining effective generalization, and makes it enticing to consider leveraging broad datasets for attaining robust generalization in robotic learning as well. However, in practice, we often want to learn a new skill in a new environment that is unlikely to be contained in the prior data. Therefore we ask: how can we leverage existing diverse offline datasets in combination with small amounts of task-specific data to solve new tasks, while still enjoying the generalization benefits of training on large amounts of data? In this paper, we demonstrate that end-to-end offline RL can be an effective approach for doing this, without the need for any representation learning or vision-based pre-training. We present pre-training for robots (PTR), a framework based on offline RL that attempts to effectively learn new tasks by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task, with as few as 10 demonstrations. PTR utilizes an existing offline RL method, conservative Q-learning (CQL), but extends it to include several crucial design decisions that enable PTR to actually work and outperform a variety of prior methods. To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot with as few as 10 task demonstrations, by effectively leveraging an existing dataset of diverse multi-task robot data collected in a variety of toy kitchens. We also demonstrate that PTR can enable effective autonomous fine-tuning and improvement in a handful of trials, without needing any demonstrations. An accompanying overview video can be found in the supplementary material and at this URL: https://sites.google.com/view/ptr-final/
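As a concrete pointer to the objective behind the method, below is a minimal sketch of a CQL-style critic loss of the kind PTR builds on (Kumar et al., 2020). This is an illustrative sketch only: the state/action dimensions, the MLP critic, the uniform action sampling used for the conservative term, and all hyperparameter values are assumptions made for readability, not the paper's implementation (which operates on camera images and adds the design decisions described in the paper).

```python
# Minimal sketch of a CQL-style critic loss, the offline RL objective PTR
# builds on. The MLP critic and uniform action sampling are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Q(s, a): a small MLP over the concatenated state and action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def cql_critic_loss(q_net, q_target, policy, batch,
                    alpha: float = 5.0, gamma: float = 0.99, n_samples: int = 10):
    s, a, r, s2, done = batch
    # Standard TD backup toward a frozen target network.
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done) * q_target(s2, policy(s2))
    q_data = q_net(s, a)
    bellman = F.mse_loss(q_data, td_target)
    # Conservative regularizer: log-sum-exp of Q over sampled actions minus
    # Q on dataset actions. Sampling uniformly over [-1, 1]^d is a
    # simplification; full CQL also mixes in policy samples.
    bsz, act_dim = a.shape
    rand_a = torch.rand(bsz, n_samples, act_dim) * 2.0 - 1.0
    s_rep = s.unsqueeze(1).expand(-1, n_samples, -1).reshape(bsz * n_samples, -1)
    q_rand = q_net(s_rep, rand_a.reshape(bsz * n_samples, act_dim)).view(bsz, n_samples)
    conservative = (torch.logsumexp(q_rand, dim=1) - q_data).mean()
    return bellman + alpha * conservative

# Toy usage with random tensors standing in for an offline batch.
state_dim, action_dim, bsz = 17, 7, 32
q_net = QNetwork(state_dim, action_dim)
q_target = QNetwork(state_dim, action_dim)
q_target.load_state_dict(q_net.state_dict())
policy = lambda s: torch.tanh(torch.randn(s.shape[0], action_dim))  # stand-in actor
batch = (torch.randn(bsz, state_dim), torch.rand(bsz, action_dim) * 2 - 1,
         torch.randn(bsz), torch.randn(bsz, state_dim), torch.zeros(bsz))
cql_critic_loss(q_net, q_target, policy, batch).backward()
```

In the PTR recipe, fine-tuning on the handful of target-task demonstrations reuses the same conservative objective; the regularizer that pushes down Q-values on out-of-distribution actions is what guards against overestimation on the small new dataset.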
- Do as I can, not as I say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022.
- Hindsight experience replay. Advances in neural information processing systems, 30, 2017.
- CrossNorm: Normalization for off-policy TD reinforcement learning. arXiv preprint arXiv:1902.05605, 2019.
- Towards deeper deep reinforcement learning. arXiv preprint arXiv:2106.01151, 2021.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
- Actionable models: Unsupervised offline reinforcement learning of robotic skills. arXiv preprint arXiv:2104.07749, 2021.
- Randomized ensembled double Q-learning: Learning fast without a model. arXiv preprint arXiv:2101.05982, 2021.
- Offline meta reinforcement learning. arXiv preprint, 2020.
- Bridge data: Boosting generalization of robotic skills with cross-domain datasets. arXiv preprint arXiv:2109.13396, 2021.
- RvS: What is essential for offline RL via supervised learning? arXiv preprint arXiv:2112.10751, 2021.
- IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In International Conference on Machine Learning, pages 1407–1416. PMLR, 2018.
- Replacing rewards with examples: Example-based policy search via recursive classification. Advances in Neural Information Processing Systems, 34:11541–11552, 2021.
- A minimalist approach to offline reinforcement learning. arXiv preprint arXiv:2106.06860, 2021.
- Off-policy deep reinforcement learning without exploration. arXiv preprint arXiv:1812.02900, 2018.
- Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pages 1861–1870. PMLR, 2018.
- Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377, 2021.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- Multi-task deep reinforcement learning with PopArt. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3796–3803, 2019.
- Way off-policy batch deep reinforcement learning of implicit human preferences in dialog. arXiv preprint arXiv:1907.00456, 2019.
- Never stop learning: The effectiveness of fine-tuning in robotic reinforcement learning. arXiv preprint arXiv:2004.10190, 2020.
- Scalable deep reinforcement learning for vision-based robotic manipulation. In Conference on Robot Learning, pages 651–673, 2018.
- MT-Opt: Continuous multi-task robotic reinforcement learning at scale. arXiv preprint arXiv:2104.08212, 2021.
- Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. arXiv preprint arXiv:2004.13649, 2020.
- Offline reinforcement learning with Fisher divergence critic regularization. In International Conference on Machine Learning, pages 5774–5783. PMLR, 2021.
- Offline reinforcement learning with implicit Q-learning. arXiv preprint arXiv:2110.06169, 2021.
- Stabilizing off-policy Q-learning via bootstrapping error reduction. In Advances in Neural Information Processing Systems, pages 11761–11771, 2019.
- Conservative Q-learning for offline reinforcement learning. arXiv preprint arXiv:2006.04779, 2020.
- Should I run offline reinforcement learning or behavioral cloning? In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=AP1MKT37rJ.
- How to spend your robot time: Bridging kickstarting and offline reinforcement learning for vision-based robotic manipulation. arXiv preprint arXiv:2205.03353, 2022.
- Multi-game decision transformers. arXiv preprint arXiv:2205.15241, 2022.
- Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble. In Conference on Robot Learning, pages 1702–1712. PMLR, 2022.
- End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1):1334–1373, 2016.
- Multi-task batch reinforcement learning with metric learning. arXiv preprint arXiv:1909.11373, 2019.
- Model-based offline meta-reinforcement learning with regularization. arXiv preprint arXiv:2202.02929, 2022.
- VIP: Towards universal visual reward and representation via value-implicit pre-training. arXiv preprint arXiv:2210.00030, 2022.
- IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 4414–4420. IEEE, 2020.
- What matters in learning from offline human demonstrations for robot manipulation. In 5th Annual Conference on Robot Learning, 2021. URL https://openreview.net/forum?id=JrsfBJtDFdI.
- Offline meta-reinforcement learning with advantage weighting. In International Conference on Machine Learning, pages 7780–7791. PMLR, 2021.
- Accelerating online reinforcement learning with offline datasets. arXiv preprint arXiv:2006.09359, 2020.
- R3M: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601, 2022.
- Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning. arXiv preprint arXiv:2303.05479, 2023.
- Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv preprint arXiv:1511.06342, 2015.
- Advantage-weighted regression: Simple and scalable off-policy reinforcement learning. arXiv preprint arXiv:1910.00177, 2019.
- Offline meta-reinforcement learning with online self-supervision. arXiv preprint arXiv:2107.03974, 2021.
- Real-world robot learning with masked visual pre-training. arXiv preprint arXiv:2210.03109, 2022.
- Behavior transformers: Cloning k modes with one stone. arXiv preprint arXiv:2206.11251, 2022.
- Keep doing what worked: Behavioral modelling priors for offline reinforcement learning. arXiv preprint arXiv:2002.08396, 2020.
- End-to-end robotic reinforcement learning without reward engineering. Robotics: Science and Systems, 2019.
- COG: Connecting new skills to past experience with offline reinforcement learning. arXiv preprint arXiv:2010.14500, 2020.
- Distral: Robust multitask reinforcement learning. arXiv preprint arXiv:1707.04175, 2017.
- Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv preprint arXiv:1707.08817, 2017.
- Multi-task reinforcement learning: a hierarchical Bayesian approach. In Proceedings of the 24th international conference on Machine learning, pages 1015–1022, 2007.
- Behavior regularized offline reinforcement learning. arXiv preprint arXiv:1911.11361, 2019.
- Group normalization. In Proceedings of the European conference on computer vision (ECCV), pages 3–19, 2018.
- Masked visual pre-training for motor control. arXiv preprint arXiv:2203.06173, 2022.
- Lifelong robotic reinforcement learning by retaining experiences. arXiv preprint arXiv:2109.09180, 2021.
- Representation matters: Offline pretraining for sequential decision making. arXiv preprint arXiv:2102.05815, 2021.
- TRAIL: Near-optimal imitation learning with suboptimal data. arXiv preprint arXiv:2110.14770, 2021.
- Visual imitation made easy. 2020.
- Conservative data sharing for multi-task offline reinforcement learning. Advances in Neural Information Processing Systems, 34, 2021.
- How to leverage unlabeled data in offline RL. arXiv preprint arXiv:2202.01741, 2022.