On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline (2212.05749v2)
Abstract: In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area.
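The paper's exact architecture and hyperparameters are given in the full text; as a minimal sketch, the "data augmentation" component of such Learning-from-Scratch baselines is typically a DrQ/RAD-style random shift, i.e. pad each frame by a few pixels and take a random crop back to the original size. The padding amount and image size below are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np

def random_shift(imgs, pad=4, rng=None):
    """DrQ/RAD-style random shift augmentation.

    Pads each image by `pad` pixels on every side (edge replication)
    and takes a random crop of the original size, shifting the image
    by up to `pad` pixels in each direction.

    imgs: float array of shape (N, C, H, W).
    """
    rng = rng if rng is not None else np.random.default_rng()
    n, c, h, w = imgs.shape
    padded = np.pad(imgs, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.empty_like(imgs)
    for i in range(n):
        # Independent random offset per image in the batch.
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out

# Example: augment a batch of two 84x84 RGB frames.
batch = np.random.default_rng(0).random((2, 3, 84, 84), dtype=np.float32)
augmented = random_shift(batch, pad=4, rng=np.random.default_rng(1))
```

The augmentation preserves the input shape, so it can be applied to observations on the fly before the ConvNet encoder without any other changes to the training loop.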
- Flamingo: A visual language model for few-shot learning. arXiv preprint arXiv:2204.14198, 2022.
- RT-1: Robotics Transformer for real-world control at scale. arXiv preprint arXiv:2212.06817, 2022.
- Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
- When vision transformers outperform ResNets without pretraining or strong data augmentations. arXiv preprint arXiv:2106.01548, 2021.
- PaLM: Scaling language modeling with Pathways. arXiv preprint arXiv:2204.02311, 2022.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2019.
- Unsupervised visual representation learning by context prediction. In IEEE International Conference on Computer Vision (ICCV), pp. 1422–1430, 2015.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2021.
- Ego4D: Around the world in 3,000 hours of egocentric video. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18995–19012, 2022.
- Generalization in reinforcement learning by soft data augmentation. In International Conference on Robotics and Automation (ICRA), 2021.
- Stabilizing deep Q-learning with ConvNets and vision transformers under data augmentation. In NeurIPS, 2021.
- MoDem: Accelerating visual model-based reinforcement learning with demonstrations. arXiv preprint, 2022.
- Temporal difference learning for model predictive control. In ICML, 2022.
- Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
- Momentum contrast for unsupervised visual representation learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9726–9735, 2020.
- Masked autoencoders are scalable vision learners. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15979–15988, 2022.
- Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. arXiv preprint arXiv:2004.13649, 2021.
- Reinforcement learning with augmented data. arXiv preprint arXiv:2004.14990, 2020.
- A simple randomization technique for generalization in deep reinforcement learning. arXiv preprint arXiv:1910.05396, 2019.
- Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2016.
- A comprehensive survey of data augmentation in visual reinforcement learning. arXiv preprint arXiv:2210.04561, 2022.
- R3M: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601, 2022.
- The (un)surprising effectiveness of pre-trained vision models for control. In ICML, 2022.
- Asymmetric actor critic for image-based robot learning. arXiv preprint arXiv:1710.06542, 2017.
- Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), pp. 8748–8763, 2021.
- Automatic data augmentation for generalization in deep reinforcement learning. arXiv preprint arXiv:2006.12862, 2020.
- Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. In Proceedings of Robotics: Science and Systems (RSS), 2018.
- ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115:211–252, 2015.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Data-efficient reinforcement learning with self-predictive representations. In ICLR, 2021.
- Unsupervised perceptual rewards for imitation learning. arXiv preprint arXiv:1612.06699, 2016.
- RRL: ResNet as representation for reinforcement learning. arXiv preprint arXiv:2107.03380, 2021.
- Reinforcement learning with latent flow. In NeurIPS, 2021.
- CURL: Contrastive unsupervised representations for reinforcement learning. In ICML, 2020.
- Stanford Alpaca: An instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
- DeepMind Control Suite. Technical report, DeepMind, 2018.
- Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
- Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
- VRL3: A data-driven framework for visual deep reinforcement learning. arXiv preprint arXiv:2202.10324, 2022.
- Masked visual pre-training for motor control. arXiv preprint arXiv:2203.06173, 2022.
- On the feasibility of cross-task transfer with model-based reinforcement learning. arXiv preprint arXiv:2210.10763, 2022.
- Improving sample efficiency in model-free reinforcement learning from images. arXiv preprint, 2019.
- Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv preprint arXiv:2107.09645, 2021.
- Pre-trained image encoder for generalizable visual reinforcement learning. arXiv preprint arXiv:2212.08860, 2022.
- Visual reinforcement learning with self-supervised 3D representations. arXiv preprint arXiv:2210.07241, 2022.
Authors: Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang