Pre-training with Synthetic Data Helps Offline Reinforcement Learning (2310.00771v4)
Abstract: Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question is whether this performance gain can only be achieved with language pre-training, or whether it can also be achieved with simpler pre-training schemes that do not involve language. In this paper, we first show that language is not essential for improved performance: pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is Q-learning-based and typically employs a Multi-Layer Perceptron (MLP) backbone. Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL, yielding consistent performance improvements on the D4RL Gym locomotion datasets. The results of this paper not only illustrate the importance of pre-training for offline DRL but also show that the pre-training data can be synthetic and generated with remarkably simple mechanisms.
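The two synthetic data mechanisms named in the abstract (IID tokens and a one-step Markov chain) are simple enough to sketch. Below is a minimal Python sketch of how such pre-training sequences might be generated; the function names, vocabulary size, and the softmax-of-Gaussian transition matrix are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np


def sample_iid_tokens(vocab_size: int, seq_len: int, rng: np.random.Generator) -> np.ndarray:
    """IID scheme: every token is drawn independently and uniformly."""
    return rng.integers(0, vocab_size, size=seq_len)


def sample_markov_tokens(vocab_size: int, seq_len: int, rng: np.random.Generator) -> np.ndarray:
    """One-step Markov chain scheme: each token depends only on its predecessor.

    The transition matrix is itself random (a row-wise softmax of Gaussian
    logits), so some transitions are much more likely than others.
    """
    logits = rng.normal(size=(vocab_size, vocab_size))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    tokens = np.empty(seq_len, dtype=np.int64)
    tokens[0] = rng.integers(0, vocab_size)
    for t in range(1, seq_len):
        tokens[t] = rng.choice(vocab_size, p=probs[tokens[t - 1]])
    return tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(sample_iid_tokens(vocab_size=100, seq_len=20, rng=rng))
    print(sample_markov_tokens(vocab_size=100, seq_len=20, rng=rng))
```

In the IID case every token is independent, so a model can only learn marginal statistics; the one-step Markov chain additionally provides non-trivial next-token dependencies, which may be why the abstract reports it helping more.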
- Improving fractal pre-training. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1300–1309, 2022.
- Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
- Decision transformer: Reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34, 2021.
- BAIL: Best-action imitation learning for batch deep reinforcement learning. In Advances in Neural Information Processing Systems, volume 33, pp. 18353–18363, 2020.
- On the transferability of pre-trained language models: A study from artificial datasets. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 10518–10525, 2022.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- DeCAF: A deep convolutional activation feature for generic visual recognition. In International Conference on Machine Learning, pp. 647–655. PMLR, 2014.
- D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
- Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning, pp. 2052–2062. PMLR, 2019.
- Generalized decision transformer for offline hindsight information matching. arXiv preprint arXiv:2111.10364, 2021.
- On pre-training for visuo-motor control: Revisiting a learning-from-scratch baseline. arXiv preprint arXiv:2212.05749, 2022.
- Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems, 35:1820–1834, 2022.
- Synthetic pre-training tasks for neural machine translation. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (eds.), Findings of the Association for Computational Linguistics: ACL 2023, pp. 8080–8098, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-acl.512. URL https://aclanthology.org/2023.findings-acl.512.
- What makes ImageNet good for transfer learning? arXiv preprint arXiv:1608.08614, 2016.
- When to trust your model: Model-based policy optimization. In Advances in Neural Information Processing Systems, pp. 12519–12530, 2019.
- Offline reinforcement learning as one big sequence modeling problem. Advances in Neural Information Processing Systems, 34, 2021.
- Language-driven representation learning for robotics. arXiv preprint arXiv:2302.12766, 2023.
- Pre-training without natural images. In Proceedings of the Asian Conference on Computer Vision, 2020.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Do better ImageNet models transfer better? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2661–2671, 2019.
- Offline reinforcement learning with implicit Q-learning. arXiv preprint arXiv:2110.06169, 2021.
- Does pretraining for summarization require knowledge transfer? arXiv preprint arXiv:2109.04953, 2021.
- Conservative Q-learning for offline reinforcement learning. arXiv preprint arXiv:2006.04779, 2020.
- Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, 2020.
- A survey on transformers in reinforcement learning. arXiv preprint arXiv:2301.03044, 2023.
- Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://api.semanticscholar.org/CorpusID:53592270.
- Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016.
- R3M: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601, 2022.
- Learning music helps you read: Using transfer to study linguistic structure in language models. arXiv preprint arXiv:2004.14601, 2020.
- The unsurprising effectiveness of pre-trained vision models for control. arXiv preprint arXiv:2203.03580, 2022.
- Improving language understanding by generative pre-training. OpenAI Technical Report, 2018.
- Real-world robot learning with masked visual pre-training. In Conference on Robot Learning, pp. 416–426. PMLR, 2023.
- Can Wikipedia help offline reinforcement learning? arXiv preprint arXiv:2201.12122, 2022.
- Pretraining with artificial language: Studying transferable knowledge in language models. arXiv preprint arXiv:2203.10326, 2022.
- RRL: Resnet as representation for reinforcement learning. In Self-Supervision for Reinforcement Learning Workshop, ICLR 2021, 2021.
- On the effect of pre-training for transformer in different modality on offline reinforcement learning. arXiv preprint arXiv:2211.09817, 2022.
- VRL3: A data-driven framework for visual deep reinforcement learning. Advances in Neural Information Processing Systems, 35:32974–32988, 2022.
- Transformers: State-of-the-art natural language processing. In Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020.
- LIME: Learning inductive bias for primitives of mathematical reasoning. In International Conference on Machine Learning, pp. 11251–11262. PMLR, 2021.
- Insights into pre-training via simpler synthetic tasks. Advances in Neural Information Processing Systems, 35:21844–21857, 2022.
- Future-conditioned unsupervised pretraining for decision transformer. In International Conference on Machine Learning, pp. 38187–38203. PMLR, 2023.
- Representation matters: Offline pretraining for sequential decision making. arXiv preprint arXiv:2102.05815, 2021.
- A framework for efficient robotic manipulation. In Deep RL Workshop NeurIPS 2021, 2021.