
Pre-training with Synthetic Data Helps Offline Reinforcement Learning (2310.00771v4)

Published 1 Oct 2023 in cs.AI and cs.LG

Abstract: Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question to ask is whether this performance gain can only be achieved with language pre-training, or can be achieved with simpler pre-training schemes which do not involve language. In this paper, we first show that language is not essential for improved performance, and indeed pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve the performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is Q-learning-based and typically employs a Multi-Layer Perceptron (MLP) backbone. Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL, providing consistent performance improvement on D4RL Gym locomotion datasets. The results of this paper not only illustrate the importance of pre-training for offline DRL but also show that the pre-training data can be synthetic and generated with remarkably simple mechanisms.
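
To make the abstract's pre-training schemes concrete, below is a minimal sketch of the two synthetic data generators it contrasts: IID tokens and tokens produced by a one-step Markov chain. The vocabulary size, sequence length, and transition-matrix construction here are illustrative assumptions rather than the paper's reported settings.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 100  # assumed token vocabulary size
SEQ_LEN = 64      # assumed pre-training sequence length

def iid_sequence():
    """Synthetic IID data: tokens drawn independently and uniformly."""
    return rng.integers(0, VOCAB_SIZE, size=SEQ_LEN)

def make_transition_matrix():
    """Assumed construction: random logits with a row-wise softmax."""
    logits = rng.normal(size=(VOCAB_SIZE, VOCAB_SIZE))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)

def markov_sequence(transition):
    """Synthetic data from a one-step Markov chain over the vocabulary."""
    seq = np.empty(SEQ_LEN, dtype=np.int64)
    seq[0] = rng.integers(0, VOCAB_SIZE)
    for t in range(1, SEQ_LEN):
        seq[t] = rng.choice(VOCAB_SIZE, p=transition[seq[t - 1]])
    return seq

# Example: small batches of each kind of pre-training sequence.
T = make_transition_matrix()
iid_batch = np.stack([iid_sequence() for _ in range(8)])
markov_batch = np.stack([markov_sequence(T) for _ in range(8)])
```

Sequences like these would feed a simple self-supervised objective (e.g., next-token prediction for the Decision Transformer backbone) for a small number of updates before fine-tuning on D4RL data; the exact pre-training objectives, including the one used for CQL's MLP, are detailed in the paper rather than sketched here.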

References (44)
  1. Improving fractal pre-training. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1300–1309, 2022.
  2. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
  3. Decision transformer: Reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34, 2021.
  4. BAIL: Best-action imitation learning for batch deep reinforcement learning. In Advances in Neural Information Processing Systems, volume 33, pp. 18353–18363, 2020.
  5. On the transferability of pre-trained language models: A study from artificial datasets. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 10518–10525, 2022.
  6. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  7. DeCAF: A deep convolutional activation feature for generic visual recognition. In International Conference on Machine Learning, pp. 647–655. PMLR, 2014.
  8. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
  9. Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning, pp. 2052–2062. PMLR, 2019.
  10. Generalized decision transformer for offline hindsight information matching. arXiv preprint arXiv:2111.10364, 2021.
  11. On pre-training for visuo-motor control: Revisiting a learning-from-scratch baseline. arXiv preprint arXiv:2212.05749, 2022.
  12. Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems, 35:1820–1834, 2022a.
  13. Synthetic pre-training tasks for neural machine translation. arXiv preprint arXiv:2212.09864, 2022b.
  14. Synthetic pre-training tasks for neural machine translation. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (eds.), Findings of the Association for Computational Linguistics: ACL 2023, pp. 8080–8098, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-acl.512. URL https://aclanthology.org/2023.findings-acl.512.
  15. What makes ImageNet good for transfer learning? arXiv preprint arXiv:1608.08614, 2016.
  16. When to trust your model: Model-based policy optimization. In Advances in Neural Information Processing Systems, pp. 12519–12530, 2019.
  17. Offline reinforcement learning as one big sequence modeling problem. Advances in Neural Information Processing Systems, 34, 2021.
  18. Language-driven representation learning for robotics. arXiv preprint arXiv:2302.12766, 2023.
  19. Pre-training without natural images. In Proceedings of the Asian Conference on Computer Vision, 2020.
  20. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  21. Do better ImageNet models transfer better? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2661–2671, 2019.
  22. Offline reinforcement learning with implicit Q-learning. arXiv preprint arXiv:2110.06169, 2021.
  23. Does pretraining for summarization require knowledge transfer? arXiv preprint arXiv:2109.04953, 2021.
  24. Conservative Q-learning for offline reinforcement learning. arXiv preprint arXiv:2006.04779, 2020.
  25. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, 2020.
  26. A survey on transformers in reinforcement learning. arXiv preprint arXiv:2301.03044, 2023.
  27. Decoupled weight decay regularization. In International Conference on Learning Representations, 2017. URL https://api.semanticscholar.org/CorpusID:53592270.
  28. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016.
  29. R3M: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601, 2022.
  30. Learning music helps you read: Using transfer to study linguistic structure in language models. arXiv preprint arXiv:2004.14601, 2020.
  31. The unsurprising effectiveness of pre-trained vision models for control. arXiv preprint arXiv:2203.03580, 2022.
  32. Improving language understanding by generative pre-training. 2018.
  33. Real-world robot learning with masked visual pre-training. In Conference on Robot Learning, pp. 416–426. PMLR, 2023.
  34. Can Wikipedia help offline reinforcement learning? arXiv preprint arXiv:2201.12122, 2022.
  35. Pretraining with artificial language: Studying transferable knowledge in language models. arXiv preprint arXiv:2203.10326, 2022.
  36. RRL: ResNet as representation for reinforcement learning. In Self-Supervision for Reinforcement Learning Workshop, ICLR 2021, 2021.
  37. Shiro Takagi. On the effect of pre-training for transformer in different modality on offline reinforcement learning. arXiv preprint arXiv:2211.09817, 2022.
  38. VRL3: A data-driven framework for visual deep reinforcement learning. Advances in Neural Information Processing Systems, 35:32974–32988, 2022.
  39. Transformers: State-of-the-art natural language processing. In Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020.
  40. LIME: Learning inductive bias for primitives of mathematical reasoning. In International Conference on Machine Learning, pp. 11251–11262. PMLR, 2021.
  41. Insights into pre-training via simpler synthetic tasks. Advances in Neural Information Processing Systems, 35:21844–21857, 2022.
  42. Future-conditioned unsupervised pretraining for decision transformer. In International Conference on Machine Learning, pp. 38187–38203. PMLR, 2023.
  43. Representation matters: Offline pretraining for sequential decision making. arXiv preprint arXiv:2102.05815, 2021.
  44. A framework for efficient robotic manipulation. In Deep RL Workshop NeurIPS 2021, 2021.