Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges (2401.00031v2)
Abstract: Decision-making is a dynamic process that requires perception, memory, and reasoning to make choices and find optimal policies. Traditional approaches to decision-making suffer from poor sample efficiency and limited generalization, whereas large-scale self-supervised pretraining has enabled fast adaptation via fine-tuning or few-shot learning in language and vision. We therefore argue for integrating knowledge acquired through generic, large-scale self-supervised pretraining into downstream decision-making problems. We propose a Pretrain-Then-Adapt pipeline and survey recent work on data collection, pretraining objectives, and adaptation strategies for decision-making pretraining and downstream inference. Finally, we identify critical challenges and future directions for developing decision foundation models with the help of generic and flexible self-supervised pretraining.
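The Pretrain-Then-Adapt pipeline can be illustrated with a deliberately tiny sketch: a linear "encoder" is first pretrained with a self-supervised forward-dynamics objective on reward-free trajectory data, then frozen and adapted to a downstream task by fitting only a small head on a handful of labelled transitions. All names, dimensions, and objectives below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: self-supervised pretraining on reward-free trajectories ---
# Hypothetical setup: 4-d states, and a linear encoder W trained to
# predict the next state from the current one (a toy forward-dynamics
# objective standing in for masked/next-token prediction on sequences).
A_true = rng.normal(size=(4, 4)) * 0.5       # unknown environment dynamics
states = rng.normal(size=(1024, 4))
next_states = states @ A_true.T              # s' = A s (noise-free toy data)

W = np.zeros((4, 4))
for _ in range(500):                         # gradient descent on MSE
    grad = 2 * (states @ W.T - next_states).T @ states / len(states)
    W -= 0.05 * grad
pretrain_err = np.mean((states @ W.T - next_states) ** 2)

# --- Stage 2: few-shot adaptation to a downstream decision task ---
# Freeze the pretrained encoder and fit only a small linear head on
# 16 labelled transitions (a minimal fine-tuning analogue).
few_shot_s = rng.normal(size=(16, 4))
few_shot_r = few_shot_s @ A_true.T @ np.ones(4)  # toy reward signal
feats = few_shot_s @ W.T                          # frozen pretrained features
head, *_ = np.linalg.lstsq(feats, few_shot_r, rcond=None)
adapt_err = np.mean((feats @ head - few_shot_r) ** 2)

print(f"pretrain MSE {pretrain_err:.2e}, downstream MSE {adapt_err:.2e}")
```

Because the downstream head reuses the pretrained representation, it fits the new task from far fewer examples than learning from scratch would need, which is the sample-efficiency argument the abstract makes.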