Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods (2404.00282v3)
Abstract: With extensive pre-trained knowledge and high-level general capabilities, large language models (LLMs) have emerged as a promising avenue for augmenting reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning. In this survey, we provide a comprehensive review of the existing literature on LLM-enhanced RL and summarize its characteristics relative to conventional RL methods, aiming to clarify the research scope and directions for future studies. Building on the classical agent-environment interaction paradigm, we propose a structured taxonomy that systematically categorizes the functionalities of LLMs in RL into four roles: information processor, reward designer, decision-maker, and generator. For each role, we summarize the methodologies, analyze the specific RL challenges they mitigate, and provide insights into future directions. Lastly, we present a comparative analysis of the roles and discuss potential applications, prospective opportunities, and open challenges of LLM-enhanced RL. By proposing this taxonomy, we aim to provide a framework that helps researchers leverage LLMs effectively in RL, potentially accelerating the adoption of RL in complex domains such as robotics, autonomous driving, and energy systems.
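To make the four roles concrete, the sketch below shows one way each role could attach to the classical agent-environment loop that the taxonomy is built around. It is illustrative only and not the survey's implementation: the `llm` call, the `policy` interface, and the gymnasium-style `env` are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the survey's method) of where an LLM can plug
# into the agent-environment loop, one hook per role named in the taxonomy.

def llm(prompt: str) -> str:
    """Placeholder for any large language model call."""
    raise NotImplementedError


class LLMEnhancedAgent:
    def __init__(self, policy, task_description: str):
        self.policy = policy          # hypothetical low-level RL policy
        self.task = task_description  # natural-language task specification

    # Role 1: information processor -- ground raw observations into a compact
    # representation the policy can learn from more sample-efficiently.
    def process_observation(self, raw_obs):
        return llm(f"Summarize the state relevant to '{self.task}': {raw_obs}")

    # Role 2: reward designer -- turn the language task into a scalar reward
    # (e.g., by scoring a transition or emitting reward code).
    def design_reward(self, obs, action, next_obs) -> float:
        score = llm(f"Rate 0-1 how much this transition progresses '{self.task}': "
                    f"{obs} -> {action} -> {next_obs}")
        return float(score)

    # Role 3: decision-maker -- propose high-level subgoals or actions that the
    # low-level RL policy refines or executes.
    def propose_plan(self, state):
        return llm(f"Given state {state}, list subgoals for '{self.task}'")

    # Role 4: generator -- act as a world model or explainer, imagining outcomes
    # of candidate actions in natural language.
    def imagine_outcome(self, state, action):
        return llm(f"Predict the next state if action {action} is taken in {state}")


def run_episode(env, agent: LLMEnhancedAgent, max_steps: int = 100):
    obs, _ = env.reset()
    for _ in range(max_steps):
        state = agent.process_observation(obs)                 # information processor
        plan = agent.propose_plan(state)                       # decision-maker
        action = agent.policy.act(state, plan)                 # low-level RL policy
        next_obs, _, terminated, truncated, _ = env.step(action)
        reward = agent.design_reward(obs, action, next_obs)    # reward designer
        agent.policy.update(state, action, reward)             # RL update (abstracted)
        obs = next_obs
        if terminated or truncated:
            break
```

In practice, the surveyed methods typically instantiate one of these hooks at a time (for example, an LLM-written reward function or an LLM-generated plan) rather than all four in a single loop; the sketch simply maps the taxonomy onto the interaction cycle.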
Authors: Yuji Cao, Huan Zhao, Yuheng Cheng, Ting Shu, Guolong Liu, Gaoqi Liang, Junhua Zhao, Yun Li, Yue Chen, Jinyue Yan