
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods (2404.00282v3)

Published 30 Mar 2024 in cs.LG, cs.AI, cs.CL, and cs.RO

Abstract: With extensive pre-trained knowledge and high-level general capabilities, LLMs emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning. In this survey, we provide a comprehensive review of the existing literature on LLM-enhanced RL and summarize its characteristics compared to conventional RL methods, aiming to clarify the research scope and directions for future studies. Utilizing the classical agent-environment interaction paradigm, we propose a structured taxonomy to systematically categorize LLMs' functionalities in RL, comprising four roles: information processor, reward designer, decision-maker, and generator. For each role, we summarize the methodologies, analyze the specific RL challenges that are mitigated, and provide insights into future directions. Lastly, we discuss a comparative analysis of the roles, potential applications, prospective opportunities, and open challenges of LLM-enhanced RL. By proposing this taxonomy, we aim to provide a framework for researchers to effectively leverage LLMs in the RL field, potentially accelerating the adoption of RL in complex domains such as robotics, autonomous driving, and energy systems.
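
To make the four-role taxonomy concrete, here is a minimal Python sketch of one role, the reward designer, in the standard agent-environment loop: an LLM is prompted with a natural-language task description and returns the source of a dense reward function that the RL algorithm can then use. This is an illustration only, not the survey's method; the names query_llm and llm_design_reward are hypothetical, and query_llm is a stand-in for any chat-completion client.

```python
# Illustrative sketch (not from the survey): an LLM acting as a "reward designer".
# query_llm() is a hypothetical placeholder for a real LLM API call.

from typing import Callable


def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; in practice, replace with a chat-completion client."""
    # For this sketch we return a fixed reward function as Python source.
    return (
        "def reward(observation, action, goal):\n"
        "    # Dense shaping: negative distance to the goal plus a small step penalty.\n"
        "    distance = abs(observation - goal)\n"
        "    return -distance - 0.01\n"
    )


def llm_design_reward(task_description: str) -> Callable:
    """Ask the LLM to write a reward function from a natural-language task spec."""
    prompt = (
        f"Task: {task_description}\n"
        "Write a Python function reward(observation, action, goal) that returns "
        "a dense scalar reward for this task."
    )
    source = query_llm(prompt)
    namespace: dict = {}
    exec(source, namespace)  # Trust boundary: generated code should be validated in practice.
    return namespace["reward"]


if __name__ == "__main__":
    reward_fn = llm_design_reward("Move the agent to the goal position on a 1-D line.")
    # An RL algorithm would now call reward_fn inside its usual training loop.
    print(reward_fn(observation=3.0, action=+1, goal=5.0))  # -> -2.01
```

The other three roles in the taxonomy follow the same pattern of slotting the LLM into a different part of the loop: as information processor (compressing observations or instructions into features), as decision-maker (proposing actions or subgoals directly), and as generator (synthesizing world-model rollouts or explanations).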

Authors (10)
  1. Yuji Cao (8 papers)
  2. Huan Zhao (109 papers)
  3. Yuheng Cheng (10 papers)
  4. Ting Shu (7 papers)
  5. Guolong Liu (7 papers)
  6. Gaoqi Liang (7 papers)
  7. Junhua Zhao (22 papers)
  8. Yun Li (154 papers)
  9. Yue Chen (236 papers)
  10. Jinyue Yan (12 papers)
Citations (20)