Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents (2311.13373v6)
Abstract: Recent studies have shown that large language models (LLMs) can tackle complex sequential decision-making tasks by providing high-level instructions. However, LLM-based agents are not specialized for specific target problems, particularly in real-time dynamic environments. Deploying an LLM-based agent in practical scenarios can also be costly and time-consuming. Reinforcement learning (RL) approaches, on the other hand, train agents that specialize in the target task but often suffer from low sample efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the teacher's guidance, the student agent can distill the prior knowledge of the LLM into its own model and can therefore be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses its teacher on the target task. We evaluate the effectiveness of our framework on challenging MiniGrid and Habitat environments designed for embodied AI research. The results show that our approach outperforms strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
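The teacher-to-student distillation described in the abstract can be illustrated with a small sketch. It rests on assumptions that the abstract does not state: that each LLM instruction has already been mapped to a probability distribution over the student's primitive actions, that this guidance enters the student's objective as a KL-divergence penalty added to an ordinary RL loss (for example a PPO surrogate loss), and that the penalty weight is annealed linearly to zero. The function names `kl_teacher_student`, `distill_weight`, and `student_loss` and the linear schedule are hypothetical, not taken from the authors' code.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_teacher_student(teacher_probs, student_logits, eps=1e-8):
    """Mean KL(pi_teacher || pi_student) over a batch of states.

    Assumption: teacher_probs is a (batch, n_actions) array of action
    probabilities already derived from the LLM teacher's instructions.
    """
    student_probs = softmax(student_logits)
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps)),
        axis=-1,
    )
    return kl.mean()


def distill_weight(step, total_steps, lam0=1.0):
    """Hypothetical linear annealing: weight lam0 at step 0, zero from total_steps on."""
    return lam0 * max(0.0, 1.0 - step / total_steps)


def student_loss(rl_loss, teacher_probs, student_logits, step, total_steps, lam0=1.0):
    """RL loss plus an annealed distillation penalty toward the teacher policy.

    rl_loss stands in for any scalar RL objective (e.g. a PPO loss) computed elsewhere.
    """
    lam = distill_weight(step, total_steps, lam0)
    return rl_loss + lam * kl_teacher_student(teacher_probs, student_logits)


# Usage: a student that already matches the teacher incurs (almost) no penalty,
# and once the weight has decayed the objective reduces to the RL loss alone.
teacher = np.array([[0.7, 0.2, 0.1]])
matched_student = np.log(teacher)
uniform_student = np.zeros((1, 3))
print(kl_teacher_student(teacher, matched_student))
print(student_loss(0.3, teacher, uniform_student, step=100, total_steps=100))
```

Annealing the weight is one way to reconcile the two claims above. Early in training the student imitates the teacher and so needs less data. Later, the RL term alone drives learning, which leaves room for the student to outperform its teacher.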
Authors: Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, Bin Liu