True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning (2401.14151v2)
Abstract: Despite their impressive performance across numerous tasks, LLMs often fail at simple decision-making tasks because their knowledge is misaligned with the environment. In contrast, reinforcement learning (RL) agents learn policies from scratch, which keeps them aligned with their environments but makes it difficult to incorporate prior knowledge for efficient exploration. To narrow this gap, we propose TWOSOME, a novel general online framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL, without requiring any prepared datasets or prior knowledge of the environments. First, we query the LLM for the joint probability of each valid action to form behavior policies. Then, to enhance the stability and robustness of these policies, we propose two normalization methods and summarize four prompt design principles. Finally, we design a novel parameter-efficient training architecture in which the actor and critic share one frozen LLM equipped with low-rank adapters (LoRA) updated by PPO. We conduct extensive experiments to evaluate TWOSOME. i) TWOSOME exhibits significantly better sample efficiency and performance than the conventional RL method PPO and the prompt tuning method SayCan, in both the classical decision-making environment Overcooked and the simulated household environment VirtualHome. ii) Benefiting from the open-vocabulary feature of LLMs, TWOSOME shows superior generalization to unseen tasks. iii) Under our framework, the LLM suffers no significant loss of its original abilities during online PPO finetuning.
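The behavior-policy construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume a hypothetical mapping from each valid action phrase to its per-token log-probabilities under the LLM, and show two length-normalization variants (per-token and per-word averaging of the summed log-probability, chosen here as plausible instances of the paper's two normalization methods) before a softmax over actions.

```python
import math

def behavior_policy(action_token_logprobs, action_word_counts=None,
                    normalize="token"):
    """Turn LLM log-probabilities of valid action phrases into a policy.

    action_token_logprobs: {action: [log-prob of each token in the phrase]}
    action_word_counts:    {action: word count} (needed for "word" mode)
    normalize: "none" | "token" | "word". Length normalization reduces the
    bias against longer action phrases, whose joint probability is a
    product of more per-token factors.
    """
    scores = {}
    for action, logps in action_token_logprobs.items():
        joint = sum(logps)  # log of the joint token probability
        if normalize == "token":
            joint /= len(logps)                  # average log-prob per token
        elif normalize == "word":
            joint /= action_word_counts[action]  # average log-prob per word
        scores[action] = joint
    # Softmax over action scores, stabilized by subtracting the max.
    m = max(scores.values())
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

# Hypothetical per-token log-probs for two valid actions: without
# normalization, the longer phrase is heavily penalized even though its
# average per-token log-prob matches the short one.
logps = {"pick up the tomato": [-1.0, -1.0, -1.0, -1.0],
         "chop": [-1.0]}
policy = behavior_policy(logps, normalize="token")
```

With token normalization both actions score the same average log-probability and receive equal policy mass; with `normalize="none"` the single-token action dominates, which is the length bias the normalization methods address.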
- Imitating interactive intelligence. arXiv preprint arXiv:2012.05672, 2020.
- Do as I can, not as I say: Grounding language in robotic affordances. In Conference on Robot Learning, pp. 287–318, 2023.
- Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 1877–1901, 2020.
- Grounding large language models in interactive environments with online reinforcement learning. arXiv preprint arXiv:2302.02662, 2023.
- BabyAI: A platform to study the sample efficiency of grounded language learning. arXiv preprint arXiv:1810.08272, 2018.
- PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
- Collaborating with language models for embodied reasoning. In Second Workshop on Language and Reinforcement Learning, 2022.
- Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3):220–235, 2023.
- PaLM-E: An embodied multimodal language model. arXiv preprint arXiv:2303.03378, 2023.
- MineDojo: Building open-ended embodied agents with internet-scale knowledge. arXiv preprint arXiv:2206.08853, 2022.
- A theoretical analysis of the repetition problem in text generation. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 12848–12856, 2021.
- A framework for few-shot language model evaluation, September 2021. URL https://doi.org/10.5281/zenodo.5371628.
- VLN-BERT: A recurrent vision-and-language BERT for navigation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1643–1653, 2021.
- Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pp. 2790–2799, 2019.
- LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
- Language instructed reinforcement learning for human-AI coordination. arXiv preprint arXiv:2304.07297, 2023.
- CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms. The Journal of Machine Learning Research, 23(1):12585–12602, 2022a.
- Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning, pp. 9118–9147, 2022b.
- Inner monologue: Embodied reasoning through planning with language models. In Conference on Robot Learning, 2022c.
- VIMA: General robot manipulation with multimodal prompts. arXiv preprint arXiv:2210.03094, 2022.
- LILA: Language-informed latent actions. In Conference on Robot Learning, pp. 1379–1390, 2021.
- Simple but effective: CLIP embeddings for embodied AI. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 14809–14818, 2022.
- Language models can solve computer tasks. arXiv preprint arXiv:2303.17491, 2023.
- Autonomous skill acquisition on a mobile manipulator. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1468–1473, 2011.
- Offline Q-learning on diverse multi-task data both scales and generalizes. In The Eleventh International Conference on Learning Representations, 2023.
- The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3045–3059, 2021.
- Pre-trained language models for interactive decision-making. In Advances in Neural Information Processing Systems, 2022.
- Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 4582–4597, 2021.
- Code as policies: Language model programs for embodied control. arXiv preprint arXiv:2209.07753, 2022.
- Chameleon: Plug-and-play compositional reasoning with large language models. arXiv preprint arXiv:2304.09842, 2023.
- Improving vision-and-language navigation with image-text pairs from the web. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VI 16, pp. 259–274. Springer, 2020.
- OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, 2022.
- The unsurprising effectiveness of pre-trained vision models for control. In International Conference on Machine Learning, 2022.
- VirtualHome: Simulating household activities via programs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8494–8502, 2018.
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
- Leveraging language for accelerated learning of tool manipulation. In Conference on Robot Learning, pp. 1531–1541, 2023.
- Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. arXiv preprint arXiv:2303.17580, 2023.
- Reflexion: An autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366, 2023.
- CLIPort: What and where pathways for robotic manipulation. In Conference on Robot Learning, pp. 894–906, 2022.
- ProgPrompt: Generating situated robot task plans using large language models. arXiv preprint arXiv:2209.11302, 2022.
- Reinforcement Learning: An Introduction. MIT press, 2018.
- Approximate planning in POMDPs with macro-actions. In Advances in Neural Information Processing Systems, 2003.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023a.
- Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560, 2023b.
- Too many cooks: Bayesian inference for coordinating multi-agent collaboration. Topics in Cognitive Science, 13(2):414–432, 2021.
- Language models meet world models: Embodied experiences enhance language models. arXiv preprint arXiv:2305.10626, 2023.
- Asynchronous actor-critic for multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, 2022.
- Learning to break the loop: Analyzing and mitigating repetitions for neural text generation. arXiv preprint arXiv:2206.02369, 2022.
- ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2023.
- Socratic models: Composing zero-shot multimodal reasoning with language. arXiv preprint arXiv:2204.00598, 2022.
- Synapse: Trajectory-as-exemplar prompting with memory for computer control. 2023.
Authors: Weihao Tan, Wentao Zhang, Shanqi Liu, Longtao Zheng, Xinrun Wang, Bo An