AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback (2309.17176v3)
Abstract: Large Language Models (LLMs) have demonstrated significant success across various domains. However, their application in complex decision-making tasks frequently necessitates intricate prompt engineering or fine-tuning, leading to challenges in unseen downstream tasks and heavy demands on computational resources. Meanwhile, Reinforcement Learning (RL) has been recognized as effective in decision-making problems but struggles in environments with sparse rewards, such as open-world games. To overcome these challenges, we introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback. The key component of AdaRefiner is a lightweight Adapter Language Model (LM), which automatically refines task comprehension based on feedback from RL agents. This method mitigates the need for intricate prompt engineering and intensive LLM fine-tuning while maintaining the LLMs' generalization abilities and enhancing their decision-making capabilities in downstream tasks. Empirical evaluations of AdaRefiner on 22 diverse tasks within the open-world game Crafter demonstrate its superior effectiveness, especially in guiding agents towards higher-level and common-sense skills. Our work contributes to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
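The abstract describes a closed loop in which a lightweight Adapter LM refines task comprehension from RL-agent feedback and passes the refined guidance back to the decision-making agent. The sketch below is a minimal, illustrative rendering of that loop in plain Python; the class and method names (`AdapterLM`, `RLAgent`, `refine`, `update`) and the toy reward are hypothetical stand-ins for exposition, not the authors' released interface.

```python
import random

class AdapterLM:
    """Hypothetical lightweight adapter language model.

    Stands in for the paper's Adapter LM, which refines task comprehension
    and is updated with feedback from the RL agent.
    """
    def refine(self, task_description: str, agent_summary: str) -> str:
        # In AdaRefiner, this step would produce refined language guidance;
        # here we simply combine task and progress for illustration.
        return f"{task_description} | agent progress: {agent_summary}"

    def update(self, feedback_score: float) -> None:
        # Placeholder for adapting the LM with RL feedback
        # (e.g., episode return or achievement completion).
        pass

class RLAgent:
    """Toy policy conditioned on language guidance (illustrative only)."""
    def act(self, observation, guidance: str) -> str:
        return random.choice(["move", "collect", "craft"])

    def summary(self) -> str:
        return "collected wood, no tools yet"

def adarefiner_loop(env_steps: int = 10) -> float:
    """Run the refine -> act -> feedback -> update cycle for a few steps."""
    adapter, agent = AdapterLM(), RLAgent()
    task = "Survive and unlock achievements in Crafter"
    total_reward = 0.0
    for _ in range(env_steps):
        guidance = adapter.refine(task, agent.summary())
        action = agent.act(observation=None, guidance=guidance)
        reward = random.random()  # stand-in for the environment's sparse reward
        total_reward += reward
        adapter.update(feedback_score=reward)  # refine comprehension via RL feedback
    return total_reward

if __name__ == "__main__":
    print(f"total reward over toy episode: {adarefiner_loop():.2f}")
```

The point of the sketch is the data flow, not the components: the adapter sits between a fixed task description and the acting agent, and the only training signal it receives is the agent's reward, which is what lets the framework avoid heavy prompt engineering or full LLM fine-tuning.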
Authors: Wanpeng Zhang, Zongqing Lu