AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback (2309.17176v3)

Published 29 Sep 2023 in cs.AI and cs.CL

Abstract: LLMs have demonstrated significant success across various domains. However, their application in complex decision-making tasks frequently necessitates intricate prompt engineering or fine-tuning, leading to challenges in unseen downstream tasks and heavy demands on computational resources. Meanwhile, Reinforcement Learning (RL) has been recognized as effective in decision-making problems but struggles in environments with sparse rewards, such as open-world games. To overcome these challenges, we introduce AdaRefiner, a novel framework designed to enhance the synergy between LLMs and RL feedback. The key component of AdaRefiner is a lightweight Adapter LLM (LM), which automatically refines task comprehension based on feedback from RL agents. This method mitigates the need for intricate prompt engineering and intensive LLM fine-tuning while maintaining the LLMs' generalization abilities and enhancing their decision-making capabilities in downstream tasks. Empirical evaluations of AdaRefiner on 22 diverse tasks within the open-world game Crafter have demonstrated its superior effectiveness, especially in guiding agents towards higher-level and common-sense skills. Our work makes contributions to the automatic self-refinement of LLMs with RL feedback, offering a more adaptable and efficient solution for complex decision-making problems.
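
The abstract describes a three-part loop: a lightweight Adapter LM refines task comprehension from RL agent feedback, a frozen general-purpose LLM turns that refined comprehension into guidance, and the RL agent acts on the guidance. Below is a minimal Python sketch of that loop under assumed interfaces; `adapter_lm.refine`, `decision_llm.suggest`, `rl_agent.act`, and the gym-style `env` are hypothetical stand-ins, not the authors' released code.

```python
# Minimal sketch of the AdaRefiner feedback loop described in the abstract.
# All objects and method names here are assumptions for illustration only.

def adarefiner_episode(env, adapter_lm, decision_llm, rl_agent, max_steps=100):
    """Run one episode: the Adapter LM refines task comprehension from RL
    feedback, the frozen decision LLM produces guidance, the agent acts."""
    feedback = []            # accumulated RL-side feedback (rewards, achievements)
    obs = env.reset()
    for _ in range(max_steps):
        # Adapter LM rewrites the task description using feedback so far,
        # replacing hand-crafted prompt engineering.
        refined_task = adapter_lm.refine(task=env.task_description,
                                         feedback=feedback)
        # Frozen general-purpose LLM gives high-level guidance from the
        # refined comprehension; no fine-tuning of this model is assumed.
        guidance = decision_llm.suggest(observation=obs, task=refined_task)
        # RL agent conditions its policy on the guidance and acts in the
        # environment (e.g., Crafter).
        action = rl_agent.act(obs, guidance)
        obs, reward, done, info = env.step(action)
        feedback.append({"reward": reward, "info": info})
        if done:
            break
    return feedback
```

In this sketch only the small Adapter LM is updated from feedback; the decision LLM and its generalization abilities are left untouched, which is the efficiency argument the abstract makes.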

Authors (2)
  1. Wanpeng Zhang (12 papers)
  2. Zongqing Lu (88 papers)