AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning (2405.16247v4)

Published 25 May 2024 in cs.AI and cs.CL

Abstract: Large Language Model (LLM) based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understanding through interaction and adapt to new environments. AutoManual categorizes environmental knowledge into diverse rules and optimizes them online via two agents: 1) the Planner codes actionable plans based on the current rules for interacting with the environment; 2) the Builder updates the rules through a well-structured rule system that facilitates online rule management and essential detail retention. To mitigate hallucinations in managing rules, we introduce a case-conditioned prompting strategy for the Builder. Finally, the Formulator agent compiles these rules into a comprehensive manual. The self-generated manual can not only improve adaptability but also guide the planning of smaller LLMs while remaining human-readable. Given only one simple demonstration, AutoManual significantly improves task success rates, achieving 97.4% with GPT-4-turbo and 86.2% with GPT-3.5-turbo on ALFWorld benchmark tasks. The code is available at https://github.com/minghchen/automanual.
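
The abstract describes an online loop: the Planner drafts code-like plans from the current rule set, the Builder updates the rules from the resulting trajectories (with case-conditioned prompting), and the Formulator finally compiles the rules into a manual. Below is a minimal, hypothetical Python sketch of that loop under stated assumptions; every name (Rule, RuleSystem, planner, builder, formulator) and the stubbed environment are illustrative placeholders, not the authors' released implementation (see the linked GitHub repository for that).

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    text: str       # natural-language rule, e.g. "rinse an object at a sink before treating it as clean"
    rule_type: str  # e.g. "success process", "error", "useful helper method"


@dataclass
class RuleSystem:
    rules: list[Rule] = field(default_factory=list)

    def render(self) -> str:
        # Render the current rules as text that would be prepended to the Planner's prompt.
        return "\n".join(f"[{r.rule_type}] {r.text}" for r in self.rules)


def planner(task: str, rules: RuleSystem) -> str:
    """Draft an executable plan (here just a code-like string) conditioned on the current rules."""
    return f"# plan for: {task}\n# conditioned on rules:\n{rules.render()}"


def execute(plan: str) -> tuple[bool, str]:
    """Placeholder environment step; a real system would run the plan and return trajectory feedback."""
    return True, "the drafted plan succeeded"


def builder(success: bool, feedback: str, rules: RuleSystem) -> None:
    """Case-conditioned update: successful and failed trajectories yield different rule types."""
    rule_type = "success process" if success else "error"
    rules.rules.append(Rule(text=feedback, rule_type=rule_type))


def formulator(rules: RuleSystem) -> str:
    """Compile the accumulated rules into a single human-readable manual."""
    return "# Environment Manual\n" + rules.render()


if __name__ == "__main__":
    rule_system = RuleSystem()
    for task in ["put a clean apple on the diningtable", "heat some water in a mug"]:
        plan = planner(task, rule_system)        # Planner: plan from current rules
        success, feedback = execute(plan)        # Environment: run the plan, get feedback
        builder(success, feedback, rule_system)  # Builder: update the rule system online
    print(formulator(rule_system))               # Formulator: final manual
```

In the paper's setting the planner, builder, and formulator calls would each be backed by an LLM, and execute would interact with a benchmark environment such as ALFWorld.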

Authors (6)
  1. Minghao Chen (37 papers)
  2. Yihang Li (18 papers)
  3. Yanting Yang (10 papers)
  4. Shiyu Yu (3 papers)
  5. Binbin Lin (50 papers)
  6. Xiaofei He (70 papers)
Citations (1)