
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization (2402.17574v3)

Published 27 Feb 2024 in cs.AI and cs.CL

Abstract: LLMs exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers built on sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers require manually crafted prompts to convey task rules and regulate LLM behaviors, leaving them inherently unable to handle complex dynamic scenarios, e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than action-level reflection, Agent-Pro iteratively reflects on past trajectories and beliefs, revising its irrational beliefs to arrive at a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated on two games, Blackjack and Texas Hold'em, outperforming vanilla LLMs and specialized models. Our results show Agent-Pro can learn and evolve in complex and dynamic scenarios, which also benefits numerous LLM-based applications.
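
To make the abstract's procedure concrete, below is a minimal, hedged sketch of the policy-level reflection and depth-first policy optimization loop it describes. All names here (query_llm, Policy, play_episodes, reflect, optimize) are illustrative stand-ins, not the authors' implementation; the real Agent-Pro reflects over full game trajectories together with self- and world-beliefs, and evaluates candidate policies via actual Blackjack and Texas Hold'em rollouts.

```python
# Illustrative sketch (not the paper's code): policy-level reflection plus a
# depth-first search over candidate behavioral policies, as outlined in the
# Agent-Pro abstract. LLM calls and game rollouts are stubbed out.

from dataclasses import dataclass
from typing import List


def query_llm(prompt: str) -> str:
    """Stub for an LLM call; replace with a real chat-completion client."""
    return "Revised instructions: play more conservatively with weak hands."


@dataclass
class Policy:
    """A behavioral policy expressed as natural-language instructions."""
    instructions: str
    payoff: float = float("-inf")


def play_episodes(policy: Policy, n_games: int = 8) -> float:
    """Stub evaluator: run the agent in the game and return average payoff."""
    return 0.0  # plug in Blackjack / Texas Hold'em rollouts here


def reflect(policy: Policy, trajectory_summary: str) -> List[Policy]:
    """Policy-level reflection: ask the LLM to critique the beliefs formed
    during past games and rewrite the policy instructions as a whole,
    rather than second-guessing individual actions."""
    critique = query_llm(
        f"Current policy:\n{policy.instructions}\n"
        f"Past trajectories and beliefs:\n{trajectory_summary}\n"
        "Which beliefs were irrational? Rewrite the policy instructions."
    )
    return [Policy(instructions=critique)]


def optimize(root: Policy, depth: int = 3, width: int = 2) -> Policy:
    """Depth-first search over candidate policies, expanding only revisions
    whose measured payoff improves on the current best."""
    best = root
    best.payoff = play_episodes(root)
    stack = [(root, 0)]
    while stack:
        policy, d = stack.pop()
        if d >= depth:
            continue
        for candidate in reflect(policy, "summary of recent hands")[:width]:
            candidate.payoff = play_episodes(candidate)
            if candidate.payoff > best.payoff:  # prune non-improving branches
                best = candidate
                stack.append((candidate, d + 1))
    return best
```

The key design choice mirrored here is that reflection and search operate on the policy (the instructions that govern play) rather than on single actions, and that a revision is kept only when it empirically improves payoff.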

Authors (10)
  1. Wenqi Zhang (41 papers)
  2. Ke Tang (107 papers)
  3. Hai Wu (19 papers)
  4. Mengna Wang (3 papers)
  5. Yongliang Shen (47 papers)
  6. Guiyang Hou (12 papers)
  7. Zeqi Tan (18 papers)
  8. Peng Li (390 papers)
  9. Yueting Zhuang (164 papers)
  10. Weiming Lu (54 papers)
Citations (20)