
An In-depth Survey of Large Language Model-based Artificial Intelligence Agents (2309.14365v1)

Published 23 Sep 2023 in cs.CL and cs.AI

Abstract: Due to the powerful capabilities demonstrated by LLMs, there has been a recent surge in efforts to integrate them with AI agents to enhance performance. In this paper, we explore the core differences and characteristics between LLM-based AI agents and traditional AI agents. Specifically, we first compare the fundamental characteristics of the two types of agents, clarifying the significant advantages of LLM-based agents in natural language handling, knowledge storage, and reasoning. We then conduct an in-depth analysis of the key components of AI agents, including planning, memory, and tool use. For the crucial component of memory in particular, this paper introduces an innovative classification scheme that departs from traditional classification methods and provides a fresh perspective on the design of an AI agent's memory system. We firmly believe that in-depth research into these core components will lay a solid foundation for the future advancement of AI agent technology. The paper closes with directional suggestions for further research in this field, in the hope of offering valuable insights to scholars and researchers.

Introduction

The field of AI has seen a dramatic shift with the advent of LLMs, especially in the context of AI agents. This transition is fueled by the remarkable capabilities of LLMs in natural language understanding, reasoning, and knowledge recall. The integration of these models with AI agents yields a new breed of intelligent systems capable of sophisticated behaviors that were once beyond the reach of traditional rule-based methods. The paper under discussion embarks on a thorough examination of how LLMs, compared with traditional AI methodologies, significantly bolster the design and functionality of AI agents.

Fundamental Differences

Historically, AI agents relied on built-in rules and algorithms tailored to specific tasks, often resulting in competent yet rigid performance. LLMs have disrupted this landscape by enabling AI agents to understand and generate language, exhibit robust generalization, and leverage a vast knowledge base. The paper explores these differences, showing that, unlike their predecessors, LLM-based agents can flexibly adapt to different tasks without task-specific training, as exemplified by the VOYAGER agent's performance in the game Minecraft.
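The contrast can be made concrete with a minimal sketch. The following toy code (not from the paper; `call_llm` is a hypothetical stand-in for any chat-completion API) illustrates why a rule-based agent is rigid where an LLM-based agent is flexible: the former needs a hand-written branch per supported task, while the latter reuses a single generation call with a new natural-language prompt.

```python
def rule_based_agent(task: str) -> str:
    # Each supported task requires its own hard-coded rule;
    # anything outside the rule table simply fails.
    rules = {
        "greet": "Hello!",
        "farewell": "Goodbye!",
    }
    if task not in rules:
        raise ValueError(f"unsupported task: {task}")
    return rules[task]


def llm_based_agent(task: str, call_llm=None) -> str:
    # One model, one prompt template, arbitrary tasks phrased in
    # natural language. `call_llm` is a placeholder for a real model;
    # the default echoes the prompt so the sketch runs standalone.
    if call_llm is None:
        call_llm = lambda prompt: f"[LLM response to: {prompt}]"
    return call_llm(f"Please perform this task: {task}")
```

Adding a new capability to the rule-based agent means editing its rule table; the LLM-based agent handles a novel task (e.g. `"plan a trip"`) with no code change at all.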

Core Components

Integral components such as planning, memory, and tool use serve as the foundation of AI agent capabilities. Planning involves strategizing a sequence of actions to achieve a goal, for which LLMs have shown improved capability through nuanced task decomposition and self-reflection. Memory has been reclassified based on LLM characteristics into training memory (knowledge learned during pre-training), short-term memory (temporary, task-specific information held in context), and long-term memory (information stored in external systems). The paper also emphasizes the importance of integrating these components for optimal agent performance.
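The short-term vs. long-term distinction above can be sketched as a small data structure. This is an illustrative toy, not an implementation from the survey: training memory is implicit in the model weights and so does not appear here, short-term memory is modeled as a bounded in-context scratchpad, and long-term memory as an external key-value store the agent explicitly writes to and reads from.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentMemory:
    # Short-term memory: transient, task-specific notes that would
    # live in the model's context window.
    short_term: list = field(default_factory=list)
    # Long-term memory: an external store (here a dict; in practice
    # often a vector database or file system) that outlives a task.
    long_term: dict = field(default_factory=dict)

    def remember(self, key: str, fact: str) -> None:
        """Persist a fact to external long-term memory."""
        self.long_term[key] = fact

    def recall(self, key: str) -> Optional[str]:
        """Retrieve a fact from long-term memory, or None if absent."""
        return self.long_term.get(key)

    def context_window(self, limit: int = 5) -> list:
        """Return only the most recent short-term items, mimicking a
        bounded context window that older items fall out of."""
        return self.short_term[-limit:]
```

The design point the taxonomy captures: short-term items are lossy (they fall out of `context_window` as the task proceeds), whereas anything routed through `remember` survives indefinitely and must be recalled deliberately.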

Applications and Vision

AI agents have penetrated various domains, from chatbots offering both productivity tools and emotional companionship, like Pi, to game agents like Voyager with dynamic learning capabilities. Coding aids such as GPT Engineer, design platforms like Diagram, and research-focused agents such as ChemCrow and Agent exhibit the diversity of LLM-based agent applications. The paper also touches upon collaborative systems in which AI agents work in synergy to accomplish complex tasks. The vision for these agents is not merely to perform assigned tasks but to engage in a wider range of general-purpose applications, moving a step closer to true artificial general intelligence.

Conclusion

In summary, this paper provides a detailed exploration of LLM-based AI agents and highlights the clear distinction from their traditional counterparts. It offers a deep dive into the mechanics underpinning these agents, the pivotal role of their core components, and the broad spectrum of applications that have emerged. The survey aims to help readers familiarize themselves with current advancements and to inform the trajectory of future research in AI agent technology.

References (113)
  1. PDDL — the planning domain definition language. Technical Report.
  2. RL4F: Generating natural language feedback with reinforcement learning for repairing model outputs. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, pages 7716–7733.
  3. Meta reinforcement learning for sim-to-real domain adaptation. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 2725–2731. IEEE.
  4. Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6):26–38.
  5. Alan D Baddeley. 1997. Human memory: Theory and practice. Psychology Press.
  6. Alan David Baddeley. 1983. Working memory. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 302(1110):311–324.
  7. Benchmarking llm powered chatbots: Methods and metrics. arXiv preprint arXiv:2308.04624.
  8. Christopher Berner and Brockman et al. 2019. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680.
  9. Chatgpt is a knowledgeable but inexperienced solver: An investigation of commonsense problem in large language models. arXiv preprint arXiv:2303.16421.
  10. Emergent autonomous scientific research capabilities of large language models. arXiv preprint arXiv:2304.05332.
  11. Chemcrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376.
  12. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
  13. Large language models as tool makers. arXiv preprint arXiv:2305.17126.
  14. Eduardo Camina and Francisco Güell. 2017. The neuroanatomical, neurophysiological and psychological basis of memory: Current models and their origins. Frontiers in pharmacology, 8:438.
  15. Optimal mixed discrete-continuous planning for linear hybrid systems. In Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control, pages 1–12.
  16. When do you need chain-of-thought prompting for chatgpt? arXiv preprint arXiv:2304.03262.
  17. Introspective tips: Large language model for in-context decision making. arXiv preprint arXiv:2305.11598.
  18. Po-Lin Chen and Cheng-Shang Chang. 2023. Interact: Exploring the potentials of chatgpt as a cooperative agent. arXiv preprint arXiv:2308.01552.
  19. Chatcot: Tool-augmented chain-of-thought reasoning on chat-based large language models. arXiv preprint arXiv:2305.14323.
  20. Pretrained language model embryology: The birth of albert. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 6813–6828.
  21. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
  22. Analyzing commonsense emergence in few-shot knowledge models. arXiv preprint arXiv:2101.00297.
  23. Commonsense knowledge mining from pretrained models. In Proceedings of the conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, pages 1173–1178.
  24. Clip-nav: Using clip for zero-shot vision-and-language navigation. arXiv preprint arXiv:2211.16649.
  25. Palm-e: An embodied multimodal language model. In Proceedings of the International Conference on Machine Learning, pages 8469–8488.
  26. Htn planning: complexity and expressivity. In Proceedings of the Twelfth AAAI National Conference on Artificial Intelligence, pages 1123–1128.
  27. Minedojo: Building open-ended embodied agents with internet-scale knowledge. Advances in Neural Information Processing Systems, 35:18343–18362.
  28. Maria Fox and Derek Long. 2003. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research, 20:61–124.
  29. Working memory capacity of chatgpt: An empirical study.
  30. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning. arXiv preprint arXiv:2305.14909.
  31. Recent trends in task and motion planning for robotics: A survey. ACM Computing Surveys.
  32. A universal modular actor formalism for artificial intelligence. In Proceedings of the 3rd international joint conference on Artificial intelligence, pages 235–245.
  33. Enabling efficient interaction between an algorithm agent and an llm: A reinforcement learning approach. arXiv preprint arXiv:2306.03604.
  34. Jie Huang and Kevin Chen-Chuan Chang. 2022. Towards reasoning in large language models: A survey. arXiv preprint arXiv:2212.10403.
  35. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning, pages 9118–9147.
  36. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608.
  37. Ian ML Hunter. 1957. Memory: Facts and fallacies.
  38. Task planning in robotics: an empirical comparison of pddl-and asp-based systems. Frontiers of Information Technology & Electronic Engineering, 20:363–373.
  39. Bioinspired electronics for artificial sensory systems. Advanced Materials, 31(34):1803637.
  40. Think before you act: Decision transformers with internal working memory. arXiv preprint arXiv:2305.16338.
  41. Mrkl systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. arXiv preprint arXiv:2205.00445.
  42. An emotion understanding framework for intelligent agents based on episodic and semantic memories. Autonomous agents and multi-agent systems, 28:126–153.
  43. Simple but effective: Clip embeddings for embodied ai. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14829–14838.
  44. A machine with short-term, episodic, and semantic memory systems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 48–56.
  45. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35:22199–22213.
  46. Prompted llms as chatbot modules for long open-domain conversation. arXiv preprint arXiv:2305.04533.
  47. Llava-med: Training a large language-and-vision assistant for biomedicine in one day. arXiv preprint arXiv:2306.00890.
  48. Large language models with controllable working memory. In Findings of the Association for Computational Linguistics: ACL, pages 1774–1793.
  49. Haizhen Li and Xilun Ding. 2023. Adaptive and intelligent robot task planning for home service: A review. Engineering Applications of Artificial Intelligence, 117:105618.
  50. Api-bank: A benchmark for tool-augmented llms. arXiv preprint arXiv:2304.08244.
  51. Yuxi Li. 2017. Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274.
  52. Unleashing infinite-length input capacity for large-scale language models with self-controlled memory system. arXiv preprint arXiv:2304.13343.
  53. TaskMatrix.AI: Completing tasks by connecting foundation models with millions of APIs. arXiv preprint arXiv:2303.16434.
  54. Decision-oriented dialogue for human-ai collaboration. arXiv preprint arXiv:2305.20076.
  55. Agentsims: An open-source sandbox for large language model evaluation. arXiv preprint arXiv:2308.04026.
  56. Llm+ p: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477.
  57. Chain of hindsight aligns language models with feedback. arXiv preprint arXiv:2302.02676.
  58. Agentbench: Evaluating llms as agents. arXiv preprint arXiv:2308.03688.
  59. Bolaa: Benchmarking and orchestrating llm-augmented autonomous agents. arXiv preprint arXiv:2308.05960.
  60. Petlon: planning efficiently for task-level-optimal navigation. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 220–228.
  61. Few-shot subgoal planning with language models. arXiv preprint arXiv:2205.14288.
  62. Jieyi Long. 2023. Large language model guided tree-of-thought. arXiv preprint arXiv:2305.08291.
  63. Chameleon: Plug-and-play compositional reasoning with large language models. arXiv preprint arXiv:2304.09842.
  64. Video-chatgpt: Towards detailed video understanding via large vision and language models. arXiv preprint arXiv:2306.05424.
  65. Zson: Zero-shot object-goal navigation using multimodal goal embeddings. Advances in Neural Information Processing Systems, pages 32340–32352.
  66. J McCarthy. 1959. Programs with common sense. In Proc. Teddington Conference on the Mechanization of Thought Processes, 1959, pages 75–91.
  67. Marvin L. Minsky. 1988. The Society of Mind. Simon & Schuster, New York.
  68. Playing atari with deep reinforcement learning. CoRR, abs/1312.5602.
  69. Webgpt: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332.
  70. Andrew M Nuxoll and John E Laird. 2007. Extending cognitive architecture with episodic memory. In Proceedings of the 22nd national conference on Artificial intelligence-Volume 2, pages 1560–1565.
  71. Amin Omidvar and Aijun An. 2023. Empowering conversational agents using semantic in-context learning. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 766–771.
  72. OpenAI. 2023. Gpt-4 technical report.
  73. Art: Automatic multi-step reasoning and tool-use for large language models. arXiv preprint arXiv:2303.09014.
  74. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255.
  75. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442.
  76. Gorilla: Large language model connected with massive apis. arXiv preprint arXiv:2305.15334.
  77. Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv preprint arXiv:2302.12813.
  78. Language models as knowledge bases? In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 2463–2473.
  79. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, pages 8748–8763.
  80. A primer in bertology: What we know about how bert works. Transactions of the Association for Computational Linguistics, 8:842–866.
  81. Learning representations by back-propagating errors. nature, 323(6088):533–536.
  82. Stuart Russell and Peter Norvig. 2010. Artificial Intelligence: A Modern Approach, 3 edition. Prentice Hall.
  83. Tara Safavi and Danai Koutra. 2021. Relational world knowledge representation in contextual language models: A review. arXiv preprint arXiv:2104.05837.
  84. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.
  85. Hugginggpt: Solving ai tasks with chatgpt and its friends in huggingface. arXiv preprint arXiv:2303.17580.
  86. Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366.
  87. Blenderbot 3: a deployed conversational agent that continually learns to responsibly engage. arXiv preprint arXiv:2208.03188.
  88. Mastering the game of go with deep neural networks and tree search. nature, 529(7587):484–489.
  89. Progprompt: Generating situated robot task plans using large language models. In Proceedings of IEEE International Conference on Robotics and Automation, pages 11523–11530.
  90. Interleaving hierarchical task planning and motion constraint testing for dual-arm manipulation. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4061–4066.
  91. Adaplanner: Adaptive planning from feedback with language models. arXiv preprint arXiv:2305.16653.
  92. Graspgpt: Leveraging semantic knowledge from a large language model for task-oriented grasping. arXiv preprint arXiv:2307.13204.
  93. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239.
  94. Endel Tulving. 1983. Elements of episodic memory.
  95. Endel Tulving et al. 1972. Episodic and semantic memory. Organization of memory, 1(381-403):1.
  96. On the planning abilities of large language models (a critical investigation with a proposed benchmark). arXiv preprint arXiv:2302.06706.
  97. Steven Vere and Timothy Bickmore. 1990. A basic agent. Computational intelligence, 6(1):41–60.
  98. Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature, 575(7782):350–354.
  99. Artificial sensory memory. Advanced Materials, 32(15):1902434.
  100. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.
  101. Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, pages 2609–2634.
  102. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682.
  103. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
  104. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128.
  105. Gentopia: A collaborative platform for tool-augmented llms. arXiv preprint arXiv:2308.04030.
  106. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601.
  107. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.
  108. Retroformer: Retrospective large language agents with policy gradient optimization. arXiv preprint arXiv:2308.02151.
  109. Investigating chain-of-thought with chatgpt for stance detection on social media. arXiv preprint arXiv:2304.03087.
  110. Large language model is semi-parametric reinforcement learning agent. arXiv preprint arXiv:2306.07929.
  111. Automatic chain of thought prompting in large language models. In Proceedings of the Eleventh International Conference on Learning Representations.
  112. Memorybank: Enhancing large language models with long-term memory. arXiv preprint arXiv:2305.10250.
  113. Mingchen Zhuge and Haozhe Liu et al. 2023. Mindstorms in natural language-based societies of mind.
Authors (3)
  1. Pengyu Zhao (10 papers)
  2. Zijian Jin (5 papers)
  3. Ning Cheng (96 papers)
Citations (12)