
AgentKit: Structured LLM Reasoning with Dynamic Graphs (2404.11483v2)

Published 17 Apr 2024 in cs.AI and cs.LG

Abstract: We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process". For example, for the task of writing a paper, one may start with the thought process of 1) identify a core message, 2) identify prior research gaps, etc. The nodes in AgentKit can be designed and combined in different ways to implement multiple advanced capabilities including on-the-fly hierarchical planning, reflection, and learning from interactions. In addition, due to the modular nature and the intuitive design to simulate explicit human thought process, a basic agent could be implemented as simple as a list of prompts for the subtasks and therefore could be designed and tuned by someone without any programming experience. Quantitatively, we show that agents designed through AgentKit achieve SOTA performance on WebShop and Crafter. These advances underscore AgentKit's potential in making LLM agents effective and accessible for a wider range of applications. https://github.com/holmeswww/AgentKit


Summary

  • The paper introduces a framework that uses a dynamic DAG-based structure to chain LLM prompts into complex reasoning sequences.
  • It employs a two-step process where nodes compose context and query an LLM, then post-process responses for actionable decisions.
  • Empirical tests on the WebShop and Crafter benchmarks demonstrate state-of-the-art performance and real-time adaptability, lowering the barrier to sophisticated AI agent design.

Overview of the AgentKit Framework for Constructing LLM-based Agents

Introduction to AgentKit

AgentKit is a novel framework for building complex agent behaviors on top of LLMs. It lets users construct multifunctional agents through a structured prompting mechanism that mirrors human thought processes: simple natural language prompts are chained into comprehensive solutions. This design enables advanced capabilities such as on-the-fly hierarchical planning, reflection, and learning from interaction, without requiring programming skills from the end user.

Architecture and Implementation

Core Concept

The core abstraction in AgentKit is the "node." Each node encapsulates a prompt representing a subtask and can be linked to other nodes to form a Directed Acyclic Graph (DAG). This graph structures the flow of tasks, enabling complex reasoning sequences reflective of explicit procedural thinking. Nodes can dynamically add or remove other nodes or dependencies at runtime, which provides flexibility to adapt to various scenarios, such as those encountered in real-time applications like self-driving cars or dynamic game environments.
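The node-and-DAG idea can be made concrete with a short sketch. The names below (`Node`, `Graph`, `run`) are illustrative assumptions, not AgentKit's actual API; the point is only that nodes hold prompts, edges express dependencies, and evaluation follows a topological order.

```python
# Hypothetical sketch of AgentKit's core abstraction: nodes holding natural
# language prompts, linked into a DAG and evaluated in dependency order.
# Class and method names are illustrative, not the framework's real API.
from graphlib import TopologicalSorter

class Node:
    def __init__(self, name, prompt, query_llm):
        self.name = name            # unique identifier within the graph
        self.prompt = prompt        # natural language subtask description
        self.query_llm = query_llm  # callable standing in for an LLM call

    def evaluate(self, dep_outputs):
        # Fold dependency outputs into the prompt, then "query the LLM".
        context = "\n".join(dep_outputs.values())
        return self.query_llm(f"{context}\n{self.prompt}")

class Graph:
    def __init__(self):
        self.nodes = {}   # name -> Node
        self.edges = {}   # name -> set of dependency names

    def add_node(self, node, deps=()):
        self.nodes[node.name] = node
        self.edges[node.name] = set(deps)

    def run(self):
        results = {}
        # graphlib raises CycleError if the graph is not a DAG.
        for name in TopologicalSorter(self.edges).static_order():
            deps = {d: results[d] for d in self.edges[name]}
            results[name] = self.nodes[name].evaluate(deps)
        return results

# Toy "LLM" that just echoes the last line of the prompt in upper case.
fake_llm = lambda p: p.splitlines()[-1].upper()
g = Graph()
g.add_node(Node("message", "identify a core message", fake_llm))
g.add_node(Node("gaps", "identify prior research gaps", fake_llm),
           deps=["message"])
print(g.run()["gaps"])  # -> IDENTIFY PRIOR RESEARCH GAPS
```

The paper-writing example from the abstract maps directly onto this shape: "identify a core message" and "identify prior research gaps" become two nodes, with the second depending on the first.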

Node Operational Flow

Nodes operate through a two-step process:

  1. Compose: This step involves gathering and formatting data from dependencies and possibly a centralized database, culminating in a structured prompt ready to be processed by the LLM.
  2. Query and After-query: After posing the prompt to the LLM, the output is optionally post-processed to fit the required action or decision format.
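The two steps above can be sketched as follows. The function names, the `[section]` prompt layout, and the `ACTION:` output convention are assumptions made for illustration; only the compose / query / after-query structure comes from the paper.

```python
# Illustrative sketch of a node's two-step flow: compose a structured prompt
# from dependency outputs plus a shared database, query the LLM, then
# post-process the raw reply into an actionable form. All names and formats
# here are assumptions, not AgentKit's real interfaces.
def compose(node_prompt, dep_outputs, database):
    sections = [f"[{k}]\n{v}" for k, v in dep_outputs.items()]
    sections.append(f"[memory]\n{database.get('memory', '')}")
    sections.append(f"[task]\n{node_prompt}")
    return "\n\n".join(sections)

def after_query(raw_reply):
    # Keep only a line of the form "ACTION: <verb>[<args>]"; free-form
    # reasoning in the reply is discarded.
    for line in raw_reply.splitlines():
        if line.startswith("ACTION:"):
            return line.removeprefix("ACTION:").strip()
    return None  # no parseable action; the caller may re-query or fall back

def run_node(node_prompt, dep_outputs, database, query_llm):
    prompt = compose(node_prompt, dep_outputs, database)
    return after_query(query_llm(prompt))

# Stub LLM that always proposes the same action.
stub_llm = lambda prompt: "Reasoning...\nACTION: search[red shoes]"
action = run_node("decide the next web action",
                  {"plan": "buy shoes"}, {"memory": "budget: $40"}, stub_llm)
print(action)  # -> search[red shoes]
```

Separating after-query from compose is what lets a node's free-form LLM output be coerced into a fixed action format, as a WebShop-style agent would need.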

Dynamic Graph Modification

AgentKit supports dynamic modifications of the DAG during inference, allowing for runtime adaptation. This includes conditional branching and node adjustments based on the responses from the LLM, enhancing the model’s ability to handle complex, situation-dependent reasoning.
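A minimal sketch of runtime graph editing: a node's post-processing step inspects the LLM's reply and injects a follow-up node before evaluation finishes. The queue-based evaluator and the node names are illustrative assumptions, not the framework's actual mechanism.

```python
# Hypothetical sketch of conditional branching during inference: after a node
# runs, a branch function may add new nodes to the (still-running) graph.
from collections import deque

def evaluate(nodes, start):
    """nodes maps name -> (prompt_fn, branch_fn); branch_fn may return new
    nodes to splice into the graph based on the node's output."""
    results, queue = {}, deque([start])
    while queue:
        name = queue.popleft()
        prompt_fn, branch_fn = nodes[name]
        results[name] = prompt_fn(results)
        for new_name, spec in branch_fn(results[name]).items():
            nodes[new_name] = spec       # dynamic node insertion
            queue.append(new_name)
    return results

# "plan" always runs; a "replan" node is added only when the stubbed
# planner declares its plan infeasible.
def plan(_results):
    return "infeasible: missing wood"

def branch_on_plan(output):
    if output.startswith("infeasible"):
        return {"replan": (lambda r: "collect wood first", lambda _: {})}
    return {}

results = evaluate({"plan": (plan, branch_on_plan)}, "plan")
print(sorted(results))  # -> ['plan', 'replan']
```

In a Crafter-style environment, this is the shape of on-the-fly hierarchical planning: a high-level plan node spawns subgoal nodes only when the current situation demands them.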

Empirical Results and Applications

Performance Metrics

The framework was empirically tested on benchmark tasks such as the WebShop and Crafter simulations, where it demonstrated state-of-the-art performance. In Crafter, AgentKit not only facilitated complex strategic gameplay but also enabled the agent to learn from its environment, thereby incrementally improving its performance.

Practical Implications

From a practical standpoint, AgentKit significantly lowers the barrier to creating sophisticated LLM-based agents. The framework's intuitive design allows users without coding expertise to construct and adjust agents according to specific needs, making advanced AI capabilities more accessible.

Future Prospects

Looking forward, the modular and flexible nature of AgentKit suggests extensive potential applications and improvements. Future enhancements could include more sophisticated node types with enhanced natural language understanding capabilities or deeper integration with external databases for real-time knowledge updating. Additional research could also explore the scalability of AgentKit in more complex domains or its integration with other AI technologies.

Conclusion

AgentKit represents a significant step forward in the design and implementation of intelligent agents through LLMs. By structuring agent behavior through easily configurable natural language prompts, it offers a robust framework that blends ease of use with powerful functionality, making sophisticated AI agent design more accessible to a broader audience. By continuing to develop and refine such frameworks, the field can move closer to creating highly adaptable, intelligent systems capable of performing a wide range of real-world tasks.
