Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

Published 15 Jul 2024 in cs.AI and cs.CL | (2407.10718v2)

Abstract: Existing agents based on LLMs demonstrate robust problem-solving capabilities by integrating LLMs' inherent knowledge, strong in-context learning and zero-shot capabilities, and the use of tools combined with intricately designed LLM invocation workflows by humans. However, these agents still exhibit shortcomings in long-term reasoning and under-use the potential of existing tools, leading to noticeable deficiencies in complex real-world reasoning scenarios. To address these limitations, we introduce Sibyl, a simple yet powerful LLM-based agent framework designed to tackle complex reasoning tasks by efficiently leveraging a minimal set of tools. Drawing inspiration from Global Workspace Theory, Sibyl incorporates a global workspace to enhance the management and sharing of knowledge and conversation history throughout the system. Furthermore, guided by Society of Mind Theory, Sibyl implements a multi-agent debate-based jury to self-refine the final answers, ensuring a comprehensive and balanced approach. This approach aims to reduce system complexity while expanding the scope of problems solvable, from matters typically resolved by humans in minutes to those requiring hours or even days, thus facilitating a shift from System-1 to System-2 thinking. Sibyl has been designed with a focus on scalability and ease of debugging by incorporating the concept of reentrancy from functional programming from its inception, with the aim of seamless and low effort integration in other LLM applications to improve capabilities. Our experimental results on the GAIA benchmark test set reveal that the Sibyl agent instantiated with GPT-4 achieves state-of-the-art performance with an average score of 34.55%, compared to other agents based on GPT-4. We hope that Sibyl can inspire more reliable and reusable LLM-based agent solutions to address complex real-world reasoning tasks.

Summary

The paper presents Sibyl, an LLM-based agent framework that targets weaknesses in long-term reasoning by combining a multi-agent debate-based jury with a modular design.
It replaces multi-turn dialogues with stateless QA functions and relies on a minimal tool set, chiefly a web browser and a Python environment, to keep the system simple.
Results on the GAIA benchmark test set show state-of-the-art performance among GPT-4-based agents, with an average score of 34.55%.

This essay discusses "Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning," a paper presenting a novel LLM-based agent framework designed to address the limitations of current agents in handling complex reasoning tasks. Authored by Yulong Wang, Tianhao Shen, Lifeng Liu, and Jian Xie, the paper focuses on overcoming deficits in long-term reasoning and the under-utilization of existing tools in real-world scenarios.

Introduction

The paper introduces Sibyl, an LLM-based agent framework that relies on a minimal set of tools and draws on two theories of cognition: Global Workspace Theory and Society of Mind Theory. The framework aims to reduce system complexity while expanding the scope of problems that can be solved, facilitating a shift from rapid, intuitive (System-1) thinking to slow, deliberate (System-2) thinking.

Design Philosophy

Sibyl's design philosophy is centered on simplicity, modularity, and reusability. This is realized through several strategic approaches:

Human-oriented Browser Interface Instead of RAG: Sibyl utilizes a human-oriented browser interface to retrieve information, rather than relying on traditional Retrieval Augmented Generation (RAG) methods, which often result in significant information loss.
QA Function Instead of Dialogues: Replacing dialogues with stateless, reentrant QA functions simplifies the architecture and facilitates easier debugging and prompt engineering.
Limited Tools Instead of Specialized Tools: Sibyl primarily employs a web browser and Python environments, optimizing existing tools rather than adding specialized ones.
System-1 to System-2 Thinking: By incorporating long-term memory, planning, and error correction, Sibyl aims to handle more complex tasks that require extended chains of reasoning.
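
The "QA function instead of dialogues" idea can be illustrated with a minimal Python sketch. It is not the paper's code: the names `QARequest`, `build_prompt`, `qa`, and `echo_llm` are hypothetical, and the model is represented by any callable that maps a prompt string to a completion string. The point is only that each call receives everything it needs as arguments and keeps no conversation history, so it can be re-run in isolation when debugging.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any callable mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass(frozen=True)
class QARequest:
    """Immutable input: everything the model needs is passed in explicitly."""
    question: str
    context: str = ""


def build_prompt(req: QARequest) -> str:
    """Render the request into a single prompt; no hidden history is consulted."""
    return f"Context:\n{req.context}\n\nQuestion: {req.question}\nAnswer:"


def qa(req: QARequest, llm: LLM) -> str:
    """Stateless QA call: reads only its arguments and mutates nothing."""
    return llm(build_prompt(req)).strip()


def echo_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    question_lines = [ln for ln in prompt.splitlines() if ln.startswith("Question: ")]
    return " stub answer to: " + question_lines[-1][len("Question: "):]


if __name__ == "__main__":
    req = QARequest("What is the capital of France?",
                    context="Paris is the capital of France.")
    print(qa(req, echo_llm))
```

Because `qa` depends only on its inputs, a failing step can be reproduced by replaying the same `QARequest`, which is the debugging benefit the design philosophy describes.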

Framework Modules

The Sibyl framework comprises four main modules:

Tool Planner: This module selects appropriate tools, functions, and parameters tailored to each specific subtask, aiming to minimize system complexity.
External Information Acquisition Channel: This component gathers and selectively compresses external information to maintain relevant data efficiently.
Multi-agent Debate-based Jury: Inspired by the Society of Mind Theory, this module uses multi-agent debate to refine answers, providing a comprehensive and balanced approach.
Global Workspace: Inspired by Global Workspace Theory, this component manages and shares knowledge and conversation history across the system, supporting long-term and complex reasoning.
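
How the four modules might fit together can be sketched as a toy Python pipeline. This is an assumption-laden illustration, not Sibyl's implementation: every name here (`GlobalWorkspace`, `plan_tool`, `compress`, `jury`, `solve`, and the stub tools and jurors) is hypothetical, the planner is a one-line heuristic, compression is plain truncation, and the jury is a simplified stand-in in which jurors see the previous round's answers and a majority vote picks the final one.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GlobalWorkspace:
    """Shared store that every module reads from and appends to."""
    question: str
    notes: List[str] = field(default_factory=list)

    def add(self, source: str, text: str) -> None:
        self.notes.append(f"[{source}] {text}")

    def render(self) -> str:
        return "\n".join(self.notes)


# Hypothetical signatures: a tool maps a query to raw text; a juror maps
# (workspace text, previous round's answers) to its own answer.
Tool = Callable[[str], str]
Juror = Callable[[str, List[str]], str]


def plan_tool(subtask: str) -> str:
    """Toy planner: subtasks containing digits go to Python, the rest to the browser."""
    return "python" if any(ch.isdigit() for ch in subtask) else "browser"


def compress(raw: str, max_chars: int = 80) -> str:
    """Stand-in for selective compression: collapse whitespace, keep a prefix."""
    return " ".join(raw.split())[:max_chars]


def jury(ws: GlobalWorkspace, jurors: List[Juror], rounds: int = 2) -> str:
    """Simplified debate: jurors revise over rounds, then a majority vote decides."""
    answers: List[str] = []
    for _ in range(rounds):
        # Each juror sees the answers produced in the previous round.
        answers = [juror(ws.render(), answers) for juror in jurors]
    return Counter(answers).most_common(1)[0][0]


def solve(question: str, subtasks: List[str],
          tools: Dict[str, Tool], jurors: List[Juror]) -> str:
    ws = GlobalWorkspace(question)
    for subtask in subtasks:
        name = plan_tool(subtask)                       # tool planner
        ws.add(name, compress(tools[name](subtask)))    # acquisition + workspace
    return jury(ws, jurors)                             # debate-based jury


# Stub tools and jurors, for illustration only.
STUB_TOOLS: Dict[str, Tool] = {
    "browser": lambda q: f"page text about   {q}",
    "python": lambda q: "42",
}


def fixed_juror(ws_text: str, prev: List[str]) -> str:
    return "42"


def follower_juror(ws_text: str, prev: List[str]) -> str:
    return Counter(prev).most_common(1)[0][0] if prev else "41"


def reader_juror(ws_text: str, prev: List[str]) -> str:
    return "42" if "42" in ws_text else "unknown"


if __name__ == "__main__":
    print(solve("What is six times seven?",
                ["find the population figure", "compute 6 * 7"],
                STUB_TOOLS,
                [fixed_juror, follower_juror, reader_juror]))
```

The design choice worth noting is that modules communicate only through the workspace object, so the planner, the acquisition step, and the jury can each be swapped or tested independently.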

Empirical Results

The experimental results on the GAIA benchmark test set show that the Sibyl agent instantiated with GPT-4 achieves state-of-the-art performance among GPT-4-based agents, with an average score of 34.55%, outperforming competitors such as AutoGen and AutoGPT-4. The improvements were most pronounced in the more challenging Level 2 and Level 3 scenarios. Additionally, Sibyl demonstrates strong reasoning efficiency compared to humans, often requiring fewer steps to solve problems.

Implications and Future Directions

Practical Implications: Sibyl can inspire more reliable and reusable LLM-based agent solutions, enabling their application in a wider range of complex real-world reasoning tasks. The focus on enhancing existing tools and simplifying the architecture ensures that Sibyl is easily adaptable and scalable.

Theoretical Implications: The multi-agent debate-based jury and the global workspace offer new insights into managing complex cognitive processes in LLM-based agents. Building on Global Workspace Theory and Society of Mind Theory shows how ideas from cognitive science can inform the design of agent frameworks.

Future Developments: There are several avenues for future research. Integrating vision-based LLMs to handle multimedia content, enhancing the browser capabilities to mimic human interactions more closely, and incorporating adaptive learning mechanisms would further bolster the agent's problem-solving capabilities. Developing specialized LLMs to improve efficiency and effectiveness in complex reasoning tasks remains a key area of focus.

Conclusion

Sibyl represents a significant advancement in the development of LLM-based agents, aiming to bridge the gap between System-1 and System-2 thinking. By reducing system complexity and enhancing long-term reasoning capabilities, Sibyl provides a robust framework for tackling complex real-world tasks and a reference point for future LLM-based agent solutions.