- The paper introduces a framework that decouples reasoning and retrieval using schema-guided queries to minimize hallucinations in LLM outputs.
- The methodology employs a multi-agent system where a Reasoner handles abstract task planning and a Retriever executes code for precise graph information extraction.
- Evaluations on simulation tasks, including BabyAI and VirtualHome, demonstrate improved performance in numerical Q&A and traversal planning with reduced computational load.
Schema-Guided Scene-Graph Reasoning with Multi-Agent Systems
Introduction to Scene-Graph Reasoning
Scene graphs are structured, high-level representations of an environment, and they have become a common way to ground spatial reasoning tasks for LLMs. This paper presents Schema-Guided Scene-Graph Reasoning, an approach that uses a multi-agent LLM system for environmental reasoning tasks. The framework aims to bridge the gap between graph representation and reasoning by using scene graphs without exposing the LLMs to the full graph data. Omitting irrelevant information in this way is intended to lower the risk of hallucination.
Framework Architecture
The proposed framework is organized into two modules within a multi-agent system: the Reasoner and the Retriever. The Reasoner handles abstract task decomposition and formulates graph information queries, while the Retriever writes and executes code to extract the graph information each query asks for. The two modules interact iteratively, so each reasoning step can request exactly the information it needs.
Key Features:
- Schema-Guidance: A scene graph schema is employed to guide both reasoning and retrieval processes, enabling structured and schema-aware reasoning.
- Reduced Data Dependency: The system avoids prompting LLMs with the full graph data, thereby reducing potential distractions from irrelevant information.
- Autonomous Collaboration: The multi-agent setup allows for efficient task solving through independent yet cooperative processes between reasoning and data retrieval.
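To make the schema-guidance idea concrete, the following is a minimal sketch of how a schema could be rendered as prompt text while the graph data itself stays out of the prompt. The schema layout, the `SceneGraph` class, and the `schema_prompt` function are all hypothetical names chosen for illustration; the paper's actual schema format is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # node_id -> attribute dict (illustrative layout, not the paper's)
    nodes: dict = field(default_factory=dict)
    # list of (source_id, relation, target_id) triples
    edges: list = field(default_factory=list)


# Hypothetical schema: node types with their attribute names,
# plus the edge types allowed between node types.
SCHEMA = {
    "node_types": {
        "room": ["name"],
        "object": ["name", "color"],
    },
    "edge_types": [
        ("object", "inside", "room"),
        ("room", "connected_to", "room"),
    ],
}


def schema_prompt(schema: dict) -> str:
    """Render the schema as text. Only this text is shown to the Reasoner."""
    lines = ["Node types:"]
    for node_type, attrs in schema["node_types"].items():
        lines.append(f"  {node_type}({', '.join(attrs)})")
    lines.append("Edge types:")
    for src, rel, dst in schema["edge_types"]:
        lines.append(f"  {src} -[{rel}]-> {dst}")
    return "\n".join(lines)


# The graph data exists, but it is never interpolated into the prompt.
graph = SceneGraph(
    nodes={
        "r1": {"type": "room", "name": "kitchen"},
        "o1": {"type": "object", "name": "key", "color": "red"},
    },
    edges=[("o1", "inside", "r1")],
)

prompt = schema_prompt(SCHEMA)
print(prompt)
```

In this sketch the prompt size depends only on the schema, not on how many nodes and edges the graph contains.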
Methodology
The modular design features distinct roles for different agents. Within the Reasoner, a Task Planner agent orchestrates the problem-solving iterations, issuing queries and interfacing with retrieval components. In the Retriever module, the Code Writer is responsible for generating executable programs based on the schema and queries, facilitating precise information extraction without relying on extensive API sets.
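As an illustration of the kind of program a Code Writer agent might produce for a numerical query, the sketch below counts objects in a room by traversing the graph directly instead of calling a predefined API. The graph layout, the `inside` relation, and the function name `count_objects_in_room` are assumptions made for this example and are not taken from the paper.

```python
# Toy scene graph (hypothetical data for illustration).
nodes = {
    "r1": {"type": "room", "name": "kitchen"},
    "r2": {"type": "room", "name": "bedroom"},
    "o1": {"type": "object", "name": "key", "color": "red"},
    "o2": {"type": "object", "name": "ball", "color": "red"},
    "o3": {"type": "object", "name": "box", "color": "blue"},
}
edges = [
    ("o1", "inside", "r1"),
    ("o2", "inside", "r2"),
    ("o3", "inside", "r1"),
]


def count_objects_in_room(nodes, edges, room_name, color=None):
    """Count objects linked to the named room by an 'inside' edge.

    If color is given, only objects with that color are counted.
    """
    room_ids = {
        node_id
        for node_id, attrs in nodes.items()
        if attrs["type"] == "room" and attrs["name"] == room_name
    }
    count = 0
    for src, rel, dst in edges:
        if rel != "inside" or dst not in room_ids:
            continue
        if color is None or nodes[src].get("color") == color:
            count += 1
    return count


# Query: "How many red objects are in the kitchen?"
red_in_kitchen = count_objects_in_room(nodes, edges, "kitchen", color="red")
print(red_in_kitchen)
```

Only the returned number would need to go back to the Reasoner, so the raw nodes and edges never enter its context.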
The iterative reasoning process builds on the Reason-while-Retrieve strategy, with two main changes:
- Schema-driven Abstraction: By focusing on abstract reasoning over the schema rather than raw data, the system achieves more robust and scalable performance.
- Agent Decoupling: Separating reasoning from data retrieval contexts reduces unnecessary accumulation of historical context, which can impair reasoning efficiency.
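The decoupled, iterative interaction described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the two agents are replaced by stub functions (`stub_reasoner`, `stub_retriever`) rather than LLM calls, and the action format and the name `run_episode` are hypothetical. The point it shows is that the Reasoner's history holds only query and answer pairs, while the Retriever receives a single query each round with no accumulated history.

```python
from typing import Callable


def run_episode(
    task: str,
    reasoner_step: Callable,
    retriever_answer: Callable,
    max_rounds: int = 5,
):
    """Alternate between reasoning and retrieval until a final answer."""
    reasoner_history = []  # (query, answer) pairs only, no graph data
    for _ in range(max_rounds):
        action = reasoner_step(task, reasoner_history)
        if action["kind"] == "final":
            return action["content"], reasoner_history
        query = action["content"]
        # The Retriever sees only the current query, not earlier rounds.
        answer = retriever_answer(query)
        reasoner_history.append((query, answer))
    return None, reasoner_history


# Stub agents standing in for LLM calls (assumptions for illustration).
PLAN = ["count objects in kitchen", "count objects in bedroom"]


def stub_reasoner(task, history):
    # Issue the next planned query, then combine the answers.
    if len(history) < len(PLAN):
        return {"kind": "query", "content": PLAN[len(history)]}
    total = sum(answer for _, answer in history)
    return {"kind": "final", "content": total}


def stub_retriever(query):
    # Stand-in for generating and executing a retrieval program.
    results = {
        "count objects in kitchen": 2,
        "count objects in bedroom": 1,
    }
    return results[query]


answer, history = run_episode(
    "How many objects are in the kitchen and bedroom combined?",
    stub_reasoner,
    stub_retriever,
)
print(answer, history)
```

Keeping the two contexts separate means any code, execution traces, or intermediate graph data produced during retrieval would not accumulate in the Reasoner's prompt across rounds.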
Evaluation
The framework was tested in several environments, including the BabyAI and VirtualHome simulators, both of which involve complex spatial reasoning and planning tasks. The evaluation focused on success rates in numerical Q&A tasks and practical planning scenarios.
Results
- Numerical Q&A: Demonstrated superior reasoning capabilities over baseline methods, achieving high success rates by focusing only on relevant information for task-specific queries.
- Traversal Planning: Achieved the highest success in complex traversal tasks by effectively decomposing and addressing sub-problems iteratively.
- Robustness to API Constraints: Even when constrained by limited API functionalities, the multi-agent framework maintained superior performance, underscoring the effectiveness of program-based data interactions guided by schemas.
Computational Efficiency
The computational cost analysis indicated that, compared to baseline methods, the framework filters graph information before processing it, which avoids unnecessary computation while keeping reasoning paths robust. The cost also adapts to task difficulty at inference time: simple tasks require less computation, while complex tasks use more as needed.
Conclusion
The schema-guided, multi-agent LLM approach to scene-graph reasoning improves performance on spatial reasoning tasks. By combining schema-level reasoning with program-based data retrieval, the framework offers a scalable and robust way to handle complex tasks that require structured environmental understanding. Future research directions include expanding agent capabilities and exploring enhanced learning strategies for complex, real-world task scenarios.