
Schema-Guided Scene-Graph Reasoning based on Multi-Agent Large Language Model System (2502.03450v2)

Published 5 Feb 2025 in cs.LG, cs.AI, cs.MA, and cs.RO

Abstract: Scene graphs have emerged as a structured and serializable environment representation for grounded spatial reasoning with LLMs. In this work, we propose SG2, an iterative Schema-Guided Scene-Graph reasoning framework based on multi-agent LLMs. The agents are grouped into two modules: a (1) Reasoner module for abstract task planning and graph information queries generation, and a (2) Retriever module for extracting corresponding graph information based on code-writing following the queries. Two modules collaborate iteratively, enabling sequential reasoning and adaptive attention to graph information. The scene graph schema, prompted to both modules, serves to not only streamline both reasoning and retrieval process, but also guide the cooperation between two modules. This eliminates the need to prompt LLMs with full graph data, reducing the chance of hallucination due to irrelevant information. Through experiments in multiple simulation environments, we show that our framework surpasses existing LLM-based approaches and baseline single-agent, tool-based Reason-while-Retrieve strategy in numerical Q&A and planning tasks.

Summary

  • The paper introduces a framework that decouples reasoning and retrieval using schema-guided queries to minimize hallucinations in LLM outputs.
  • The methodology employs a multi-agent system where a Reasoner handles abstract task planning and a Retriever executes code for precise graph information extraction.
  • Evaluations on simulation tasks, including BabyAI and VirtualHome, demonstrate improved performance in numerical Q&A and traversal planning with reduced computational load.

Schema-Guided Scene-Graph Reasoning with Multi-Agent Systems

Introduction to Scene-Graph Reasoning

Scene graphs are structured, serializable representations of an environment and are increasingly used to ground spatial reasoning in LLMs. The paper proposes SG2, an iterative Schema-Guided Scene-Graph reasoning framework built on a multi-agent LLM system. Rather than prompting the LLMs with the full graph, the framework exposes the scene graph schema and retrieves only the graph information each reasoning step needs, which reduces the chance of hallucination caused by irrelevant information.

Framework Architecture

The framework groups its agents into two modules: the Reasoner and the Retriever. The Reasoner handles abstract task planning and generates graph information queries, while the Retriever writes and executes code to extract the graph information each query asks for. The two modules interact iteratively, so reasoning proceeds step by step and attention shifts to different parts of the graph as the task requires.

Key Features:

  1. Schema-Guidance: A scene graph schema is employed to guide both reasoning and retrieval processes, enabling structured and schema-aware reasoning.
  2. Reduced Data Dependency: The system avoids prompting LLMs with the full graph data, thereby reducing potential distractions from irrelevant information.
  3. Autonomous Collaboration: The multi-agent setup allows for efficient task solving through independent yet cooperative processes between reasoning and data retrieval.
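
To make the schema-guidance idea concrete, the sketch below derives a schema from a toy scene graph and renders it as prompt text. It is only an illustration under stated assumptions: the `SceneGraph` class, the `schema_of` and `schema_prompt` helpers, and the node and relation names are hypothetical and are not taken from the paper, whose actual schema format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Hypothetical toy representation, not the paper's data structure.
    # node id -> attribute dict; every node carries a "type" key
    nodes: dict = field(default_factory=dict)
    # (source id, relation, target id) triples
    edges: list = field(default_factory=list)


def schema_of(graph: SceneGraph) -> dict:
    """Summarize node types, their attribute keys, and relation signatures."""
    node_types = {}
    for attrs in graph.nodes.values():
        keys = node_types.setdefault(attrs["type"], set())
        keys.update(k for k in attrs if k != "type")
    relations = set()
    for src, rel, dst in graph.edges:
        relations.add((graph.nodes[src]["type"], rel, graph.nodes[dst]["type"]))
    return {"node_types": node_types, "relations": relations}


def schema_prompt(schema: dict) -> str:
    """Render the schema as plain text suitable for inclusion in a prompt."""
    lines = ["Node types:"]
    for node_type in sorted(schema["node_types"]):
        attrs = ", ".join(sorted(schema["node_types"][node_type])) or "(none)"
        lines.append(f"  {node_type}: {attrs}")
    lines.append("Relations:")
    for src, rel, dst in sorted(schema["relations"]):
        lines.append(f"  {src} -[{rel}]-> {dst}")
    return "\n".join(lines)


graph = SceneGraph(
    nodes={
        "room_1": {"type": "room", "name": "kitchen"},
        "room_2": {"type": "room", "name": "hall"},
        "key_1": {"type": "key", "color": "red"},
        "key_2": {"type": "key", "color": "blue"},
        "door_1": {"type": "door", "color": "red", "locked": True},
    },
    edges=[
        ("key_1", "inside", "room_1"),
        ("key_2", "inside", "room_2"),
        ("door_1", "connects", "room_1"),
        ("door_1", "connects", "room_2"),
    ],
)

print(schema_prompt(schema_of(graph)))
```

In this sketch the schema text depends only on the kinds of nodes and relations present, so adding more keys or rooms of an existing kind leaves the prompt unchanged. That is one way to read the "Reduced Data Dependency" point above.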

Methodology

The modular design features distinct roles for different agents. Within the Reasoner, a Task Planner agent orchestrates the problem-solving iterations, issuing queries and interfacing with retrieval components. In the Retriever module, the Code Writer is responsible for generating executable programs based on the schema and queries, facilitating precise information extraction without relying on extensive API sets.
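
The Code Writer's role can be sketched as follows, with the LLM replaced by a lookup table of canned programs. Everything here is an assumption made for illustration: `code_writer`, `run_program`, the `GRAPH` layout, and the convention that a program stores its answer in a variable named `result` are hypothetical and do not come from the paper.

```python
# Toy scene graph as plain dicts (hypothetical layout).
GRAPH = {
    "nodes": {
        "room_1": {"type": "room", "name": "kitchen"},
        "key_1": {"type": "key", "color": "red"},
        "key_2": {"type": "key", "color": "blue"},
        "door_1": {"type": "door", "color": "red"},
    },
    "edges": [
        ("key_1", "inside", "room_1"),
        ("key_2", "inside", "room_1"),
    ],
}

# Canned programs standing in for LLM output. A real Code Writer would be
# prompted with the schema and the query and would return program text.
CANNED_PROGRAMS = {
    "how many red objects are there?": (
        "result = len([n for n, a in graph['nodes'].items()\n"
        "              if a.get('color') == 'red'])\n"
    ),
    "which keys are inside room_1?": (
        "result = [s for s, r, d in graph['edges']\n"
        "          if r == 'inside' and d == 'room_1'\n"
        "          and graph['nodes'][s]['type'] == 'key']\n"
    ),
}


def code_writer(query: str) -> str:
    """Hypothetical stand-in for the Code Writer agent."""
    return CANNED_PROGRAMS[query]


def run_program(program: str, graph: dict):
    """Execute generated program text and return its `result` variable.

    The namespace exposes only the graph and a few builtins. This limits
    accidents but is NOT a security sandbox.
    """
    namespace = {
        "__builtins__": {"len": len, "sum": sum, "sorted": sorted},
        "graph": graph,
    }
    exec(program, namespace)
    return namespace["result"]


count = run_program(code_writer("how many red objects are there?"), GRAPH)
keys = run_program(code_writer("which keys are inside room_1?"), GRAPH)
print(count, keys)
```

Because the retrieval step is an arbitrary program over the graph rather than a call into a fixed API, a query such as counting by attribute needs no dedicated counting function, which matches the point above about not relying on extensive API sets.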

The iterative reasoning process resembles the Reason-while-Retrieve strategy, but differs from it in two ways:

  • Schema-driven Abstraction: By focusing on abstract reasoning over the schema rather than raw data, the system achieves more robust and scalable performance.
  • Agent Decoupling: Separating reasoning from data retrieval contexts reduces unnecessary accumulation of historical context, which can impair reasoning efficiency.
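
The iterative loop and the decoupled contexts described above can be sketched as follows. This is a minimal illustration, assuming scripted stand-ins for both modules: `reasoner`, `retriever`, `solve`, the tuple-shaped queries, and the toy graph are all hypothetical, and a real system would call LLMs where the stubs use fixed rules.

```python
# Toy scene graph as plain dicts (hypothetical layout).
GRAPH = {
    "nodes": {
        "room_1": {"type": "room"},
        "room_2": {"type": "room"},
        "key_1": {"type": "key", "color": "red"},
        "key_2": {"type": "key", "color": "blue"},
    },
    "edges": [
        ("key_1", "inside", "room_1"),
        ("key_2", "inside", "room_2"),
    ],
}


def retriever(query, graph):
    """Stand-in for the Retriever: answers one query and keeps no history."""
    kind, arg = query
    if kind == "keys_with_color":
        return sorted(
            node_id
            for node_id, attrs in graph["nodes"].items()
            if attrs["type"] == "key" and attrs.get("color") == arg
        )
    if kind == "room_of":
        for src, rel, dst in graph["edges"]:
            if src == arg and rel == "inside":
                return dst
        return None
    raise ValueError(f"unknown query kind: {kind}")


def reasoner(task, history):
    """Scripted stand-in for the Reasoner.

    It sees the task and past (query, answer) pairs, never the graph itself,
    and returns either ("query", q) or ("answer", a).
    """
    if not history:
        return ("query", ("keys_with_color", task["color"]))
    if len(history) == 1:
        keys = history[0][1]
        if not keys:
            return ("answer", None)
        return ("query", ("room_of", keys[0]))
    return ("answer", history[1][1])


def solve(task, graph, max_steps=5):
    """Alternate between reasoning and retrieval until an answer is produced."""
    history = []  # Reasoner context: queries and retrieved answers only
    for _ in range(max_steps):
        kind, payload = reasoner(task, history)
        if kind == "answer":
            return payload, history
        history.append((payload, retriever(payload, graph)))
    raise RuntimeError("no answer within the step budget")


answer, trace = solve({"color": "red"}, GRAPH)
print(answer, len(trace))
```

In this sketch the Reasoner's context grows only by one query and one retrieved answer per iteration, while the Retriever starts fresh on every call, which is one way to picture the reduced accumulation of historical context.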

Evaluation

The framework was tested in multiple simulation environments, including BabyAI and VirtualHome, on tasks that require spatial reasoning and planning. The evaluation measured success rates on numerical Q&A and planning tasks.

Results

  • Numerical Q&A: The framework outperformed the baseline methods, a gain attributed to retrieving only the graph information relevant to each query.
  • Traversal Planning: It achieved the highest success rate on complex traversal tasks by decomposing them into sub-problems and solving these iteratively.
  • Robustness to API Constraints: Even when constrained by limited API functionality, the framework maintained its advantage, which suggests that schema-guided, program-based retrieval is effective.

Computational Efficiency

The computational cost analysis showed that, compared to baseline methods, the framework filters graph information efficiently and avoids unnecessary computation while keeping reasoning robust. Its cost adapts to task difficulty at inference time: it uses less computation on simple tasks and scales up for complex ones.

Conclusion

The schema-guided, multi-agent LLM approach to scene-graph reasoning improves performance on spatial reasoning tasks. By combining structured reasoning with efficient data retrieval, the framework offers a scalable and robust approach to tasks that require structured environmental understanding. Future research directions include expanding agent capabilities and exploring enhanced learning strategies for complex, real-world task scenarios.
