LLM-MAS Integration Architecture

Updated 23 December 2025
  • LLM-MAS is an integration framework that configures large language models as specialized agents within a multi-agent system, enabling collaborative AI workflows.
  • The architecture employs modular, layered pipelines with role-specialized agents and API-driven communication to enhance precision and adaptability in zero-shot tasks.
  • Robust communication protocols using JSON schemas and consensus mechanisms ensure reliable data exchange and schema-constrained inference, validated by empirical performance metrics.

An LLM–Multi-Agent System (LLM-MAS) integration architecture defines the technical and organizational framework by which LLMs are configured and orchestrated as explicitly defined agents within a multi-agent system, enabling collaborative, robust, and specialized AI workflows. This integration leverages the unique reasoning, data extraction, and specialization capabilities of state-of-the-art LLMs while preserving the modularity, parallelism, and dynamic coordination principles inherent to MAS. Modern LLM-MAS architectures span layered API-driven pipelines, adaptive agent role assignment, consensus and reliability protocols, and human-in-the-loop (HITL) components, supporting zero/few-shot reasoning, tool use, and seamless scaling across domains.

1. Layered System Composition and Data Flow

Contemporary LLM-MAS architectures adopt a modular, layered approach, wherein agent specialization, workflow coordination, and data management are disentangled across dedicated layers and APIs. The Urban-MAS framework exemplifies this, with three core agentic layers: Predictive Factor Guidance Agents (task-relevant factor surfacing), Reliable UrbanInfo Extraction Agents (robust LLM-based urban knowledge extraction with conflict checking), and Multi-UrbanInfo Inference Agents (integrative, schema-constrained prediction synthesis). Data and control propagate from user task input to final prediction through the following protocolized steps:

  1. Task description (τ) and spatial context (ℓ) are submitted.
  2. Factor guidance agents invoke LLMs (e.g., GPT-4o via LangChain) to enumerate salient predictive factors for each (dimension, scale) tuple, producing collections $P_{d,r}$.
  3. Extraction agents run paired, zero-shot LLM queries for each (d, r) to yield candidate outputs, compared via a soft similarity metric: $\mathrm{soft\_sim}(a, b) = 0.4 \cdot \mathrm{Jaccard}(a, b) + 0.6 \cdot \mathrm{SequenceMatcher}(a, b)$. Fields with similarity below 0.72 are recursively refined via targeted re-prompting (see the sketch after this list).
  4. The four dimension-scale knowledge objects $U^*_{d,r}$ are fused by a final inference agent, with reasoning invoked in a schema-constrained, zero-shot LLM call to produce final task predictions (e.g., numeric quantities).
  5. All communication occurs via JSON or plain-text messages sent over a shared API; no bespoke RPC or external coordination layer is necessary (Lou, 30 Oct 2025).
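
A minimal sketch of the paired-output consistency check in step 3, assuming token-level Jaccard and Python's difflib.SequenceMatcher as the two similarity components; the tokenization and field handling are illustrative assumptions, not the Urban-MAS implementation:

```python
from difflib import SequenceMatcher

SIM_THRESHOLD = 0.72  # fields scoring below this are re-extracted via targeted re-prompting

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0

def soft_sim(a: str, b: str) -> float:
    """Weighted soft similarity: 0.4 * Jaccard + 0.6 * SequenceMatcher ratio."""
    return 0.4 * jaccard(a, b) + 0.6 * SequenceMatcher(None, a, b).ratio()

def conflicting_fields(out_a: dict, out_b: dict) -> list:
    """Fields whose paired zero-shot outputs disagree and therefore need re-extraction."""
    return [k for k in out_a
            if soft_sim(str(out_a[k]), str(out_b.get(k, ""))) < SIM_THRESHOLD]
```

Only the fields flagged by conflicting_fields would then be re-prompted, leaving concordant fields untouched.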

2. Agent Role Specialization and Workflow Patterns

LLM-MAS integration architectures enable precise role-focused decomposition, with each agent encapsulating both functional logic and a distinct prompt template, facilitating modularity and parallelization (see the sketch after the list below). Roles are often hierarchically layered:

  • Predictive Factor Guidance Agents: Use chain-of-thought prompting to surface latent features critical for downstream reasoning, offering robustness to domain shifts and obviating fine-tuning (e.g., GPT-4o as in Urban-MAS).
  • Extraction/Evaluator/Refiner Agents: Control extraction quality by running multiple LLM instances per query, evaluating consistency, and repairing non-concordant fields. Extraction is parameterized via task- and context-aware prompts.
  • Inference/Decision Agents: Aggregate agent outputs using schema-aware LLM reasoning, enforcing output format constraints and producing task-specific quantitative or categorical responses.
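
As a hedged illustration of this role-focused decomposition, each agent can be reduced to a prompt template plus a call into a shared LLM API; the class, template text, and stubbed client below are hypothetical placeholders rather than the framework's actual code:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RoleAgent:
    """An agent encapsulates a role, its prompt template, and a shared LLM client."""
    role: str
    prompt_template: str
    llm_call: Callable[[str], str]  # any chat-completion client, e.g., a LangChain wrapper

    def run(self, **slots) -> str:
        return self.llm_call(self.prompt_template.format(**slots))

# Hypothetical instances mirroring the layered roles above.
factor_guidance = RoleAgent(
    role="predictive_factor_guidance",
    prompt_template=("Task: {task}. Think step by step and list the most influential "
                     "factors for dimension '{dim}' at scale '{scale}', with justification."),
    llm_call=lambda prompt: "<LLM response>",  # stub; replace with a real model call
)
extraction = RoleAgent(
    role="urbaninfo_extraction",
    prompt_template="Given factors {factors} and location {location}, output UrbanInfo as JSON.",
    llm_call=lambda prompt: "<LLM response>",  # stub; replace with a real model call
)
```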

Empirical ablation demonstrates the centrality of factor-guidance: removing this layer causes mean absolute error to rise by 46.9–102.9% across core prediction tasks, far exceeding the contribution of reliability-boosted extraction. Agent specialization thus underpins both robustness and extensibility (Lou, 30 Oct 2025).

3. Communication Protocols and Consistency Mechanisms

LLM-MAS systems serialize inter-agent interaction using formal, yet lightweight, communication schemas (JSON, structured text) and systematic protocols for consensus and reliability:

  • Paired Output Comparison: For critical extraction steps, two independent LLM outputs are produced and compared fieldwise using a weighted soft similarity. If similarity falls below a threshold in any field, only the offending fields are re-extracted—a conflict-resolving loop.
  • Standardized Message Envelopes: Each transmission encapsulates the sender, recipient, message type, payload, and timestamp, ensuring traceability and facilitating auditing (see the sketch following this list).
  • Schema-Constrained Inference: At the final inference stage, agent-synthesized knowledge is collated and passed to the LLM with explicit schema constraints contained in the prompt (e.g., expected output fields/range), minimizing hallucinations and enforcing type safety.
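
For the standardized message envelope described above, a small serializable record carrying the listed elements (sender, recipient, message type, payload, timestamp) is sufficient; the exact field names below are an assumed sketch, not a prescribed wire format:

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import Any, Dict

@dataclass
class Envelope:
    """Traceable inter-agent message: who sent it, to whom, of what kind, and when."""
    sender: str
    recipient: str
    message_type: str            # e.g., "urbaninfo_result", "inference_request"
    payload: Dict[str, Any]
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)

# Example: an extraction agent forwarding a knowledge object to the inference agent.
msg = Envelope(
    sender="extraction_agent_env_district",
    recipient="inference_agent",
    message_type="urbaninfo_result",
    payload={"dimension": "environment", "scale": "district", "greenery": "high"},
)
print(msg.to_json())
```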

The protocol design minimizes dependencies on external orchestration logic; all coordination, validation, and re-extraction are performed in-prompt with explicit intermediate outputs (Lou, 30 Oct 2025).
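
A sketch of how schema constraints can be carried in-prompt and checked on the reply, in the spirit of the in-prompt coordination described above; the schema fields, prompt wording, and validation rules are illustrative assumptions:

```python
import json

OUTPUT_SCHEMA = {
    "prediction": "float, the predicted quantity",
    "rationale": "short string explaining the estimate",
}

def build_inference_prompt(task: str, fused_knowledge: dict) -> str:
    """Collate agent-synthesized knowledge and append explicit schema constraints."""
    return (
        f"Task: {task}\n"
        f"Fused UrbanInfo across dimensions and scales:\n{json.dumps(fused_knowledge)}\n"
        "Respond ONLY with a JSON object matching this schema:\n"
        f"{json.dumps(OUTPUT_SCHEMA, indent=2)}"
    )

def validate_reply(reply: str) -> dict:
    """Reject replies that are not valid JSON or that miss required fields."""
    obj = json.loads(reply)                  # raises ValueError on malformed JSON
    missing = set(OUTPUT_SCHEMA) - set(obj)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    float(obj["prediction"])                 # type check: must be numeric
    return obj
```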

4. LLM Usage Patterns and Zero-Shot Generalization

Modern LLM-MAS architectures rely on a carefully structured sequence of prompt templates, made explicit in the agent-layer design, to maximize the throughput and reliability of zero/few-shot LLM reasoning without resorting to task- or domain-specific fine-tuning:

  • LLM Model Choices: Factor guidance employs state-of-the-art models (e.g., GPT-4o) with chain-of-thought prompting, while extraction and inference steps exploit more recent (e.g., GPT-5) APIs capable of enforcing JSON-style structured output.
  • Prompt Templates: Prompts for each role encode the agent's expertise, task, and required output; e.g., for factor guidance, "List six most influential factors with justification," and for extraction, "Output UrbanInfo as JSON based on provided factors and location."
  • Preprocessing/Postprocessing: Pre-inference normalization (case-folding, whitespace normalization, punctuation removal) and post-inference validation (JSON schema checking, range validation) are enforced to align stochastic LLM outputs to deterministic MAS expectations.
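
A minimal sketch of the pre-inference normalization listed above (case-folding, whitespace normalization, punctuation removal); the exact rules are not specified in the source, so these are representative choices:

```python
import re
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize(text: str) -> str:
    """Case-fold, strip punctuation, and collapse whitespace before comparison or inference."""
    text = text.casefold().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()

assert normalize("  Running   Amount,  per block! ") == "running amount per block"
```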

Zero-shot deployment is made scalable by ensuring modularity in agent creation and API-driven scaling, allowing data scientists to extend to new city/task contexts without retraining or workflow redesign (Lou, 30 Oct 2025).

5. Scalability, Modularity, and Extension Principles

Scalability is supported by a fully API-driven architecture; every agent interaction and data flow occurs via plain text or structured JSON, and agent layers can be readily replicated or extended. For new tasks, adding a new dimension (e.g., time, mobility) or scale is achieved by instantiating the same agent design with additional (d, r) pairs, reusing all existing logic and templates.
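
Concretely, extending to a new dimension or scale can amount to enumerating additional (d, r) pairs over the same agent template, as in the illustrative sketch below (the dimension and scale names are hypothetical):

```python
from itertools import product

# Adding a dimension (e.g., "mobility") or a scale only extends these lists;
# the agent logic and prompt templates are reused unchanged.
DIMENSIONS = ["environment", "socioeconomic", "mobility"]   # "mobility" newly added
SCALES = ["point", "district"]

def spawn_extraction_agents(make_agent):
    """One extraction agent per (dimension, scale) pair, instantiated from a single template."""
    return {(d, r): make_agent(dimension=d, scale=r) for d, r in product(DIMENSIONS, SCALES)}
```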

The tightly modular decomposition—distinct agent types, no cross-layer coupling, elastic scaling via parallel subagent invocation—enables:

  • Domain extension by swapping coordinates and context inputs;
  • Integration with different LLMs as new models emerge;
  • Simple interleaving of human-in-the-loop verification or hybrid symbolic-LLM agents as needed by emerging application requirements.

No agent or process is hard-coded to a specific urban context or prediction task, supporting robust transferability (Lou, 30 Oct 2025).

6. Quantitative Evaluation and Core Performance Insights

The Urban-MAS architecture is evaluated on multiple real-world city datasets (Tokyo, Milan, Seattle), demonstrating absolute performance advantages over single-LLM baselines:

  • For "running amount" prediction (MAE): Full Urban-MAS 2.97 vs. –PredictiveFactors 4.53 (52.8% increase without PF guidance)
  • For urban "liveliness" perception (MAE): Full 1.73 vs. –PredictiveFactors 2.54 (46.9% increase)
  • Removal of reliability boosting shows marginal effect (≤+0.3% MAE in "running amount")

The highest impact is attributed to the predictive factor guidance agent, confirming that domain-guided prompt engineering and modular LLM queries are essential for high precision in zero-shot, domain-specific prediction (Lou, 30 Oct 2025).


References:

  • "Urban-MAS: Human-Centered Urban Prediction with LLM-Based Multi-Agent System" (Lou, 30 Oct 2025)