MACRO-LLM: Decentralized Multi-Agent Coordination
- MACRO-LLM is a decentralized multi-agent system that integrates LLM reasoning with statistical aggregation to overcome spatiotemporal partial observability.
- It employs modular components—CoProposer, Negotiator, and Introspector—to generate, refine, and adapt action proposals in dynamic, real-world settings.
- Empirical results in tasks like adaptive cruise control and pandemic control demonstrate significant performance improvements over traditional MARL and LLM-based methods.
MACRO-LLM is an LLM-empowered multi-agent system designed for collaborative reasoning and coordination in environments characterized by spatiotemporal partial observability. This paradigm addresses the challenges of distributed awareness, dynamic local context, and finite memory/temporal windows, which fundamentally constrain agent collaboration in real-world decentralized scenarios. MACRO-LLM systems systematically decompose the information bottleneck along spatial and temporal axes, combining LLM reasoning with statistical strategies to achieve robust and adaptive coordination (Chen et al., 14 Jan 2026).
1. Spatiotemporal Partial Observability: Formulation and Challenges
Spatiotemporal partial observability refers to the dual limitations faced by distributed agents:
- Spatial Partial Observability: Each agent can access only local observations $o_i$, a proper subset of the full global state $s$. Communication occurs exclusively with local neighbors in the underlying connectivity graph $\mathcal{G}$. No agent has direct access to the complete global state at any time, leading to myopia regarding broader system dynamics.
- Temporal Partial Observability: Agents operate with bounded temporal windows, imposed by finite-context LLM prompt lengths and inherently uncertain long-term dynamics. Predictive capacity is fundamentally limited for horizons extending beyond this window, resulting in uncertainty regarding distant outcomes (a schematic formalization follows this list).
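A minimal formalization of these two constraints in generic notation (the symbols $O_i$, $\mathcal{N}(i)$, and the window length $W$ are illustrative, not the paper's exact notation):

```latex
% Spatial: agent i observes only a local projection of the global state s^t
% and exchanges messages only with its neighbors N(i) in the graph G.
o_i^t = O_i(s^t), \qquad
\text{msgs}(i) = \{\, m_{j \to i} : j \in \mathcal{N}(i) \subseteq \mathcal{G} \,\}

% Temporal: the policy conditions on a bounded window of length W
% (the LLM context limit), so horizons beyond W remain uncertain.
a_i^t \sim \pi_i\!\left(\cdot \mid o_i^{\,t-W+1}, \dots, o_i^{\,t}\right)
```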
These constraints produce coordination bottlenecks distinct from those of traditional fully observable settings. In multi-agent reinforcement learning (MARL), centralized critic architectures or global communication are often assumed; in contrast, MACRO-LLM systems are explicitly decentralized, enhancing scalability and accommodating real-world deployment limitations (Chen et al., 14 Jan 2026).
2. Architectural Overview and Modular Decomposition
The MACRO-LLM framework is instantiated as a collection of LLM-driven agents operating on a graph without a central aggregator. Each agent cyclically executes three core modules:
- CoProposer: Generates, verifies, and scores candidate action proposals using both temporal and spatial strategy LLMs, coupled with predictive rollout simulation over a fixed lookahead horizon.
- Negotiator: Aggregates neighboring proposals using mean-field statistical summaries (e.g., Welford updates for mean/variance), computes semantic distances, and refines or regenerates its own proposal through weighted averaging and short rollout verification.
- Introspector: Performs continual self-adaptation by computing a “semantic gradient” in vectorized plan space, updating internal policy parameters via a textual analog of gradient descent based on recent outcome trajectories and negotiation logs.
This modular structure explicitly separates prediction/verification (temporal reasoning), aggregation (spatial reasoning), and adaptation (experience-driven optimization) (Chen et al., 14 Jan 2026).
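A schematic of this per-agent cycle in illustrative Python (the class structure, method names, and return formats are assumptions for exposition, not the paper's interface):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MacroLLMAgent:
    """One decentralized agent cycling through the three MACRO-LLM modules."""
    agent_id: int
    strategy: str = "keep target headway; match leader speed"  # natural-language policy text
    memory: list = field(default_factory=list)                 # outcome / negotiation logs

    def co_propose(self, observation: dict) -> dict:
        # Placeholder for the CoProposer: an LLM generates candidate plans and
        # scores them by predictive rollout (Section 3.1).
        return {"plan_text": self.strategy, "action": 0.0, "score": 0.0}

    def negotiate(self, proposal: dict, neighbor_msgs: list) -> dict:
        # Placeholder for the Negotiator: aggregate neighbors' mean/variance
        # summaries and refine or regenerate the proposal (Section 3.2).
        return proposal

    def introspect(self, outcome: Any) -> None:
        # Placeholder for the Introspector: semantic-gradient update of the
        # strategy text from recent outcomes (Section 3.3).
        self.memory.append(outcome)

    def step(self, observation: dict, neighbor_msgs: list) -> float:
        proposal = self.negotiate(self.co_propose(observation), neighbor_msgs)
        return proposal["action"]
```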
3. Core Algorithms and Mathematical Formulation
3.1. CoProposer: Predictive Rollout Verification
Given its local observation and current strategy, the CoProposer generates candidate action proposals for itself and its neighbors and simulates their rollout over the fixed lookahead horizon.
Each rollout step is evaluated with a scoring function.
Safety constraints are enforced strictly at the immediate step and relaxed over the remainder of the horizon; the optimization loop selects the proposal that maximizes the cumulative rollout score while satisfying all constraints.
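A minimal sketch of the rollout-verification loop, assuming hypothetical `simulate`, `score_step`, and `is_safe` callables in place of the paper's task-specific models; the exact constraint-relaxation scheme is also an assumption:

```python
def verify_by_rollout(candidates, state, simulate, score_step, is_safe, horizon=5):
    """Select the candidate plan with the best cumulative rollout score.

    `simulate(state, action)` advances a local predictive model one step,
    `score_step(state)` returns a scalar score, and `is_safe(state)` checks the
    safety constraint; all three are hypothetical stand-ins for task-specific
    components. Constraints are enforced strictly at the first step and treated
    as soft penalties afterwards (an assumed relaxation scheme).
    """
    best_plan, best_score = None, float("-inf")
    for plan in candidates:                 # each plan is a sequence of actions
        s, total, feasible = state, 0.0, True
        for t, action in enumerate(plan[:horizon]):
            s = simulate(s, action)
            if not is_safe(s):
                if t == 0:                  # hard constraint at the immediate step
                    feasible = False
                    break
                total -= 10.0               # relaxed: penalize later violations
            total += score_step(s)
        if feasible and total > best_score:
            best_plan, best_score = plan, total
    return best_plan, best_score
```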
3.2. Negotiator: Mean-Field Statistical Aggregation
Each agent exchanges summary statistics with its neighbors and updates its belief via one-pass (Welford) mean and variance updates.
Semantic distances between proposals trigger confidence-weighted averaging; if semantic consensus is not achieved, a regenerated proposal is produced using a softmax weighting of distances and variances, followed by rollout verification.
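An illustrative implementation of the one-pass (Welford) statistics and the distance/variance softmax weighting; the temperature and the combined logit form are assumptions:

```python
import math

class RunningStats:
    """Welford's one-pass algorithm for the mean/variance of neighbor summaries."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def consensus_weights(distances, variances, temperature=1.0):
    """Softmax weights favoring semantically close, low-variance neighbor proposals."""
    logits = [-(d + v) / temperature for d, v in zip(distances, variances)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```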
3.3. Introspector: Semantic Gradient Descent
Upon execution, the Introspector computes a reflection learning rate from recent outcome statistics.
A semantic embedding maps plan text into a vector space; gradients are computed over these plan embeddings.
Semantic gradients are instantiated as LLM-generated textual corrections, closing the loop for continual refinement.
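A numeric sketch of the embedding-space view of the semantic gradient; `semantic_gradient_step`, its learning-rate schedule, and the finite-difference form are illustrative stand-ins, since in MACRO-LLM the actual correction is produced as LLM-generated text:

```python
import numpy as np

def semantic_gradient_step(plan_embedding: np.ndarray,
                           outcome_scores: list,
                           past_embeddings: list,
                           base_lr: float = 0.1) -> np.ndarray:
    """Nudge the plan embedding toward directions that correlated with better
    recent outcomes (a numeric analog of the textual correction the
    Introspector asks the LLM to generate)."""
    # Reflection learning rate: shrink the update when recent outcomes are stable.
    lr = base_lr / (1.0 + float(np.std(outcome_scores)))
    # Finite-difference "gradient": outcome-weighted average displacement of
    # past plan embeddings relative to the current one.
    diffs = [e - plan_embedding for e in past_embeddings]
    weights = np.asarray(outcome_scores) - np.mean(outcome_scores)
    grad = sum(w * d for w, d in zip(weights, diffs)) / max(len(diffs), 1)
    return plan_embedding + lr * grad
```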
4. Experimental Protocols, Metrics, and Empirical Results
MACRO-LLM is validated on two high-dimensional, long-horizon coordination tasks:
- Cooperative Adaptive Cruise Control (CACC) — vehicles on a single-lane highway controlled over a fixed number of steps. Local state: headway and velocity. Action: longitudinal acceleration (m/s²). Reward penalizes deviation from target spacing and velocity.
- Pandemic Control (PC) — seven facility-type agents per city over a multi-day horizon; state: infection counts. Action: a restriction-policy level. Reward balances infection and death outcomes against policy restrictiveness (illustrative reward forms follow this list).
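The exact reward expressions and coefficients are not reproduced here; generic forms consistent with the descriptions above, with hypothetical weights $\alpha$, $\beta$, $\lambda_k$, would be:

```latex
% CACC: penalize squared deviation from target headway h^* and velocity v^*
r_i^{\mathrm{CACC}} = -\,\alpha\,(h_i - h^*)^2 \;-\; \beta\,(v_i - v^*)^2

% PC: trade off epidemic burden against restriction-policy stringency u_i
r_i^{\mathrm{PC}} = -\,\lambda_1\,\mathrm{infections}_i \;-\; \lambda_2\,\mathrm{deaths}_i \;-\; \lambda_3\,u_i
```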
MACRO-LLM is evaluated against deep MARL baselines (DPPO, DMPO, IC3Net) and LLM-based agents (ToM-Belief, ChatEval, LAMEN):
Table 1: CACC RMSE and standard deviation (SD) for headway (H) and velocity (V); lower is better
| Method | RMSE-H | RMSE-V | SD-H | SD-V |
|---|---|---|---|---|
| MACRO-LLM | 1.212/0.561 | 0.897/3.325 | 0.455/0.209 | 0.283/0.345 |
| DPPO | 1.603/1.720 | 0.978/3.121 | 0.994/1.410 | 0.474/1.176 |
Table 2: PC (Three Cities) — Infection/Death/Policy Deviation
| Method | Iₙ | PIₙ | Dₙ | PD |
|---|---|---|---|---|
| MACRO-LLM | .010 | .008 | .000 | 9 |
| DMPO | .010 | .008 | .001 | 13 |
Statistical analysis (two-tailed paired t-test) confirms that MACRO-LLM outperforms the best MARL and LLM-MAS baselines on all major metrics (Chen et al., 14 Jan 2026).
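A minimal sketch of such a paired comparison over per-run metric values (the numbers below are placeholders, not the reported results):

```python
from scipy import stats

# Paired per-run metric values for MACRO-LLM and the strongest baseline
# (placeholder numbers, not the reported results).
macro_llm = [1.21, 1.18, 1.25, 1.20, 1.22]
baseline  = [1.60, 1.62, 1.58, 1.65, 1.61]

t_stat, p_value = stats.ttest_rel(macro_llm, baseline)  # two-tailed paired t-test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```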
5. Comparison with Related Multi-Agent and Macro-Level LLM Architectures
MACRO-LLM diverges from hierarchical, centrally orchestrated architectures by employing per-agent statistical summary exchanges and semantic plan negotiation, contrasting with systems that utilize global memory or explicit leader agents. Related paradigms, such as LLM-driven MAC protocol emergence via reinforcement learning (LLM4MAC), utilize LLMs under partial observability but focus on language-tokenized POMG for specific signal coordination, with distinct use of structured identity embeddings and semantic tokenization to facilitate zero-shot generalization (Tan et al., 11 Mar 2025). In market research domains, multi-agent LLM systems, exemplified by MaRGen, use agent roles (Researcher, Writer, Reviewer, Retriever) with message-passing but do not explicitly address spatiotemporal observational bottlenecks (Koshkin et al., 2 Aug 2025).
A plausible implication is that the modular, statistical-aggregation-centric design of MACRO-LLM offers superior scaling and adaptability in settings with fragmented observations, where traditional MARL or task-specific LLM orchestration strategies may be insufficient.
6. Impact, Interpretation, and Limitations
MACRO-LLM achieves:
- Robust mitigation of information bottlenecks arising from fragmented local and temporal information through systematic modularization.
- Empirical superiority over state-of-the-art MARL and LLM-based baselines in both control (CACC) and policy (PC) coordination across varying topologies and horizons.
- Zero-shot generalization to new network graphs/topologies without downstream retraining, with bounded per-agent communication cost.
However, the interpretability of LLM-generated proposals and semantic gradients remains limited, especially as the number of agents and scenario complexity grow. Performance also relies on the quality of LLM semantic embeddings and the fidelity of rollout simulation. Scaling to larger or fully open environments may introduce further challenges in maintaining coherent semantic aggregation and adaptation. Ongoing research is directed toward improving the interpretability of emergent strategies, optimizing communication sparsity, and integrating hybrid statistical-symbolic protocols (Chen et al., 14 Jan 2026, Tan et al., 11 Mar 2025, Koshkin et al., 2 Aug 2025).
7. Future Directions and Generalizations
Potential avenues include:
- Extending semantic gradient introspection with probing or explicit symbolic extraction modules to enhance interpretability.
- Hierarchical or federated deployment of MACRO-LLM agents to accommodate multi-level or heterogeneously structured environments.
- Integration with retrieval-augmented or prompt-structured macroeconomic scenario pipelines for policy or autonomy tasks requiring both open-domain reasoning and precise adherence to operational or regulatory constraints (Soleimani, 26 Nov 2025).
- Exploration of principled convergence and stability guarantees in dynamic, heterogeneous networks, particularly under adversarial or unmodeled regime shifts.
MACRO-LLM thus defines a new direction for LLM-based multi-agent systems, characterized by decentralized, modular, and semantically-driven collaborative reasoning under stringent information constraints (Chen et al., 14 Jan 2026).