
MACRO-LLM: Decentralized Multi-Agent Coordination

Updated 21 January 2026
  • MACRO-LLM is a decentralized multi-agent system that integrates LLM reasoning with statistical aggregation to overcome spatiotemporal partial observability.
  • It employs modular components—CoProposer, Negotiator, and Introspector—to generate, refine, and adapt action proposals in dynamic, real-world settings.
  • Empirical results in tasks like adaptive cruise control and pandemic control demonstrate significant performance improvements over traditional MARL and LLM-based methods.

MACRO-LLM is an LLM-empowered multi-agent system designed for collaborative reasoning and coordination in environments characterized by spatiotemporal partial observability. This paradigm addresses the challenges of distributed awareness, dynamic local context, and finite memory/temporal windows, which fundamentally constrain agent collaboration in real-world decentralized scenarios. MACRO-LLM systems decompose the resulting information bottlenecks along spatial and temporal axes, combining LLM reasoning with statistical strategies to achieve robust and adaptive coordination (Chen et al., 14 Jan 2026).

1. Spatiotemporal Partial Observability: Formulation and Challenges

Spatiotemporal partial observability refers to the dual limitations faced by distributed agents:

  • Spatial Partial Observability: Each agent $n$ can access only local observations $o_n^t$, a proper subset of the full global state $s^t$. Communication occurs exclusively with local neighbors $N_n$ in the underlying connectivity graph $G=(V,E)$. No agent has direct access to the complete global state at any time, leading to myopia regarding broader system dynamics.
  • Temporal Partial Observability: Agents operate with bounded temporal windows, imposed by finite-context LLM prompt lengths and inherently uncertain long-term dynamics $R(s,a)$. Predictive capacity is fundamentally limited for horizons extending beyond this window, resulting in uncertainty regarding distant outcomes.
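As a minimal illustration (a hypothetical data layout, not the paper's representation), spatial partial observability amounts to each agent reading only its own entry of the global state plus its neighbors' entries in $G$:

```python
def local_observation(global_state, adjacency, n):
    """Agent n's partial view o_n^t: its own state plus the states of
    its immediate neighbors N_n in the connectivity graph G = (V, E)."""
    view = {n: global_state[n]}
    for m in adjacency[n]:
        view[m] = global_state[m]
    return view
```

No agent's view ever covers the full `global_state` unless the graph is complete.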

These constraints produce coordination bottlenecks distinct from those in traditional fully observable settings. Multi-agent reinforcement learning (MARL) often assumes centralized critic architectures or global communication; in contrast, MACRO-LLM systems are explicitly decentralized, enhancing scalability and accommodating real-world deployment limitations (Chen et al., 14 Jan 2026).

2. Architectural Overview and Modular Decomposition

The MACRO-LLM framework is instantiated as a collection of LLM-driven agents operating on a graph $G$ without a central aggregator. Each agent cyclically executes three core modules:

  • CoProposer: Generates, verifies, and scores candidate action proposals using both temporal and spatial strategy LLMs, coupled with predictive rollout simulation over a fixed lookahead horizon $K$.
  • Negotiator: Aggregates neighboring proposals using mean-field statistical summaries (e.g., Welford updates for mean/variance), computes semantic distances, and refines or regenerates its own proposal through weighted averaging and short rollout verification.
  • Introspector: Performs continual self-adaptation by computing a “semantic gradient” in vectorized plan space, updating internal policy parameters via a textual analog of gradient descent based on recent outcome trajectories and negotiation logs.

This modular structure explicitly separates prediction/verification (temporal reasoning), aggregation (spatial reasoning), and adaptation (experience-driven optimization) (Chen et al., 14 Jan 2026).
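The cyclic three-module structure can be sketched as follows; the module bodies are stubs, since the actual LLM prompting, rollout simulation, and negotiation logic are internal to the paper's system:

```python
from dataclasses import dataclass, field

@dataclass
class MacroAgent:
    """Skeleton of one decentralized MACRO-LLM agent (hypothetical API)."""
    agent_id: int
    neighbors: list                      # agent ids adjacent in G = (V, E)
    strategy: dict = field(default_factory=dict)

    def co_propose(self, obs):
        # Generate and score a candidate proposal via rollout (stubbed).
        return {"agent": self.agent_id, "action": 0.0, "score": 0.0}

    def negotiate(self, proposal, neighbor_msgs):
        # Aggregate neighbor statistics and refine the proposal (stubbed).
        return proposal

    def introspect(self, outcome):
        # Adapt internal strategy from the executed outcome (stubbed).
        self.strategy["last_outcome"] = outcome

    def step(self, obs, neighbor_msgs):
        proposal = self.co_propose(obs)
        refined = self.negotiate(proposal, neighbor_msgs)
        self.introspect(refined)
        return refined["action"]
```

The point of the skeleton is the control flow: propose, then negotiate with neighbors only, then self-adapt, with no central aggregator in the loop.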

3. Core Algorithms and Mathematical Formulation

3.1. CoProposer: Predictive Rollout Verification

Given observation $o_n^t$ and current strategy $\Pi_{n,t}^{\text{ltmp}}$, CoProposer generates candidate proposals $P_n^t$ for itself and its neighbors and simulates their rollout over $K$ steps:

\tau_n = \mathcal{R}(P_n^t; K) = \big((o_n^t, a_n^t, R_n^t), \ldots, (o_n^{t+K}, a_n^{t+K}, R_n^{t+K})\big)

The resulting trajectory is evaluated with a discounted scoring function:

S(P_n^t) = \mathcal{R}(\tau_n) = \sum_{i=0}^{K} \gamma^i R_n(o_n^{t+i}, a_n^{t+i})

Safety constraints $C_n(o,a)$ are enforced exactly at $i=0$ and relaxed for $i>0$. The optimization loop seeks the $P_n^t$ that maximizes $S$ while satisfying all constraints.
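A minimal sketch of this rollout-scoring loop, assuming generic `proposal`, `simulate_step`, `reward`, and `constraint` callables (the paper's LLM-based components are abstracted away):

```python
def rollout_score(proposal, simulate_step, reward, constraint,
                  obs0, K, gamma=0.95):
    """Score a proposal by a K-step simulated rollout:
    S(P) = sum_{i=0}^{K} gamma^i * R(o^{t+i}, a^{t+i}),
    with the hard safety constraint checked only at i = 0."""
    obs, total = obs0, 0.0
    for i in range(K + 1):
        action = proposal(obs, i)            # candidate action at step i
        if i == 0 and not constraint(obs, action):
            return float("-inf")             # reject an unsafe first step
        total += (gamma ** i) * reward(obs, action)
        obs = simulate_step(obs, action)     # predictive world-model step
    return total
```

With $\gamma = 1$ and a constant unit reward, a rollout over $K = 2$ scores $3.0$ (steps $i = 0, 1, 2$).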

3.2. Negotiator: Mean-Field Statistical Aggregation

For each agent, neighbor statistics $(\mu_m, \sigma_m^2, W_m)$ are exchanged; agent $n$ updates its belief via one-pass mean and variance updates:

\mu_n \leftarrow \frac{W_m \mu_m + w_n s_n^t}{W_m + w_n}, \qquad \sigma_n^2 \leftarrow \frac{W_m \sigma_m^2 + w_n (s_n^t - \mu_m)^2}{W_m + w_n}

Semantic distances $d(P_n^t, P_m^t)$ trigger confidence-weighted averaging. If semantic consensus is not achieved, a regenerated proposal $P_n^{t\prime}$ is produced using a softmax weighting of distances and variances, followed by rollout verification.
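The one-pass mean/variance update can be transcribed directly from the formulas above; note that it follows the text's rule (which uses the pre-update mean $\mu_m$ in the variance term) rather than the textbook Welford recurrence:

```python
def merge_stat(mu_m, var_m, W_m, s_n, w_n=1.0):
    """Fold agent n's weighted sample s_n into a neighbor's running
    (mean, variance, weight) summary in a single pass."""
    W = W_m + w_n
    mu = (W_m * mu_m + w_n * s_n) / W
    var = (W_m * var_m + w_n * (s_n - mu_m) ** 2) / W
    return mu, var, W
```

Each agent keeps only the triple $(\mu, \sigma^2, W)$, which is what makes the per-agent communication cost constant in the network size.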

3.3. Introspector: Semantic Gradient Descent

Upon execution, Introspector calculates the reflection learning rate:

\text{lr}_n^t = 1 - \frac{\Gamma_n^{t-1} \cdot \Gamma_n^t}{\|\Gamma_n^{t-1}\|\,\|\Gamma_n^t\|} \in [0, 2]

A semantic embedding $E$ maps plan text to $\mathbb{R}^D$; gradients are computed over plan embeddings:

L(\theta_n) = \|E(\Pi_{n,t}^{\text{ltmp}}; \theta_n) - E(\Pi_{n,t}^{\text{ltmp}} + g_n^{t,\text{semantic}})\|^2
\theta_n^{t+1} = \theta_n^t - \eta \nabla_{\theta_n} L(\theta_n^t), \quad \eta \propto \text{lr}_n^t

Semantic gradients $g_n^{t,\text{semantic}}$ are instantiated as LLM-generated textual corrections, closing the loop for continual refinement.
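The reflection learning rate is one minus the cosine similarity of consecutive semantic gradient vectors, so it is near 0 when successive corrections agree and near 2 when they reverse direction. A dependency-free sketch:

```python
import math

def reflection_lr(g_prev, g_curr, eps=1e-12):
    """lr = 1 - cos(Gamma^{t-1}, Gamma^t), which lies in [0, 2]:
    aligned gradients give a small rate (stay the course), while
    reversed gradients give a large rate (bigger strategy update)."""
    dot = sum(a * b for a, b in zip(g_prev, g_curr))
    na = math.sqrt(sum(a * a for a in g_prev))
    nb = math.sqrt(sum(b * b for b in g_curr))
    return 1.0 - dot / max(na * nb, eps)
```

Identical gradients yield a rate of 0, orthogonal gradients 1, and opposite gradients 2.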

4. Experimental Protocols, Metrics, and Empirical Results

MACRO-LLM is validated on two high-dimensional, long-horizon coordination tasks:

  1. Cooperative Adaptive Cruise Control (CACC) — $N=8$ vehicles on a single-lane highway, $T=120$ steps. Local state: headway $h_i^t$ and velocity $v_i^t$. Action: $a_i^t \in [-3, +2]$ m/s². Reward penalizes deviation from target spacing and velocity: $R_i^t = -[(h_i^t - 20)^2 + (v_i^t - v_{\text{leader}})^2]$.
  2. Pandemic Control (PC) — 7 facility-type agents per city, $T=120$ days. State: infection counts. Actions: restriction policy in $\{0, 1, 2\}$. Reward: $-\left(\Delta I_{\text{new}}^t / P_{\text{total}} + \lambda \cdot \text{policy\_cost}\right)$.
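Both reward functions are simple enough to transcribe directly; the target headway of 20 m comes from the formula above, while $\lambda$ in the PC reward is left as a parameter since its value is not given here:

```python
def cacc_reward(h, v, v_leader, h_target=20.0):
    """Per-vehicle CACC reward: squared penalties on deviation from
    the target headway and from the leader's velocity."""
    return -((h - h_target) ** 2 + (v - v_leader) ** 2)

def pc_reward(delta_I_new, P_total, policy_cost, lam=0.1):
    """Pandemic-control reward: normalized new infections plus a
    weighted policy-stringency cost (lam is an assumed placeholder)."""
    return -(delta_I_new / P_total + lam * policy_cost)
```

Both rewards are maximized at 0, so the tables below report error-style metrics where lower is better.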

Evaluation against deep MARL baselines (DPPO, DMPO, IC3Net) and LLM-based agents (ToM-Belief, ChatEval, LAMEN):

Table 1: CACC RMSE and SD (lower is better)

| Method    | RMSE-H      | RMSE-V      | SD-H        | SD-V        |
|-----------|-------------|-------------|-------------|-------------|
| MACRO-LLM | 1.212/0.561 | 0.897/3.325 | 0.455/0.209 | 0.283/0.345 |
| DPPO      | 1.603/1.720 | 0.978/3.121 | 0.994/1.410 | 0.474/1.176 |

Table 2: PC (three cities) — infection/death/policy deviation

| Method    | Iₙ   | PIₙ  | Dₙ   | PD |
|-----------|------|------|------|----|
| MACRO-LLM | .010 | .008 | .000 | 9  |
| DMPO      | .010 | .008 | .001 | 13 |

Statistical analysis (two-tailed paired $t$-test, $p < 0.01$) confirms that MACRO-LLM outperforms the best MARL and LLM-MAS baselines on all major metrics (Chen et al., 14 Jan 2026).

5. Comparison with Related Multi-Agent LLM Paradigms

MACRO-LLM diverges from hierarchical, centrally orchestrated architectures by employing per-agent statistical summary exchanges and semantic plan negotiation, in contrast with systems that rely on global memory or explicit leader agents. Related paradigms, such as LLM-driven MAC protocol emergence via reinforcement learning (LLM4MAC), also use LLMs under partial observability but focus on a language-tokenized POMG for specific signal coordination, with distinct use of structured identity embeddings and semantic tokenization to facilitate zero-shot generalization (Tan et al., 11 Mar 2025). In market research domains, multi-agent LLM systems such as MaRGen use agent roles (Researcher, Writer, Reviewer, Retriever) with message passing but do not explicitly address spatiotemporal observational bottlenecks (Koshkin et al., 2 Aug 2025).

A plausible implication is that the modular, statistical-aggregation-centric design of MACRO-LLM offers superior scaling and adaptability in settings with fragmented observations, where traditional MARL or task-specific LLM orchestration strategies may be insufficient.

6. Impact, Interpretation, and Limitations

MACRO-LLM achieves:

  • Robust mitigation of information bottlenecks arising from fragmented local and temporal information through systematic modularization.
  • Empirical superiority over state-of-the-art MARL and LLM-based baselines in both control (CACC) and policy (PC) coordination across varying topologies and horizons.
  • Zero-shot generalization to new network graphs/topologies without downstream retraining, with per-agent $O(1)$ communication cost.

However, interpretability of LLM-generated proposals and semantic gradients remains limited, especially as the number of agents and scenario complexity grow. Performance also depends on the quality of LLM semantic embeddings and the fidelity of rollout simulation. Scaling to larger or fully open environments may introduce further challenges in maintaining coherent semantic aggregation and adaptation. Ongoing research is directed toward improving the interpretability of emergent strategies, optimizing communication sparsity, and integrating hybrid statistical-symbolic protocols (Chen et al., 14 Jan 2026, Tan et al., 11 Mar 2025, Koshkin et al., 2 Aug 2025).

7. Future Directions and Generalizations

Potential avenues include:

  • Extending semantic gradient introspection with probing or explicit symbolic extraction modules to enhance interpretability.
  • Hierarchical or federated deployment of MACRO-LLM agents to accommodate multi-level or heterogeneously structured environments.
  • Integration with retrieval-augmented or prompt-structured macroeconomic scenario pipelines for policy or autonomy tasks requiring both open-domain reasoning and precise adherence to operational or regulatory constraints (Soleimani, 26 Nov 2025).
  • Exploration of principled convergence and stability guarantees in dynamic, heterogeneous networks, particularly under adversarial or unmodeled regime shifts.

MACRO-LLM thus defines a new direction for LLM-based multi-agent systems, characterized by decentralized, modular, and semantically-driven collaborative reasoning under stringent information constraints (Chen et al., 14 Jan 2026).
