
Multi-Agent Collaboration Network (MacNet)

Updated 13 November 2025
  • MacNet is a graph-based paradigm that coordinates reasoning among LLM agents via directed acyclic graph structures.
  • It orchestrates agent interactions in a topologically ordered, resource-efficient manner, ensuring scalable and refined solution propagation.
  • It exhibits a collaborative scaling law in which collective performance emerges at roughly 16–32 agents, and adaptive variants can learn the collaboration graph dynamically.

A Multi-Agent Collaboration Network (MacNet) is a graph-based paradigm for orchestrating distributed, interactive reasoning among autonomous agents—typically instantiated as LLMs—in a manner structurally and functionally analogous to neural networks. MacNets leverage explicit graph topologies (frequently directed acyclic graphs, DAGs) to coordinate communication, reflection, and refinement among agents, supporting scalable, emergent collective intelligence with predictable scaling behavior. Designed to efficiently propagate concise, refined solutions through topologically ordered interactions, this architecture enables both (i) resource-efficient collaboration among hundreds to thousands of agents and (ii) the systematic emergence of qualitatively new reasoning capabilities well before the parameter thresholds typical in monolithic neural scaling.

1. Formal Definition and Motivation

A MacNet is classically formalized as a DAG $G = (V, E)$, with nodes $V = \{v_i\}$ and directed edges $E \subset V \times V$. Each node $v_i$ is assigned an assistant agent $a_i = \rho(v_i)$ and each edge $(v_i \rightarrow v_j)$ is assigned an instructor agent $a_{ij} = \rho(e)$, where $\rho(\cdot)$ denotes an agentization procedure wrapping an LLM backbone (e.g., GPT-3.5-turbo, GPT-4) with a role-specific prompt, optional tool-use bindings, and a short-term memory buffer (Qian et al., 11 Jun 2024).
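A minimal Python sketch of this construction is given below. The Agent class is a hypothetical stand-in for $\rho(\cdot)$; its methods return canned strings where a real system would call an LLM:

class Agent:
    """Hypothetical stand-in for rho(.): an LLM backbone wrapped with a role prompt.
    Methods return canned strings; a real system would invoke the model."""
    def __init__(self, role_prompt):
        self.role_prompt = role_prompt

    def generate(self, context):        # assistant drafts a solution T
        return f"[{self.role_prompt}] draft for: {context}"

    def instruct(self, draft):          # instructor critiques, returning feedback F
        return f"[{self.role_prompt}] feedback on: {draft}"

    def refine(self, draft, feedback):  # assistant revises T into T'
        return f"{draft} | revised per: {feedback}"

def agentize(edge_list):
    """Assign an assistant a_i = rho(v_i) to each node and an instructor a_ij = rho(e) to each edge."""
    nodes, edges = {}, {}
    for u, v in edge_list:
        nodes.setdefault(u, Agent(f"assistant@{u}"))
        nodes.setdefault(v, Agent(f"assistant@{v}"))
        edges[(u, v)] = Agent(f"instructor@{u}->{v}")
    return nodes, edges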

This construction is directly motivated by the neural scaling law, which reveals that system-level capabilities in deep learning appear abruptly after exceeding critical parameter, data, or compute thresholds. MacNet poses an analogous question: can the repeated, structured addition of collaborative LLM agents produce “collaborative emergence” in task performance, akin to emergent phenomena under neural scaling but with far fewer agent units (Hu et al., 22 Oct 2024)?

2. Topological Orchestration and Interaction Protocol

MacNet execution adheres to a strict topological order determined by the DAG; namely, for every directed edge $(v_i \rightarrow v_j)$:

  • Assistant $a_i$ generates or refines a solution $T$.
  • Instructor $a_{ij}$ critiques it and offers suggestions $F$.
  • Assistant $a_i$ produces a refined version $T'$.
  • Instructor $a_{ij}$ issues a prompt to assistant $a_j$, who generates response $V$.

The latest refined artifact $S[v_j]$ is the only solution propagated downstream; previous dialogue history is pruned, alleviating LLM context-window bottlenecks and ensuring scalability to large agent populations.

Each edge’s local memory buffers multi-turn instruction–response exchanges, typically limited to three turns before memory clearance. Convergent nodes (with multiple in-edges) resolve $k$ upstream solutions by soliciting the assistant at that node to synthesize and critique them, implementing a form of non-linear aggregation.

The general MacNet orchestration pseudocode is:

Input: DAG G=(V,E), initial prompt P0 at source nodes
1. topo_order = TopologicalSort(agents = {a_i} ∪ {a_{ij}})
2. solutions S[v] = ∅ for all v

3. for X in topo_order:
    if X is (a_i, a_{ij}):
        T = a_i.generate(S[v_i] or P0)
        F = a_{ij}.instruct(T)
        T' = a_i.refine(T, F)
        S[e] = T'

    if X is (a_{ij}, a_j):
        U = a_{ij}.generate(S[e])
        V = a_j.respond(U)
        S[v_j] = V

4. Final solution(s) reside at sink or convergent nodes

Scaling up, interactions are batched, and messages are pruned aggressively to maintain $O(1)$ per-agent context.
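A compressed, runnable rendering of this pass is sketched below, reusing the Agent stubs and agentize helper from the sketch in Section 1. As a simplifying assumption, the four-step edge protocol is folded into a critique-and-refine loop at the receiving node, which also handles convergent aggregation:

from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def run_macnet(edge_list, nodes, edges, prompt):
    """One topologically ordered feed-forward pass; only the latest refined
    artifact per node propagates, so per-agent context stays O(1)."""
    preds = {}
    for u, v in edge_list:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    S = {}
    for v in TopologicalSorter(preds).static_order():
        # Source nodes start from the task prompt; convergent nodes see all k upstream solutions.
        context = prompt if not preds[v] else "\n".join(S[u] for u in sorted(preds[v]))
        draft = nodes[v].generate(context)
        for u in sorted(preds[v]):
            feedback = edges[(u, v)].instruct(draft)  # instructor on edge (u, v) critiques
            draft = nodes[v].refine(draft, feedback)  # assistant refines; dialogue is then pruned
        S[v] = draft
    sinks = [v for v in preds if not any(v in p for p in preds.values())]
    return {v: S[v] for v in sinks}

# Example on a small diamond-shaped DAG:
E = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
nodes, edges = agentize(E)
print(run_macnet(E, nodes, edges, "Implement a queue"))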

3. Collaborative Scaling Law and Emergence

Empirical studies establish a “collaborative scaling law” for MacNet: as the agent count $n$ increases, normalized solution quality $Q(n)$ exhibits sigmoidal (logistic) growth,

$$Q(n) \approx \frac{\alpha}{1 + e^{-\beta(n - \gamma)}} + \delta,$$

where $\alpha$ (amplitude), $\beta$ (growth rate), $\gamma$ (inflection point), and $\delta$ (shift) are fitted parameters; for instance, $\gamma \approx 18.2$ was reported for a representative mesh topology (Qian et al., 11 Jun 2024).

Crucially, collaborative emergence, i.e., the appearance of qualitatively superior collective intelligence, arises at $n \approx 16$–$32$ agents (orders of magnitude smaller than corresponding neural scale thresholds, typically $10^{18+}$ parameters (Kaplan et al., 2020)).

Beyond $n \sim 32$, further agent addition can induce slight quality degradation (2–6%) due to meta-context drift, paralleling oversaturation effects in logistic curves.
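In practice, the four parameters can be recovered from quality measurements at increasing agent counts by standard curve fitting. The sketch below uses purely synthetic data for illustration (not the paper's measurements):

import numpy as np
from scipy.optimize import curve_fit

def Q(n, alpha, beta, gamma, delta):
    """Logistic collaborative scaling law: Q(n) = alpha / (1 + exp(-beta (n - gamma))) + delta."""
    return alpha / (1.0 + np.exp(-beta * (n - gamma))) + delta

# Synthetic quality measurements for illustration only.
n = np.array([1, 2, 4, 8, 16, 24, 32, 50], dtype=float)
q = np.array([0.52, 0.53, 0.55, 0.58, 0.62, 0.64, 0.65, 0.64])

(alpha, beta, gamma, delta), _ = curve_fit(Q, n, q, p0=[0.15, 0.3, 18.0, 0.5], maxfev=10000)
print(f"fitted inflection point gamma ~ {gamma:.1f} agents")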

4. Graph Topologies and the Small-World Effect

Comparative experiments across topological families demonstrate that irregular (random, Erdős–Rényi) graphs, characterized by small-world properties (reduced path length and clustering), outperform both highly structured meshes and sparse chains. For example, “MacNet-Random” yielded a 1–3% absolute quality gain over MacNet-Mesh (Qian et al., 11 Jun 2024).

Reverse-topology tests, such as flipping star-shaped graphs so that information converges prematurely, degrade performance by 4–6%. This underscores that rapid divergence of information (distributing parallel reasoning across specialists) is preferable to early convergence.
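One simple way to instantiate such an irregular topology, sketched below as an assumption rather than the paper's exact construction, is to sample an Erdős–Rényi graph and orient each edge from lower to higher node index, which guarantees acyclicity:

import networkx as nx

def random_macnet_dag(num_agents, p, seed=0):
    """Sample G(n, p) and orient edges low -> high index, yielding a random DAG."""
    g = nx.gnp_random_graph(num_agents, p, seed=seed)  # undirected Erdos-Renyi graph
    dag = nx.DiGraph()
    dag.add_nodes_from(g.nodes())
    dag.add_edges_from((min(u, v), max(u, v)) for u, v in g.edges())
    assert nx.is_directed_acyclic_graph(dag)
    return dag

dag = random_macnet_dag(50, 0.1, seed=42)
order = list(nx.topological_sort(dag))  # execution order for the agents
print(dag.number_of_edges(), "edges; sources:", [v for v in dag if dag.in_degree(v) == 0][:5])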

5. Applications, Empirical Performance, and Comparative Evaluation

MacNet and its extensions have been empirically validated on a range of reasoning and generation tasks:

  • MMLU (multiple-choice reasoning): accuracy
  • HumanEval (code generation): pass@1
  • SRDD (repository-level software development): quality score in $[0, 1]$
  • CommonGen-Hard (commonsense generation): composite score in $[0, 1]$

A direct comparison (with GPT-3.5-turbo, 50-node MacNet) yields the following representative mean scores:

Method            Composite Score
CoT               0.576
AutoGPT           0.566
GPTSwarm          0.516
AgentVerse        0.581
MacNet-Chain      0.608
MacNet-Star       0.627
MacNet-Tree       0.602
MacNet-Mesh       0.632
MacNet-Layered    0.563
MacNet-Random     0.652

MacNet-Random provided a 7% absolute improvement over AgentVerse (0.652 vs. 0.581) and a 13% relative margin over single-agent CoT (Qian et al., 11 Jun 2024).

6. Adaptive and Self-Evolving MacNets

Standard MacNets operate with a fixed, human-devised graph topology. Adaptive MacNets, exemplified by “Unrolled Graph Learning for Multi-Agent Collaboration” (Zhang et al., 2022), dynamically infer the collaboration adjacency matrix $A \in \mathbb{R}^{N \times N}$ via attention-weighted, per-coordinate similarity:

$$D_i = \left[ (\theta_{i,m} - \theta_{j,m})^2 \right]_{m,j}$$

$$\min_{a_i} \tfrac{1}{2} \| Q D_i a_i \|_2^2 \quad \text{s.t. } \|a_i\|_1 = 1,\; a_{ii} = 0,\; a_{ij} \ge 0$$

Graph weights are updated by unrolled (truncated) proximal gradient steps, implemented as a feed-forward learned neural module. Agents alternate between updating their outgoing edges (collaborator selection) and fusing models from neighbors via convex combination.
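A plain NumPy sketch of one such update follows, assuming the unrolled steps reduce to projected gradient descent with a fixed step size (the paper learns these steps; the shapes of $Q$ and $D_i$ below are illustrative assumptions):

import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def collaborator_weights(D_i, Q, i, steps=10, lr=0.1):
    """K unrolled projected-gradient steps approximating agent i's edge weights a_i,
    minimizing 0.5 * ||Q D_i a||^2 subject to ||a||_1 = 1, a >= 0, a_ii = 0."""
    N = D_i.shape[1]
    a = np.full(N, 1.0 / (N - 1))
    a[i] = 0.0
    M = Q @ D_i
    others = np.arange(N) != i
    for _ in range(steps):
        a -= lr * (M.T @ (M @ a))              # gradient of the smooth objective
        a[others] = project_simplex(a[others]) # proximal step: enforce the constraints
        a[i] = 0.0
    return a

# Illustrative shapes: 5 agents, 8 model coordinates, Q as identity attention.
theta = np.random.default_rng(0).normal(size=(5, 8))
i = 0
D_i = (theta[i][:, None] - theta.T) ** 2  # D_i[m, j] = (theta_{i,m} - theta_{j,m})^2
print(collaborator_weights(D_i, np.eye(8), i))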

Self-evolving MacNets, as in EvoMAC (Hu et al., 22 Oct 2024), introduce closed-loop test-time optimization: after each feed-forward pass producing candidate artifacts (e.g., code), an independent testing network generates unit tests, and textual “gradient” agents analyze logs for error localization and workflow/agent-prompt rewrites. This realizes “textual backpropagation,” allowing dynamic agent addition/removal and prompt rewriting to minimize observed failures.
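The control flow of this loop can be sketched as follows; every component below is a placeholder standing in for an LLM-backed agent (sub)network, not EvoMAC's actual interfaces:

# Placeholder components; each would be an LLM-backed agent network in practice.
forward      = lambda task, prompts: f"code for '{task}' from {len(prompts)} agents"
make_tests   = lambda task: ["test_basic"]
run_suite    = lambda code, tests: []       # returns failure logs; empty list = all tests pass
textual_grad = lambda failures: {"fixer": f"Address: {failures}"}

def evomac_step(task, prompts, max_rounds=3):
    """Schematic EvoMAC-style closed loop: forward pass, test, textual backpropagation."""
    code = None
    for _ in range(max_rounds):
        code = forward(task, prompts)                 # feed-forward pass of the coding network
        failures = run_suite(code, make_tests(task))  # independent testing network + execution
        if not failures:                              # objective satisfied: no failing tests
            return code
        update = textual_grad(failures)               # textual 'gradient' localizes errors
        prompts = {**prompts, **update}               # rewrite prompts / reshape the workflow
    return code

print(evomac_step("sort a list", {"coder": "You write Python."}))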

Empirically, EvoMAC achieves substantial improvements over prior methods on both function-level and software-level benchmarks (e.g., rSDE-Bench, HumanEval). On rSDE-Bench, EvoMAC yielded up to +34.78 percentage points improvement over the strongest single-agent baseline.

7. Practical Considerations, Limitations, and Future Directions

MacNets exhibit key practical advantages: context-efficient memory usage (only the latest solution per edge), fine-grained orchestration via topological sorting, and empirically predictable scaling behavior. However, several constraints and phenomena must be acknowledged:

  • Reverse Degradation: Exceeding optimal network size (beyond $n \sim 32$ in typical settings) can induce performance drops due to context drift or excessive splitting of meta-context.
  • Workflow Design: Topology must match task—irregular, small-world graphs often outperform regular structures; premature convergence is detrimental.
  • Adaptivity: Learned adaptive graphs and “textual backpropagation” provide autonomy, but global convergence and optimality are not guaranteed.
  • Application Limits: Self-evolving settings rely heavily on the quality of auxiliary components (e.g., unit-test generators), and system performance may be bottlenecked by LLM latency and cost at scale.
  • Resource Trade-offs: While per-agent context scales $O(1)$, total system throughput and communication may still tax distributed LLM backends.

A plausible implication is that MacNet frameworks offer a scalable path to orchestrating collective reasoning among LLMs, enabling resource-efficient emergence of complex capabilities. Future research may involve reward-model-driven updates, meta-prompt learning for gradient/update agents, and extension to domains beyond code (e.g., document synthesis with automated validators).
