MAS²: Self-Generative Multi-Agent Systems

Updated 15 April 2026
  • MAS² is a recursive meta-multi-agent framework that autonomously designs, configures, and rectifies agent collectives to address diverse tasks.
  • It employs a tri-agent architecture—Generator, Implementer, and Rectifier—to dynamically synthesize and repair systems in real time.
  • Empirical evaluations demonstrate significant performance gains and robust cross-backbone generalization, achieving up to a 19.6% improvement on benchmarks.

MAS² (“Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems”) is a recursive meta-multi-agent framework that autonomously designs, configures, deploys, and adaptively rectifies bespoke multi-agent systems (MAS) to address diverse tasks under real-world dynamism. Distinct from prior “generate-once-and-deploy” paradigms, MAS² features a tri-agent architecture, comprising Generator, Implementer, and Rectifier meta-agents, that dynamically synthesizes and repairs agent collectives. It is trained using Collaborative Tree Optimization, yielding significant gains across a wide array of benchmarks and maintaining Pareto efficiency with respect to token cost and task performance (Wang et al., 29 Sep 2025).

1. Paradigm Shift and Conceptual Foundations

Traditional LLM-based MAS, such as AutoGen, MetaGPT, G-Designer, and MAS-GPT, rely either on hand-crafted ensembles or on automated but static workflows. These systems are brittle: tool invocations or resource accesses fail, a single-point deviation can collapse the whole pipeline, and there is no mechanism for online repair or cost-aware adaptation. MAS² addresses these challenges by introducing a recursive, meta-level orchestration paradigm: a system designed not only to instantiate a MAS on demand but also to continuously adapt the agent collective at runtime.

MAS² operationalizes three nested procedural loops:

  • The Generator designs a high-level template—defining roles, communication protocols, and tool requisites.
  • The Implementer configures the template with concrete LLM backbones and tool bindings.
  • The Rectifier monitors execution, detects performance or cost anomalies, and iteratively refines the live MAS instance.

This recursion enables MAS² to self-configure and self-rectify in situ, facilitating robust operation under unpredictable task, resource, and tooling constraints.

2. Tri-Agent Architecture and Dynamic Coordination

MAS² formalizes each generated MAS as $\mathcal{M} = \langle \mathcal{R}, \mathcal{P}, \mathcal{T}, \mathcal{B} \rangle$, where:

  • $\mathcal{R}$: Agent roles.
  • $\mathcal{P}$: Messaging and communication schema among roles.
  • $\mathcal{T}$: Available toolset (e.g., REPL, web search).
  • $\mathcal{B}$: Assignments mapping roles to specific LLM backbones.
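
The four components above (roles, communication schema, toolset, backbone assignments) can be pictured as a plain data structure. The sketch below is only an illustration: the class name `MASInstance`, its field names, and the `is_fully_configured` check are assumptions made here, not definitions from the paper. It separates a template, which has no backbones yet, from a configured instance.

```python
from dataclasses import dataclass, field


@dataclass
class MASInstance:
    """Illustrative container for the four MAS components (names are assumptions)."""
    roles: list[str]                  # agent roles
    protocol: list[tuple[str, str]]   # directed (sender, receiver) message edges
    tools: list[str]                  # available tools
    backbones: dict[str, str] = field(default_factory=dict)  # role -> LLM backbone

    def is_fully_configured(self) -> bool:
        """True once every edge joins known roles and every role has a backbone."""
        known = set(self.roles)
        edges_ok = all(s in known and r in known for s, r in self.protocol)
        return edges_ok and known <= set(self.backbones)


# A template as a Generator might emit it: structure only, no backbones.
template = MASInstance(
    roles=["planner", "coder"],
    protocol=[("planner", "coder")],
    tools=["REPL", "web search"],
)

# The same template after an Implementer-style binding step.
configured = MASInstance(
    roles=template.roles,
    protocol=template.protocol,
    tools=template.tools,
    backbones={"planner": "GPT-4o", "coder": "Qwen3-14B"},
)
```

Keeping the backbone map separate from the structural fields mirrors the split between designing a template and configuring it.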

Each meta-agent is governed by a stochastic policy:

  • The Generator policy samples a MAS template given the input query.
  • The Implementer policy assigns each role in the template to an LLM drawn from the available LLM pool.
  • The Rectifier policy is activated by a trigger (e.g., when cost exceeds a threshold or a failure is detected) and produces a revised collective.

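Inference procedure: the Generator first proposes a template for the query, the Implementer then binds its roles to backbones, and the resulting MAS is executed; whenever the trigger fires, the Rectifier revises the running system and execution continues with the revised collective.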

At execution, the Rectifier may dynamically reroute communications, alter prompt contents, or swap agent backbones, yielding a self-modifying operational MAS.
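
The generate, implement, execute, and rectify cycle can be sketched as a small control loop. Everything named here (`RunReport`, `needs_rectification`, `run_mas2`, the token budget, the round limit) is hypothetical, and the four callables are toy stand-ins for the learned policies. The sketch also simplifies by rectifying between runs rather than during one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunReport:
    """Outcome of one execution attempt (hypothetical structure)."""
    answer: Optional[str]
    tokens_used: int
    failed: bool


def needs_rectification(report: RunReport, token_budget: int) -> bool:
    """Trigger: fire on a detected failure or when cost exceeds the threshold."""
    return report.failed or report.tokens_used > token_budget


def run_mas2(
    query: str,
    generate: Callable[[str], dict],
    implement: Callable[[dict], dict],
    execute: Callable[[dict, str], RunReport],
    rectify: Callable[[dict, RunReport], dict],
    token_budget: int = 10_000,
    max_rounds: int = 3,
) -> tuple[RunReport, int]:
    """Generate -> implement -> execute, rectifying while the trigger fires."""
    system = implement(generate(query))
    report = execute(system, query)
    rounds = 0
    while needs_rectification(report, token_budget) and rounds < max_rounds:
        system = rectify(system, report)   # e.g. swap a backbone or edit a prompt
        report = execute(system, query)
        rounds += 1
    return report, rounds


# Toy stand-ins: the first run fails, the rectified system succeeds.
def generate(query):
    return {"roles": ["solver"], "backbones": {}}

def implement(template):
    return {**template, "backbones": {"solver": "small-model"}}

def rectify(system, report):
    return {**system, "backbones": {"solver": "large-model"}}

def execute(system, query):
    ok = system["backbones"]["solver"] == "large-model"
    return RunReport(answer="42" if ok else None, tokens_used=500, failed=not ok)


report, rounds = run_mas2("toy query", generate, implement, execute, rectify)
```

Bounding the loop with `max_rounds` is a design choice of this sketch, made so that a system that cannot be repaired still terminates.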

3. Collaborative Tree Optimization: Meta-Agent Training

MAS² employs Collaborative Tree Optimization (CTO) to jointly train the Generator, Implementer, and Rectifier policies. For each training query, a decision tree is constructed:

  • Node labels specify the meta-agent acting at each stage.
  • Each root-to-leaf path traces a complete generation–instantiation–rectification trajectory for one MAS instance.

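Cost-sensitive reward: each completed trajectory receives a reward that accounts for both task performance and resource usage (e.g., token count or wall time); failed runs receive zero reward.

Each decision node in the tree is then assigned a value computed from these trajectory rewards.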

Training objective: For each node, pairs of candidate actions are compared by value to build a preference dataset, which is used to optimize a value-scaled PPO-style loss. High-value trajectory discriminations thus dominate the gradient signal during learning.
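
To make the tree-to-preference-data step concrete, here is a minimal sketch under explicitly assumed definitions. The reward form in `cost_sensitive_reward`, the mean-over-leaves estimator in `node_value`, and the `margin` filter are assumptions chosen for illustration, not the formulas from the paper. The sketch stops at producing weighted preference pairs and does not implement the loss itself.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional


@dataclass
class Node:
    agent: str                      # meta-agent that chooses among this node's children
    action: str                     # action that led from the parent to this node
    children: list["Node"] = field(default_factory=list)
    reward: Optional[float] = None  # set on leaves only (finished trajectories)


def cost_sensitive_reward(success: bool, tokens: int, lam: float = 1e-4) -> float:
    """ASSUMED form: zero for failed runs, otherwise discounted by token usage."""
    return 1.0 / (1.0 + lam * tokens) if success else 0.0


def leaf_rewards(node: Node) -> list[float]:
    """Rewards of every finished trajectory that passes through this node."""
    if not node.children:
        return [node.reward if node.reward is not None else 0.0]
    return [r for child in node.children for r in leaf_rewards(child)]


def node_value(node: Node) -> float:
    """ASSUMED estimator: mean reward over the trajectories below the node."""
    rewards = leaf_rewards(node)
    return sum(rewards) / len(rewards)


def preference_pairs(node: Node, margin: float = 0.05):
    """Collect (agent, preferred, rejected, value_gap) at every branching node."""
    pairs = []
    for a, b in combinations(node.children, 2):
        gap = node_value(a) - node_value(b)
        if abs(gap) > margin:                      # keep only clear preferences
            better, worse = (a, b) if gap > 0 else (b, a)
            pairs.append((node.agent, better.action, worse.action, abs(gap)))
    for child in node.children:
        pairs.extend(preference_pairs(child, margin))
    return pairs


def leaf(action: str, reward: float) -> Node:
    return Node(agent="done", action=action, reward=reward)


# Toy tree: the Generator picks a template, the Implementer picks a backbone.
tree = Node("generator", "root", [
    Node("implementer", "template-A", [leaf("big-model", 0.9), leaf("small-model", 0.5)]),
    Node("implementer", "template-B", [leaf("big-model", 0.2), leaf("small-model", 0.0)]),
])
pairs = preference_pairs(tree)
```

The value gap stored with each pair is what a value-scaled loss could use as a per-example weight, so that large gaps contribute more to the gradient.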

4. Empirical Performance, Cost Analysis, and Pareto Efficiency

MAS² is evaluated on seven benchmarks spanning four domains: multi-hop QA (HotpotQA, Bamboogle, NQ), deep research (BrowseComp⁺), code generation (HumanEval, MBPP), and mathematics (MATH). The LLM pool comprises GPT-4o, GPT-4o-mini, Qwen2.5-72B, Qwen3-14B, and QwQ-32B; meta-agents default to Qwen3-8B.

Main results: MAS² achieves performance gains of up to 19.6% compared to SOTA baselines, with per-benchmark scores (gains in parentheses) such as:

  • HotpotQA: 89.3% (+23.8)
  • Bamboogle: 67.2% (+31.0)
  • NQ: 79.1% (+15.5)
  • BrowseComp⁺: 19.7% (+10.2)
  • HumanEval: 97.0% (+19.6)
  • MBPP: 85.1% (+21.4)
  • MATH: 71.3% (+13.3)
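
Because this section also judges systems by token cost, it helps to state what Pareto efficiency means operationally: a system is on the frontier if no other system is at least as cheap and at least as accurate while being strictly better on one of the two. A minimal sketch of that check follows; the system names and the (cost, accuracy) values are hypothetical and are not taken from the paper.

```python
def pareto_frontier(points: list[tuple[str, float, float]]) -> list[str]:
    """Return names of systems not dominated on (lower cost, higher accuracy)."""
    frontier = []
    for name, cost, acc in points:
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for _, c, a in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical (name, cost per query, accuracy %) triples.
systems = [
    ("system-A", 0.10, 55.0),
    ("system-B", 0.20, 60.0),
    ("system-C", 0.25, 58.0),   # dominated by system-B: costlier and less accurate
    ("system-D", 0.40, 70.0),
]
frontier = pareto_frontier(systems)
```

The quadratic scan is adequate for a handful of systems; sorting by cost first would be the usual choice for larger sets.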

MAS² consistently resides on the empirical Pareto frontier of performance versus token cost, for example on Bamboogle, and its gains are reported as significant at $p < 0.01$ versus the second-best MAS across per-example accuracy vectors.

Ablation studies confirm the necessity of each meta-agent: removal of Generator, Implementer, or Rectifier yields performance drops on MBPP, HotpotQA, and MATH (e.g., MBPP drops from 85.2 to 79.0, 80.4, or 81.7, respectively) (Wang et al., 29 Sep 2025).

5. Cross-Backbone Generalization Capability

MAS² demonstrates robust generalization to unseen LLM backbones. While the meta-agents are trained on a fixed LLM pool, inference augments that pool with previously unencountered models (Qwen3-Coder, GPT-5-Mini, Gemini-2.5-Pro). The Implementer policy, without further fine-tuning, can select these novel backbones when beneficial.

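Formally, the inference stage starts from the originally implemented role-to-backbone mapping and re-solves the assignment over the augmented pool, i.e., the training pool extended with the new models.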

Empirical results show accuracy lifts of 19.3 points on MATH (71.3%→90.6%) and 16.8 points on Bamboogle (67.2%→84.0%) at moderate additional cost, illustrating a high degree of zero-shot backbone integration.
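
The effect of enlarging the backbone pool at inference can be illustrated with a toy assignment routine. This is a greedy per-role argmax over a made-up utility table, used here as a stand-in for the learned Implementer policy; the function `assign_backbones`, the model names, and the scores in `SCORES` are all hypothetical.

```python
from typing import Callable


def assign_backbones(
    roles: list[str],
    pool: list[str],
    utility: Callable[[str, str], float],
) -> dict[str, str]:
    """Pick, for each role, the pool model with the highest utility."""
    return {role: max(pool, key=lambda model: utility(role, model)) for role in roles}


# Hypothetical utilities; a learned policy would replace this lookup table.
SCORES = {
    ("planner", "model-a"): 0.7, ("planner", "model-b"): 0.4, ("planner", "model-new"): 0.6,
    ("coder", "model-a"): 0.6, ("coder", "model-b"): 0.5, ("coder", "model-new"): 0.9,
}

def utility(role: str, model: str) -> float:
    return SCORES[(role, model)]


roles = ["planner", "coder"]
train_pool = ["model-a", "model-b"]

before = assign_backbones(roles, train_pool, utility)
after = assign_backbones(roles, train_pool + ["model-new"], utility)
```

In this toy setup only the `coder` role switches to the new model, which matches the idea that novel backbones are selected when beneficial rather than everywhere.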

6. Scalability, Limitations, and Potential Extensions

The decoupled architecture allows each meta-agent to scale independently to larger pools and more complex agent collectives. CTO reuses offline-fabricated trajectory data, reducing demands for costly online RL interaction.

Identified limitations:

  • CTO training remains compute-intensive on complex orchestration tasks.
  • Performance is ultimately bounded by the underlying LLMs’ capacities. Error rectification is only possible to the degree that failure modes are describable in natural language.
  • Current Rectifier mechanisms focus on cost and failure criteria; subtler correctness shifts are not directly detected.

Proposed extensions:

  • Hierarchical Rectification with multi-level monitoring (e.g., at the subgraph or token level).
  • Continuous adaptation using test-time gradient updates.
  • Memory-augmented meta-agents to enable persistent cross-task knowledge transfer.
  • Automated tool discovery, allowing the Implementer to create or integrate novel code modules or retrieval systems.

MAS² thus inaugurates a shift toward meta-MAS research in which multi-agent collectives are not static but self-evolving and autonomously modifiable (Wang et al., 29 Sep 2025).
