MAS²: Self-Generative Multi-Agent Systems

Updated 15 April 2026

MAS² is a recursive meta-multi-agent framework that autonomously designs, configures, and rectifies agent collectives to address diverse tasks.
It employs a tri-agent architecture—Generator, Implementer, and Rectifier—to dynamically synthesize and repair systems in real time.
Empirical evaluations demonstrate significant performance gains and robust cross-backbone generalization, achieving up to a 19.6% improvement on benchmarks.

MAS $^2$ (“Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems”) is a recursive meta-multi-agent framework that autonomously designs, configures, deploys, and adaptively rectifies bespoke multi-agent systems (MAS) to address diverse tasks under real-world dynamism. Distinct from prior “generate-once-and-deploy” paradigms, MAS $^2$ features a tri-agent architecture—comprising Generator, Implementer, and Rectifier meta-agents—that dynamically synthesizes and repairs agent collectives. It is trained using Collaborative Tree Optimization, yielding significant gains across a wide array of benchmarks and maintaining Pareto efficiency with respect to token cost and task performance (Wang et al., 29 Sep 2025).

1. Paradigm Shift and Conceptual Foundations

Traditional LLM-based MAS, such as AutoGen, MetaGPT, G-Designer, and MAS-GPT, rely on either hand-crafted ensembles or automated but static workflows. These systems are brittle: they fail when tool invocation or resource access breaks, a single-point deviation can collapse the whole pipeline, and they lack online repair mechanisms and cost-aware adaptation strategies. MAS $^2$ addresses these challenges with a recursive, meta-level orchestration paradigm: a system that not only instantiates a MAS on demand but also continuously adapts the agent collective at runtime.

MAS $^2$ operationalizes three nested procedural loops:

The Generator designs a high-level template—defining roles, communication protocols, and tool requisites.
The Implementer configures the template with concrete LLM backbones and tool bindings.
The Rectifier monitors execution, detects performance or cost anomalies, and iteratively refines the live MAS instance.

This recursion enables MAS $^2$ to self-configure and self-rectify in situ, facilitating robust operation under unpredictable task, resource, and tooling constraints.

2. Tri-Agent Architecture and Dynamic Coordination

MAS $^2$ formalizes each generated MAS as $\mathcal{M} = \langle \mathcal{R}, \mathcal{P}, \mathcal{T}, \mathcal{B} \rangle$ , where:

$\mathcal{R}$ : Agent roles.
$\mathcal{P}$ : Messaging and communication schema among roles.
$\mathcal{T}$ : Available toolset (e.g., REPL, web search).
$\mathcal{B}$ : Assignments mapping roles to specific LLM backbones.

Each meta-agent is governed by a stochastic policy:

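Generator $\pi_G$ samples MAS templates $\mathcal{G} = \langle \mathcal{R}, \mathcal{P}, \mathcal{T} \rangle \sim \pi_G(\cdot \mid q)$ given query $q$.
Implementer $\pi_I$ assigns roles to LLMs: $\mathcal{B} \sim \pi_I(\cdot \mid \mathcal{G}, \mathcal{L})$ with $\mathcal{B} : \mathcal{R} \to \mathcal{L}$, where $\mathcal{L}$ is the LLM pool.
Rectifier $\pi_R$ is activated by a trigger $\tau$ (e.g., when cost exceeds a threshold or a failure is detected), producing a revised collective $\mathcal{M}' \sim \pi_R(\cdot \mid \mathcal{M}, \tau)$.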

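Inference proceeds as follows: the Generator samples a template for the query, the Implementer binds each role to a backbone, the resulting MAS executes, and whenever the trigger fires, the Rectifier revises the live instance and execution continues.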

At execution, the Rectifier may dynamically reroute communications, alter prompt contents, or swap agent backbones, yielding a self-modifying operational MAS.
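The generate, implement, execute, rectify cycle can be sketched as a plain control loop. This is a minimal sketch under stated assumptions: `MASInstance`, `RunResult`, `run_mas2`, the toy stand-in policies, the string trigger labels, and the `cost_budget`/`max_rectifications` parameters are all hypothetical illustrations, not the authors' API. In MAS $^2$ each of the three callables would be an LLM-backed meta-agent.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class MASInstance:
    """Hypothetical container for M = <R, P, T, B>."""
    roles: tuple[str, ...]                  # R: agent roles
    protocol: tuple[tuple[str, str], ...]   # P: directed sender -> receiver edges
    tools: tuple[str, ...]                  # T: available tools
    backbones: dict[str, str] = field(default_factory=dict)  # B: role -> LLM name


@dataclass
class RunResult:
    answer: str | None  # None marks a failed run
    cost: float         # resource usage of this attempt, e.g. tokens


def run_mas2(query, generator, implementer, rectifier, execute,
             llm_pool, cost_budget, max_rectifications=3):
    """Generate -> implement -> execute, calling the Rectifier when a trigger fires."""
    template = generator(query)              # Generator: roles, protocol, tools
    mas = implementer(template, llm_pool)    # Implementer: bind each role to an LLM
    for attempt in range(max_rectifications + 1):
        result = execute(mas, query)
        failed = result.answer is None
        over_budget = result.cost > cost_budget
        if not failed and not over_budget:
            return result.answer, mas
        if attempt == max_rectifications:
            break                            # give up after the last allowed repair
        trigger = "failure" if failed else "over_budget"
        mas = rectifier(mas, trigger)        # Rectifier: revise the live instance
    return None, mas


# Toy stand-ins for the meta-agents, purely illustrative.
def toy_generator(query):
    return MASInstance(roles=("solver", "checker"),
                       protocol=(("solver", "checker"),),
                       tools=("python_repl",))

def toy_implementer(template, pool):
    return replace(template, backbones={r: pool[0] for r in template.roles})

def toy_rectifier(mas, trigger):
    # Swap every role to another backbone, whatever the trigger type.
    return replace(mas, backbones={r: "model-b" for r in mas.roles})

def toy_execute(mas, query):
    ok = all(b == "model-b" for b in mas.backbones.values())
    return RunResult(answer="42" if ok else None, cost=1.0)

answer, final = run_mas2("toy query", toy_generator, toy_implementer,
                         toy_rectifier, toy_execute,
                         llm_pool=["model-a", "model-b"], cost_budget=5.0)
print(answer, final.backbones)  # 42 {'solver': 'model-b', 'checker': 'model-b'}
```

In the toy run, the first execution fails, the failure trigger causes the Rectifier to swap backbones, and the second execution succeeds. Keeping `MASInstance` immutable makes each rectification an explicit new configuration rather than an in-place mutation.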

3. Collaborative Tree Optimization: Meta-Agent Training

MAS $^2$ employs Collaborative Tree Optimization (CTO) to jointly train the Generator, Implementer, and Rectifier policies $\pi_G$, $\pi_I$, and $\pi_R$. For each query $q$, a decision tree $\mathbb{T}_q$ is constructed:

Node labels specify the meta-agent acting at each stage.
Root-to-leaf paths trace a complete generation–instantiation–rectification trajectory $\xi$ for a MAS instance.

Cost-sensitive reward:

Each complete trajectory $\xi$ receives a reward $R(\xi)$ that credits task success and penalizes its resource usage $C(\xi)$ (e.g., token count or wall time); failed runs receive zero reward.
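To make the idea of a cost-sensitive reward concrete, the snippet below discounts a task score by resource usage and zeroes out failures. The multiplicative exponential penalty and the rate `lam` are assumptions chosen for illustration; this sketch does not reproduce the paper's exact formula, and `cost_sensitive_reward` is a hypothetical helper.

```python
import math


def cost_sensitive_reward(success: bool, score: float, cost: float,
                          lam: float = 1e-4) -> float:
    """Hypothetical reward: task score discounted by resource usage.

    success: whether the run completed without failure
    score:   task metric in [0, 1] (e.g. exact match or pass@1)
    cost:    resource usage, e.g. tokens consumed
    lam:     assumed penalty rate per unit of cost
    """
    if not success:
        return 0.0  # failed runs receive zero reward
    return score * math.exp(-lam * cost)


# Two successful runs with equal accuracy: the cheaper one earns more reward.
cheap = cost_sensitive_reward(True, 1.0, cost=2_000)
pricey = cost_sensitive_reward(True, 1.0, cost=20_000)
failed = cost_sensitive_reward(False, 1.0, cost=500)
print(round(cheap, 3), round(pricey, 3), failed)  # 0.819 0.135 0.0
```

Any monotonically decreasing penalty would preserve the key property shown here: among equally accurate trajectories, cheaper ones are preferred.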

Each decision node $v$ receives a value $V(v)$ derived from the rewards of the complete trajectories that pass through it.
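One simple way to assign such node values is to average the rewards of all leaf trajectories in a node's subtree, as in Monte Carlo tree backups. The sketch below assumes that mean aggregation; the `Node` class and `assign_values` helper are hypothetical and may differ from how CTO aggregates values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    agent: str                       # meta-agent acting here, e.g. "generator"
    reward: float | None = None      # set only on leaves (complete trajectories)
    children: list[Node] = field(default_factory=list)
    value: float = 0.0


def assign_values(node: Node) -> list[float]:
    """Set node.value to the mean reward of all leaves below it (assumed aggregation).

    Returns the leaf rewards of this subtree so that parents can reuse them.
    """
    if not node.children:
        rewards = [node.reward if node.reward is not None else 0.0]
    else:
        rewards = []
        for child in node.children:
            rewards.extend(assign_values(child))
    node.value = sum(rewards) / len(rewards)
    return rewards


# Generator root -> two Implementer choices -> Rectifier leaves with trajectory rewards.
root = Node("generator", children=[
    Node("implementer", children=[Node("rectifier", reward=1.0),
                                  Node("rectifier", reward=0.5)]),
    Node("implementer", children=[Node("rectifier", reward=0.0),
                                  Node("rectifier", reward=0.2)]),
])
assign_values(root)
print([c.value for c in root.children], root.value)
```

Here the first Implementer choice scores 0.75 and the second roughly 0.1, so the tree already encodes which configuration decision led to better trajectories.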

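Training objective: at each node, pairs of actions $(a^{+}, a^{-})$ whose resulting child nodes satisfy $V(a^{+}) > V(a^{-})$, where $V$ is the node value, are used to build a preference dataset. The meta-agents are then optimized with a value-scaled PPO-style loss in which each pair's contribution is weighted by node value, so discriminations between high-value trajectories dominate the gradient signal during learning.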
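The pair-construction step and the effect of value scaling can be sketched as follows. Two assumptions are made here: the weight of a pair is taken to be $V(a^{+})$, and the per-pair objective is a DPO-style logistic loss on policy/reference log-probability margins, swapped in as a common choice for preference data rather than the paper's exact PPO-style loss. `build_pairs` and `value_scaled_pair_loss` are hypothetical helpers.

```python
import math
from itertools import permutations


def build_pairs(child_values):
    """Return (preferred, rejected, weight) for every action pair with V(a+) > V(a-).

    child_values maps each action taken at one tree node to the value V of the
    child node it leads to. Using weight = V(a+) is an assumed form of value scaling.
    """
    pairs = []
    for a_pos, a_neg in permutations(child_values, 2):
        if child_values[a_pos] > child_values[a_neg]:
            pairs.append((a_pos, a_neg, child_values[a_pos]))
    return pairs


def value_scaled_pair_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg,
                           weight, beta=0.1):
    """DPO-style logistic loss for one pair, multiplied by its value weight.

    This pairwise objective is a stand-in, not the paper's exact loss.
    """
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    x = -margin
    # softplus(x) = -log(sigmoid(margin)), computed in a numerically stable way
    return weight * (max(x, 0.0) + math.log1p(math.exp(-abs(x))))


values = {"template_A": 0.75, "template_B": 0.1, "template_C": 0.1}
pairs = build_pairs(values)
print(pairs)  # [('template_A', 'template_B', 0.75), ('template_A', 'template_C', 0.75)]

# With the policy still equal to the reference, the margin is 0 and the loss is weight * log 2.
loss = value_scaled_pair_loss(-3.0, -3.0, -3.0, -3.0, weight=pairs[0][2])
```

Tied actions (template_B and template_C) yield no pair, and a pair whose preferred action has a higher value contributes proportionally more to the loss, which is the intended effect of value scaling.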

4. Empirical Performance, Cost Analysis, and Pareto Efficiency

MAS $^2$ is evaluated on seven benchmarks spanning four domains: multi-hop QA (HotpotQA, Bamboogle, NQ), deep research (BrowseComp⁺), code generation (HumanEval, MBPP), and mathematics (MATH). The LLM pool comprises GPT-4o, GPT-4o-mini, Qwen2.5-72B, Qwen3-14B, and QwQ-32B; meta-agents default to Qwen3-8B.

Main results: MAS $^2$ achieves performance gains over SOTA baselines, with specific results such as:

HotpotQA: 89.3% (+23.8)
Bamboogle: 67.2% (+31.0)
NQ: 79.1% (+15.5)
BrowseComp⁺: 19.7% (+10.2)
HumanEval: 97.0% (+19.6)
MBPP: 85.1% (+21.4)
MATH: 71.3% (+13.3)
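Pareto efficiency with respect to token cost and task performance means that no competing method is both cheaper and more accurate. A minimal sketch of extracting that frontier from (cost, accuracy) measurements is shown below; `pareto_frontier` is a hypothetical helper, and the method names and numbers are made up for illustration, not figures from the paper.

```python
def pareto_frontier(points):
    """Return names of points not dominated in (lower cost, higher accuracy).

    A point is dominated if another point is no more costly and no less
    accurate, and strictly better on at least one of the two axes.
    """
    frontier = []
    for name, cost, acc in points:
        dominated = any(
            c2 <= cost and a2 >= acc and (c2 < cost or a2 > acc)
            for n2, c2, a2 in points
            if n2 != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical (method, cost in USD, accuracy %) triples for illustration only.
runs = [("A", 0.10, 55.0), ("B", 0.20, 62.0), ("C", 0.30, 60.0), ("D", 0.45, 70.0)]
print(pareto_frontier(runs))  # ['A', 'B', 'D']
```

Method C is excluded because B is both cheaper and more accurate; every remaining method represents a distinct cost/accuracy trade-off.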

MAS $^2$ consistently resides on the empirical Pareto frontier of performance versus token cost, for example on Bamboogle (… \$0.15 … 64.8% …), with statistical significance ($p<0.01$) versus the second-best MAS across per-example accuracy vectors.

Ablation studies confirm the necessity of each meta-agent: removing the Generator, Implementer, or Rectifier degrades performance on MBPP, HotpotQA, and MATH (e.g., MBPP drops from 85.2 to 79.0, 80.4, or 81.7, respectively) (Wang et al., 29 Sep 2025).

5. Cross-Backbone Generalization Capability

MAS $^2$ demonstrates robust generalization to unseen LLM backbones. Although it is trained on the LLM pool $\mathcal{L}$, at inference the pool is augmented with previously unencountered models (Qwen3-Coder, GPT-5-Mini, Gemini-2.5-Pro). Without further fine-tuning, the Implementer policy can select these novel backbones when beneficial.

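Formally, with $\mathcal{B}$ the originally implemented role-to-backbone mapping over the training pool $\mathcal{L}$, the inference stage re-solves the assignment over the augmented pool $\mathcal{L}' = \mathcal{L} \cup \mathcal{L}_{\text{new}}$, where $\mathcal{L}_{\text{new}}$ denotes the newly added models.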
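The effect of re-solving the assignment over an augmented pool can be illustrated with a greedy stand-in. This sketch assumes a hypothetical `score(role, model)` estimator returning (expected quality, cost) and an assumed linear trade-off `quality - lam * cost`; in MAS $^2$ the choice is made by the learned Implementer policy, not by this argmax. `reassign_backbones` and all numbers below are illustrative only.

```python
def reassign_backbones(roles, original, base_pool, new_models, score, lam=0.0):
    """Greedy stand-in for re-solving B over the augmented pool L' = L ∪ L_new.

    score(role, model) -> (expected_quality, cost) is a hypothetical estimator.
    A role keeps its original backbone unless some model in L' has strictly
    higher utility.
    """
    pool = list(dict.fromkeys([*base_pool, *new_models]))  # L' without duplicates

    def utility(role, model):
        quality, cost = score(role, model)
        return quality - lam * cost          # assumed quality/cost trade-off

    assignment = {}
    for role in roles:
        best = max(pool, key=lambda m: utility(role, m))
        current = original[role]
        assignment[role] = best if utility(role, best) > utility(role, current) else current
    return assignment


# Made-up (quality, cost) estimates for illustration only, not results from the paper.
table = {
    ("coder", "Qwen2.5-72B"): (0.70, 1.0),
    ("coder", "GPT-4o-mini"): (0.60, 0.2),
    ("coder", "Qwen3-Coder"): (0.85, 1.2),
    ("verifier", "Qwen2.5-72B"): (0.65, 1.0),
    ("verifier", "GPT-4o-mini"): (0.66, 0.2),
    ("verifier", "Qwen3-Coder"): (0.60, 1.2),
}
new_b = reassign_backbones(
    roles=["coder", "verifier"],
    original={"coder": "Qwen2.5-72B", "verifier": "GPT-4o-mini"},
    base_pool=["Qwen2.5-72B", "GPT-4o-mini"],
    new_models=["Qwen3-Coder"],
    score=lambda role, model: table[(role, model)],
    lam=0.1,
)
print(new_b)  # {'coder': 'Qwen3-Coder', 'verifier': 'GPT-4o-mini'}
```

Only the role that benefits from the unseen model is switched; the other keeps its original backbone, mirroring the "select novel backbones when beneficial" behavior described above.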

Empirical results show accuracy lifts on MATH (71.3%→90.6%, +19.3 points) and Bamboogle (67.2%→84.0%, +16.8 points) at moderate additional cost, indicating effective zero-shot integration of new backbones.

6. Scalability, Limitations, and Potential Extensions

The decoupled architecture allows each meta-agent to scale independently to larger LLM pools and more complex agent collectives. CTO reuses offline-generated trajectory data, reducing the need for costly online RL interaction.

Identified limitations:

CTO training remains compute-intensive on complex orchestration tasks.
Performance is ultimately bounded by the capacities of the underlying LLMs.
Error rectification is possible only to the extent that failure modes can be described in natural language.
Current Rectifier mechanisms focus on cost and failure criteria; subtler correctness shifts are not directly detected.

Proposed extensions:

Hierarchical Rectification with multi-level monitoring (e.g., at the subgraph or token level).
Continuous adaptation using test-time gradient updates.
Memory-augmented meta-agents to enable persistent cross-task knowledge transfer.
Automated tool discovery, allowing the Implementer to create or integrate novel code modules or retrieval systems.

MAS $^2$ thus marks a shift toward meta-MAS research in which multi-agent collectives are not static but self-evolving and autonomously modifiable (Wang et al., 29 Sep 2025).


References

Wang et al. (29 Sep 2025). MAS $^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems.
