LLM-Based Multi-Agent Systems
- LLM-based MAS are systems where multiple autonomous agents, powered by LLMs, coordinate to solve tasks beyond a single model's capacity.
- They leverage diverse architectures and specialized roles to enhance reasoning, planning, and decision-making across complex workflows.
- Robustness and scalability are pursued through modular designs, adaptive communication, and rigorous evaluation frameworks.
LLM-based Multi-Agent Systems (LLM-MAS) are systems in which multiple autonomous agents—each controlled or augmented by LLMs—cooperate, coordinate, or compete to solve complex tasks that exceed the capability or reliability of any single model. These systems form the basis of scalable AI architectures for reasoning, planning, decision-making, software generation, formalization, and collaboration across diverse domains.
1. Fundamental Architectures and Design Principles
Topologies and Communication Patterns
LLM-MAS rely on explicit or dynamically constructed collaboration structures. Conventional approaches use fixed, handcrafted topologies—such as centralized (hub-and-spoke), decentralized (peer-to-peer), hierarchical (layered), or chain/graph-based workflows—to define information flow, agent roles, and aggregation strategies (Leong et al., 2 Oct 2025, Leong et al., 31 Jul 2025). Dynamic architectures have emerged, including:
- Adaptive Graphs: Dynamic graph designers (AMAS, DynaSwarm) select optimal agent communication topologies for each input or task instance using parameter-efficient adaptation (LoRA) of LLM backbones (Leong et al., 2 Oct 2025, Leong et al., 31 Jul 2025).
- Blackboard Architectures: All agent communications are mediated through a global shared blackboard, enabling full transparency and dynamic, context-driven agent orchestration (LbMAS) (Han et al., 2 Jul 2025).
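To make the blackboard pattern concrete, the sketch below mediates every agent message through a shared board and lets an orchestrator choose the next agent from the current board state; the class and function names are illustrative, not LbMAS's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Blackboard:
    """Globally shared message store; every agent reads from and writes to it."""
    entries: List[dict] = field(default_factory=list)

    def post(self, author: str, content: str) -> None:
        self.entries.append({"author": author, "content": content})

    def view(self) -> str:
        return "\n".join(f"[{e['author']}] {e['content']}" for e in self.entries)

# An "agent" here is just a callable mapping the current board state to a message.
AgentFn = Callable[[str], str]

def run_blackboard_mas(task: str,
                       agents: Dict[str, AgentFn],
                       orchestrate: Callable[[str, List[str]], str],
                       max_rounds: int = 8) -> str:
    """Context-driven orchestration: each round, the orchestrator inspects the
    full blackboard and selects which specialist agent should act next."""
    board = Blackboard()
    board.post("user", task)
    for _ in range(max_rounds):
        next_agent = orchestrate(board.view(), list(agents))
        if next_agent == "DONE":
            break
        board.post(next_agent, agents[next_agent](board.view()))
    return board.view()
```

Because every exchange passes through the board, the full interaction history is transparent and the topology is decided per round rather than fixed in advance.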
Heterogeneity and Role Specialization
While early LLM-MAS assigned the same LLM to every agent, recent frameworks deploy heterogeneous LLMs, assigning different models to roles based on domain or functional specialization (chatbot, reasoner, planner, etc.). This approach, exemplified by X-MAS, leverages LLM diversity and yields measurable improvements (up to +47% on AIME-2024) over homogeneous MAS without structural redesign (Ye et al., 22 May 2025).
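A minimal sketch of this heterogeneity, assuming a generic chat-completion client; the role names, model identifiers, and `complete` helper are placeholders rather than X-MAS's configuration:

```python
from typing import Callable

# Hypothetical role-to-backbone mapping: each functional role is served by the
# model that profiles best for it, rather than one shared backbone.
ROLE_TO_MODEL = {
    "planner":    "model-a-70b",    # long-horizon task decomposition
    "reasoner":   "model-b-math",   # step-by-step derivations
    "coder":      "model-c-code",   # tool and code generation
    "aggregator": "model-d-chat",   # fluent synthesis of partial answers
}

def call_role(role: str, prompt: str,
              complete: Callable[..., str]) -> str:
    """Route a role's prompt to its dedicated backbone via a generic client."""
    return complete(model=ROLE_TO_MODEL[role], prompt=prompt)
```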
Modular and Extensible Design
Frameworks emphasize plug-and-play extensibility through abstract base classes for agents, LLMs, knowledge bases, retrievers, and tools (e.g., MASA’s architecture for autoformalization (Zhang et al., 10 Oct 2025)). Composability and shared libraries facilitate fast adaptation to evolving tasks, new agent types, or external symbolic resources (e.g., theorem provers, code sandboxing).
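A plug-and-play skeleton in this spirit might look as follows; the class names and method signatures are illustrative assumptions, not MASA's published interfaces:

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional

class BaseLLM(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class BaseTool(ABC):
    """External symbolic resource, e.g. a theorem prover or code sandbox."""
    @abstractmethod
    def run(self, payload: str) -> str: ...

class BaseRetriever(ABC):
    @abstractmethod
    def retrieve(self, query: str, k: int = 5) -> List[str]: ...

class BaseAgent(ABC):
    """Agents are composed from swappable LLMs, tools, and retrievers."""
    def __init__(self, llm: BaseLLM,
                 tools: Optional[List[BaseTool]] = None,
                 retriever: Optional[BaseRetriever] = None):
        self.llm, self.tools, self.retriever = llm, tools or [], retriever

    @abstractmethod
    def step(self, task: str, context: Any) -> str: ...
```

New agent types, model backends, or external provers then plug in by subclassing the relevant base class without touching the rest of the system.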
2. Coordination, Workflow, and Optimization
Coordination Patterns
LLM-MAS coordination can be instantiated through direct engineering (chains, trees, pipelines), through process models adapted from software engineering, or through learned optimization. Distinct paradigms include:
- Classical Process Modeling: Software engineering-inspired patterns (Waterfall, V-Model, Agile) map to agent handoff, validation, and iteration, each entailing trade-offs between speed, robustness, and code/output quality (Ha et al., 17 Sep 2025).
- Iterative and Refined Optimization: OMAC provides a theoretically grounded framework for optimizing both agent functionality and collaboration structure across five dimensions, using LLM-based semantic exploration and contrastive evaluation loops to synthesize higher-performing and more coherent MAS (Li et al., 17 May 2025).
- Parallelized and Interleaved Execution: High-performance real-time systems deploy dual-thread architectures for planning and acting with interruptibility, central memory, and skill libraries, reducing latency and supporting coordinated adaptation to dynamic environments (e.g., in Minecraft) (Li et al., 5 Mar 2025).
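A stripped-down version of such a dual-thread design, assuming user-supplied `plan` and `act` callables and a queue for interruptible hand-off; this is a sketch of the general pattern, not the cited system's implementation:

```python
import queue
import threading
import time
from typing import Callable, List

def run_dual_thread(plan: Callable[[], List[str]],
                    act: Callable[[str], None],
                    horizon_s: float = 10.0) -> None:
    """Planner continuously refreshes a plan; the actor executes the newest plan
    and is effectively interrupted whenever a fresher plan arrives."""
    plans: "queue.Queue[List[str]]" = queue.Queue()
    stop = threading.Event()

    def planner() -> None:
        while not stop.is_set():
            plans.put(plan())           # push an updated plan (list of skills)
            time.sleep(0.5)             # replanning cadence

    def actor() -> None:
        current: List[str] = []
        while not stop.is_set():
            try:                        # always prefer the freshest plan
                current = plans.get_nowait()
            except queue.Empty:
                pass
            if current:
                act(current.pop(0))     # execute the next skill from the plan
            else:
                time.sleep(0.05)

    threads = [threading.Thread(target=planner), threading.Thread(target=actor)]
    for t in threads:
        t.start()
    time.sleep(horizon_s)
    stop.set()
    for t in threads:
        t.join()
```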
Experience and Continual Learning
MAEL introduces experience replay and cross-task memory: agents accumulate reward-annotated task-step traces and retrieve them as few-shot exemplars. Step-wise or task-wise retrieval improves convergence and output quality, especially on structurally recurring or long-horizon tasks (Li et al., 29 May 2025).
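A minimal sketch of reward-weighted experience retrieval in this style; the embedding function, data layout, and scoring rule are assumptions rather than MAEL's implementation:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple
import numpy as np

@dataclass
class Experience:
    task: str
    step_trace: str   # reasoning/action trace for one task step
    reward: float     # quality score assigned after task completion

class ExperiencePool:
    """Cross-task memory: store reward-annotated traces and retrieve them
    later as few-shot exemplars for similar tasks."""

    def __init__(self, embed: Callable[[str], np.ndarray]):
        self.embed = embed
        self.pool: List[Tuple[np.ndarray, Experience]] = []

    def add(self, exp: Experience) -> None:
        self.pool.append((self.embed(exp.task), exp))

    def retrieve(self, task: str, k: int = 3) -> List[Experience]:
        """Rank stored experiences by task similarity weighted by reward."""
        q = self.embed(task)

        def score(vec: np.ndarray, exp: Experience) -> float:
            cos = float(np.dot(q, vec) /
                        (np.linalg.norm(q) * np.linalg.norm(vec) + 1e-8))
            return cos * exp.reward

        ranked = sorted(self.pool, key=lambda p: -score(*p))
        return [exp for _, exp in ranked[:k]]
```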
3. Robustness, Safety, and Security
Chaos Engineering for Robustness
LLM-MAS are subject to unique vulnerabilities such as hallucination propagation, cascading agent failures, and emergent communication breakdowns. Chaos engineering addresses these by systematically injecting faults (agent dropouts, message corruption, hallucination triggers, resource contention) into sandboxed environments to uncover, and subsequently mitigate, weak points in agent design, inter-agent protocols, and overall system observability (Owotogbe, 6 May 2025).
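As an illustration, a fault-injection wrapper of the kind used in such experiments might look like this; the probabilities and corruption scheme are arbitrary assumptions, not the cited methodology:

```python
import random
from typing import Callable, Optional

AgentFn = Callable[[str], str]

def inject_faults(agent: AgentFn,
                  p_drop: float = 0.1,
                  p_corrupt: float = 0.1,
                  seed: Optional[int] = None) -> AgentFn:
    """Wrap an agent so that, with some probability, its message is dropped
    or corrupted before reaching the rest of the system (sandbox use only)."""
    rng = random.Random(seed)

    def faulty(message: str) -> str:
        if rng.random() < p_drop:
            return ""                           # simulated agent dropout
        reply = agent(message)
        if rng.random() < p_corrupt:            # simulated message corruption
            cut = max(1, len(reply) // 2)
            return reply[:cut] + " [CORRUPTED]"
        return reply

    return faulty

# Usage: run the same benchmark with and without wrapped agents and compare
# task success and recovery behavior to locate brittle protocols.
```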
Security Threat Modeling and Defense
LLM-MAS are vulnerable to novel intention-hiding attacks, including suboptimal fixation, reframing misalignment, fake information injection, and execution delays (Xie et al., 7 Jul 2025). These attacks can degrade collaborative outcomes without obvious misbehavior. Psychological profiling (e.g., HEXACO-based AgentXposed) and graph-topological anomaly detection (G-Safeguard via GNNs) provide robust defense by monitoring agent behavioral deviations and interrupting the spread of malicious content (Xie et al., 7 Jul 2025, Wang et al., 16 Feb 2025).
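The graph-topological defense can be sketched as follows; the `anomaly_score` function stands in for a learned detector (e.g., a GNN as in G-Safeguard), and the pruning rule is an illustrative simplification:

```python
from typing import Callable, Dict, List, Set, Tuple

def quarantine_suspects(edges: List[Tuple[str, str]],
                        messages: Dict[str, str],
                        anomaly_score: Callable[[str], float],
                        threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Score each agent's outgoing messages and prune edges leaving agents
    whose behavior looks anomalous, limiting the spread of malicious or
    misleading content through the MAS communication graph."""
    suspects: Set[str] = {agent for agent, msg in messages.items()
                          if anomaly_score(msg) > threshold}
    return [(src, dst) for src, dst in edges if src not in suspects]
```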
Evaluation Platforms and Behavioral Analysis
To benchmark LLM-MAS beyond synthetic tasks, open platforms such as WiS use structured games ("Who is Spy?") to evaluate reasoning, deception, defense, and collaboration, with dynamic leaderboards and granular, behavior-tracking analytics (Hu et al., 4 Dec 2024).
4. Learning, Adaptation, and Scalability
Reinforcement and Preference Optimization
Communication efficiency and task effectiveness are jointly optimized in frameworks such as Optima, which uses an iterative generate-rank-select-train cycle with multi-objective reward functions that penalize verbosity and encourage performance, interpretability, and token efficiency (Chen et al., 10 Oct 2024). Techniques include supervised fine-tuning, direct preference optimization, and MCTS-inspired sampling for mining paired preferences in tree-structured MAS dialogs.
Reward function:
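A representative sketch of such a multi-objective reward; the λ weights, token normalization, and readability term are assumed notation rather than the paper's exact formulation:

$$
R \;=\; R_{\text{task}} \;-\; \lambda_{\text{token}}\,\frac{N_{\text{tokens}}}{N_{\max}} \;+\; \lambda_{\text{read}}\, R_{\text{read}},
$$

where $R_{\text{task}}$ scores task success, the middle term penalizes verbose exchanges, and $R_{\text{read}}$ rewards fluent, interpretable messages (e.g., via language-model likelihood).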
Group-Based RL for MAS Optimization
MHGPO introduces critic-free, group-based MARL for LLM-MAS, using relative rewards within agent rollout groups to robustly and scalably estimate policy gradients. Sampling strategies (independent, fork-on-first, round-robin) trade off sample diversity and efficiency, consistently outperforming standard MAPPO in task performance and resource use across multi-hop QA and search tasks (Chen et al., 3 Jun 2025).
Group advantage:
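MHGPO's exact estimator is defined in (Chen et al., 3 Jun 2025); the description above parallels a GRPO-style group-relative advantage, sketched here in assumed notation:

$$
\hat{A}_i \;=\; \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G) + \epsilon},
$$

where $r_1,\dots,r_G$ are the rewards of the $G$ rollouts in one group; normalizing within the group removes the need for a learned critic.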
Hybrid MARL-alignment methods (MAGRPO) and Dec-POMDP formalizations further address the challenge of learning cooperative policies in decentralized, partially observable environments with language-based action spaces (Liu et al., 6 Aug 2025).
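For reference, such formalizations build on the standard Dec-POMDP tuple, with natural-language messages playing the role of actions and observations:

$$
\mathcal{M} \;=\; \langle \mathcal{I},\, \mathcal{S},\, \{\mathcal{A}_i\}_{i \in \mathcal{I}},\, P,\, R,\, \{\Omega_i\}_{i \in \mathcal{I}},\, O,\, \gamma \rangle,
$$

with agent set $\mathcal{I}$, states $\mathcal{S}$, per-agent actions $\mathcal{A}_i$ and observations $\Omega_i$, transition kernel $P$, shared reward $R$, observation function $O$, and discount $\gamma$.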
5. Evaluation, Benchmarking, and Generalization
Theoretical Task Complexity and MAS Gains
A principled framework decomposes task complexity into depth (sequential reasoning steps) and width (required parallel capabilities), showing that LLM-MAS gain the most over single-agent LLMs (LLM-SAS) on tasks of high depth. The benefit from collaborating agents increases with both depth and width, but the effect is more pronounced with depth, owing to error correction and diversification across agents (Tang et al., 5 Oct 2025).
Success Rate (MAS vs. SAS):
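The paper's exact comparison appears in (Tang et al., 5 Oct 2025); the depth effect can be illustrated with a simplified independence model (assumed notation, not the paper's result):

$$
\frac{\Pr[\text{success}_{\text{MAS}}]}{\Pr[\text{success}_{\text{SAS}}]} \;\approx\; \left(\frac{p'}{p}\right)^{d},
$$

where a task of depth $d$ is treated as $d$ independent steps, $p$ is a single agent's per-step success probability, and $p' > p$ is the per-step probability after multi-agent error correction; the ratio grows geometrically with $d$, consistent with the depth-dominant gains described above.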
Unified Codebases and Standardized Evaluation
MASLab provides a rigorously validated, open-source codebase with standardized implementations, benchmarks, and evaluation protocols for 20+ MAS methods—removing confounds from data wrangling, parameterization, or brittle rule-based grading. This enables meaningful, transparent comparison and advances reproducibility (Ye et al., 22 May 2025).
Generative MAS Construction
Rather than manual MAS design or prompt engineering, generative paradigms (MAS-GPT) cast MAS construction as a language modeling task—producing executable code for query-adaptive MAS generation in one shot, reducing both development effort and inference cost, with consistent, robust out-of-domain performance (Ye et al., 5 Mar 2025).
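The generative paradigm can be pictured as follows; the prompt, `llm_generate` callable, and direct `exec` call are illustrative assumptions rather than MAS-GPT's actual interface:

```python
from typing import Callable

def build_and_run_mas(query: str, llm_generate: Callable[[str], str]) -> str:
    """One-shot generative MAS construction: ask a generator model to emit
    executable Python defining a query-specific multi-agent workflow, then
    execute it. Sandboxing of the generated code is omitted for brevity."""
    prompt = (
        "Write a Python function `solve(query: str) -> str` that defines a "
        "small multi-agent workflow (roles, message passing, aggregation) "
        f"tailored to answering: {query!r}. Return only code."
    )
    code = llm_generate(prompt)
    namespace: dict = {}
    exec(code, namespace)          # in practice, run inside a sandbox
    return namespace["solve"](query)
```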
6. Creative and Specialized Applications
Creative and Open-ended Generation
LLM-MAS are extensively used for creative tasks (writing, art, ideation), structured around explicit agent personas, divergent exploration, iterative refinement, and collaborative synthesis to maximize output novelty, diversity, and coherence. These systems require dedicated coordination and persona modeling along with creativity-oriented evaluation metrics; open challenges include standardization, conflict resolution, and bias mitigation (Lin et al., 27 May 2025).
Specialized Domains: Software Engineering and Mathematics
Frameworks such as MASA (for mathematical autoformalization) orchestrate LLM agents for generation, hard/soft critique, refinement, and tool integration (theorem provers, KBs), demonstrating robust modularity, interpretable workflows, and significant gains in syntactic/semantic correctness on formal mathematics tasks (Zhang et al., 10 Oct 2025). Process model alignment in code generation and complex software projects (MetaGPT with Waterfall, V-Model, Agile) exposes trade-offs between artifacts, cost, and quality (Ha et al., 17 Sep 2025).
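A generate-critique-refine loop of this shape can be sketched as follows; the callables and stopping rule are assumptions, not MASA's actual orchestration code:

```python
from typing import Callable

def autoformalize(statement: str,
                  generate: Callable[[str], str],
                  critique: Callable[[str, str], str],
                  prover_check: Callable[[str], bool],
                  max_rounds: int = 3) -> str:
    """Generate-critique-refine loop: a generator drafts a formalization,
    a hard check (theorem prover / type checker) gates syntax, and a soft
    critic suggests semantic fixes that feed into the next round."""
    draft = generate(statement)
    for _ in range(max_rounds):
        if prover_check(draft):           # hard critique: does it type-check?
            feedback = critique(statement, draft)   # soft critique: semantics
            if feedback.strip().upper() == "OK":
                return draft
        else:
            feedback = "The formalization fails to type-check; fix the syntax."
        draft = generate(f"{statement}\n\nPrevious attempt:\n{draft}\n"
                         f"Reviewer feedback:\n{feedback}\nRevise accordingly.")
    return draft
```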
LLM-based Multi-Agent Systems have undergone rapid evolution, moving from static topologies and heuristic coordination to adaptive, optimized, and robust architectures that leverage both the diversity of LLM backbones and dynamic, context-sensitive collaboration. State-of-the-art research now focuses on scaling to diverse, complex tasks; enabling unified, extensible platforms; optimizing for robustness and security; and exploring principled frameworks for understanding when and how multi-agent configurations provide decisive performance gains over single-agent approaches.