LLM-Based Multi-Agent Systems

Updated 15 December 2025

LLM-Based Multi-Agent Systems are computational frameworks where autonomous agents powered by large language models coordinate and collaborate to tackle complex tasks.
They implement structured agent taxonomies, dynamic communication protocols, and generation techniques such as iterative refinement and divergent exploration.
These systems are applied across domains like cybersecurity, scientific research, and economic simulation while addressing challenges in scalability, bias, and security.

LLM-Based Multi-Agent Systems (MAS) are computational frameworks in which multiple autonomous agents, each powered by an LLM, collaborate to solve complex tasks, coordinate actions, or simulate sophisticated environments. These systems are characterized by dynamic inter-agent communication, emergent division of labor, and collective problem solving, and are deployed across domains ranging from creative generation and scientific research to cybersecurity, control engineering, and economic simulation. Recent research establishes taxonomies, formal models, and practical protocols that guide the architecture, coordination, evaluation, and safety of LLM-based MAS (Lin et al., 27 May 2025, Guo et al., 2024, Yang et al., 2024).

1. Taxonomies of Agent Proactivity, Persona, and Specialization

LLM-based MAS incorporate a rich taxonomy of agent behaviors and role archetypes to support division of labor and creative workflows (Lin et al., 27 May 2025).

Proactivity Classes:
- Reactive agents remain in a wait-for-instructions mode, solely responding to external stimuli without initiative.
- Proactive agents continuously monitor contextual signals, taking initiative by proposing new tasks, suggesting changes, or alerting peers of opportunities.
- Mixed agents dynamically switch between reactive and proactive modes through policies or entropy-based thresholds.
Persona Archetypes:
- Expert (high-precision domain specialist), Creative Thinker (divergent idea generator), Critic (evaluates feasibility or logic), Coordinator (mediates allocation and conflict).
Task Allocation & Creativity:
- Coordinators allocate brainstorming to Creative Thinkers; Experts filter for correctness; Critics perform ranking or down-selection.
- Persona-driven pipelines enhance balance between novelty and feasibility in generative tasks.
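
The persona-driven allocation described above can be illustrated with a minimal sketch. The function names, the `Idea` fields, and the hard-coded candidates are all hypothetical stand-ins for LLM calls; the cited survey does not prescribe this implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Idea:
    text: str
    novelty: float   # hypothetical novelty score in [0, 1]
    feasible: bool   # stand-in for the Expert's correctness judgment


def creative_thinker(task: str) -> List[Idea]:
    # Stand-in for an LLM call that brainstorms divergent candidates.
    return [
        Idea(f"{task}: incremental tweak", novelty=0.2, feasible=True),
        Idea(f"{task}: cross-domain analogy", novelty=0.7, feasible=True),
        Idea(f"{task}: perpetual-motion approach", novelty=0.9, feasible=False),
    ]


def expert_filter(ideas: List[Idea]) -> List[Idea]:
    # The Expert keeps only candidates judged correct/feasible.
    return [idea for idea in ideas if idea.feasible]


def critic_rank(ideas: List[Idea], top_k: int = 1) -> List[Idea]:
    # The Critic ranks the survivors (here: by novelty) and down-selects.
    return sorted(ideas, key=lambda idea: idea.novelty, reverse=True)[:top_k]


def coordinator(task: str, top_k: int = 1) -> List[Idea]:
    # The Coordinator routes the task through the three personas in order.
    return critic_rank(expert_filter(creative_thinker(task)), top_k)


best = coordinator("design a cooling system")
```

In this toy run the most novel idea is discarded by the Expert as infeasible, so the Critic selects the next most novel candidate, which is the novelty/feasibility balance the pipeline is meant to provide.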

Specialization and organic adaptation are observed through experience management: agents that excel at particular sub-processes see their role probability distributions updated, leading to emergent specialization and improved collective efficiency (Yang et al., 2024).
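
One way such a role-distribution update could work is sketched below. The multiplicative-weights rule, the learning rate `lr`, and the role names are illustrative assumptions, not the update used by Yang et al.

```python
import math
from typing import Dict


def update_role_probs(probs: Dict[str, float], role: str,
                      reward: float, lr: float = 0.5) -> Dict[str, float]:
    """Boost the weight of `role` by exp(lr * reward), then renormalize."""
    weights = {
        r: p * (math.exp(lr * reward) if r == role else 1.0)
        for r, p in probs.items()
    }
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


# Hypothetical agent that starts with no preference among three roles.
probs = {"coder": 1 / 3, "reviewer": 1 / 3, "planner": 1 / 3}
for _ in range(10):
    # The agent repeatedly succeeds (reward 1.0) when acting as "coder".
    probs = update_role_probs(probs, "coder", reward=1.0)
```

After repeated successes in one role, the distribution concentrates on that role, which is the kind of emergent specialization the paragraph describes.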

2. System Architectures and Communication Protocols

Architectural paradigms in LLM MAS include centralized, decentralized, layered, and hybrid designs, each with distinct trade-offs in scalability, robustness, and coordination overhead (Guo et al., 2024, Yang et al., 2024, Lin et al., 27 May 2025).

Architecture	Description	Strength	Limitation
Centralized	Orchestrator-agent manages state	Simplicity	Single point of failure
Decentralized/Peer-to-Peer	Agents communicate directly	Robustness	Consensus complexity
Layered/Hierarchical	Multi-level routing/validation	Validation	Latency, complexity
Blackboard	Public message board, dynamic selection	Context sharing	Scaling with roles

Message Passing: Synchronous (RPC) and asynchronous (queue-based) methods support flexible coordination. Common message schemas include sender_id, receiver_id, type, payload, and timestamp (Lin et al., 27 May 2025).
Protocols: Dynamic agent selection, majority voting, utility-based convergence, and auctioning are used for task allocation and consensus (Lin et al., 27 May 2025, Han et al., 2 Jul 2025).
Examples: Task auctioning in decentralized settings; blackboard-based architectures, which are reported to achieve state-of-the-art reasoning performance at lower token cost than static pipelines (Han et al., 2 Jul 2025).
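
A minimal sketch of the message schema and a majority-voting step follows. Only the five field names come from the text above; the `"vote"` message type, the agent identifiers, and the tie-breaking behavior are assumptions made for illustration.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Message:
    sender_id: str
    receiver_id: str
    type: str        # e.g. "vote" (hypothetical type label)
    payload: Any     # must be hashable to be counted as a vote
    timestamp: float = field(default_factory=time.time)


def majority_vote(messages: List[Message]) -> Any:
    """Return the most common payload among messages of type 'vote'.

    Ties go to the payload that was seen first.
    """
    votes = Counter(m.payload for m in messages if m.type == "vote")
    if not votes:
        raise ValueError("no votes received")
    return votes.most_common(1)[0][0]


inbox = [
    Message("agent_1", "coordinator", "vote", "plan_A"),
    Message("agent_2", "coordinator", "vote", "plan_B"),
    Message("agent_3", "coordinator", "vote", "plan_A"),
    Message("agent_2", "coordinator", "status", "idle"),  # ignored: not a vote
]
decision = majority_vote(inbox)
```

The same `Message` objects could be passed through a synchronous call or placed on a queue; the schema is independent of the transport.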

3. Generation Techniques, Memory, and Planning Mechanisms

Creativity and collaborative effectiveness hinge on rigorous workflow design and sophisticated memory and planning modules (Lin et al., 27 May 2025, Aratchige et al., 13 Mar 2025).

Generation Techniques:
- Divergent Exploration: Agents maximize output novelty by broad sampling or prompt perturbation, for example by raising the temperature $T$ in $p(w_t \mid w_{<t}) \propto \exp\left(\frac{\log P_{\text{LLM}}(w_t \mid w_{<t})}{T}\right)$.
- Iterative Refinement: Multiple rounds of candidate output, scoring, and gradient-based updating of internal representations, e.g., $z^{(k+1)} = z^{(k)} + \eta \sum_j \nabla_z f_j(\text{decode}(z^{(k)}))$.
- Collaborative Synthesis: Agents merge partial outputs, optimizing a global utility $U = \alpha N + \beta C + \gamma R$ (novelty, coherence, relevance).
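
The effect of the temperature $T$ in the divergent-exploration formula can be checked numerically. The sketch below uses a hypothetical three-token distribution; it is not tied to any particular LLM API.

```python
import math
from typing import List


def temperature_softmax(logprobs: List[float], T: float) -> List[float]:
    """Compute p_i proportional to exp(logprobs[i] / T)."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [lp / T for lp in logprobs]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(p: List[float]) -> float:
    # Shannon entropy in nats; higher means a flatter, more diverse distribution.
    return -sum(x * math.log(x) for x in p if x > 0)


# Hypothetical next-token log-probabilities from a model.
base = [math.log(0.7), math.log(0.2), math.log(0.1)]
cold = temperature_softmax(base, T=0.5)
unit = temperature_softmax(base, T=1.0)
hot = temperature_softmax(base, T=2.0)
```

At $T = 1$ the original distribution is recovered; raising $T$ flattens it and increases entropy, so low-probability tokens are sampled more often.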
Memory Systems:
- Short-term: Context windows for immediate history.
- Long-term: Vector databases, retrieval-augmented logs, or symbolic stores.
- Shared Memory: Centralized stores or blackboards for global context (Han et al., 2 Jul 2025).
- Self-Controlled Memory: LLM-based controllers decide what and when to store.
Planning Mechanisms:
- Tree-of-Thought, ReAct frameworks: Interleaved reasoning/action or explicit search trees (Aratchige et al., 13 Mar 2025).
- Parallelized Planning-Acting: Dual-thread architectures with interruptible execution for real-time responsiveness (Li et al., 5 Mar 2025).
- Graph-based and meta-learning policies: Dependency graphs and meta-learned agent coordination (Jia et al., 13 Mar 2025).
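
The short-term and shared memory tiers listed above can be sketched with standard-library containers. The class names, the window size, and the keyword lookup are hypothetical simplifications; a real long-term store would typically use embedding-based retrieval rather than substring matching.

```python
from collections import deque
from typing import List, Optional, Tuple


class AgentMemory:
    """Short-term memory: a bounded window over the most recent observations."""

    def __init__(self, window: int = 3) -> None:
        self.short_term: deque = deque(maxlen=window)

    def observe(self, text: str) -> None:
        # Once the window is full, the oldest entry is dropped automatically.
        self.short_term.append(text)


class Blackboard:
    """Shared memory: a public, append-only board that every agent can read."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, str]] = []

    def post(self, agent_id: str, content: str) -> None:
        self.entries.append((agent_id, content))

    def read(self, keyword: Optional[str] = None) -> List[Tuple[str, str]]:
        # Naive keyword filter standing in for semantic retrieval.
        if keyword is None:
            return list(self.entries)
        return [e for e in self.entries if keyword.lower() in e[1].lower()]


memory = AgentMemory(window=2)
for note in ["step 1", "step 2", "step 3"]:
    memory.observe(note)

board = Blackboard()
board.post("planner", "Subgoal: parse the input")
board.post("solver", "Result: input parsed")
hits = board.read("subgoal")
```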

4. Application Domains and Empirical Performance

LLM-based MAS are applied in problem solving, world simulation, data markets, control engineering, software engineering, and large-scale optimization (Guo et al., 2024, Nagaitsev et al., 21 Nov 2025, Sashihara et al., 17 Nov 2025, Zahedifar et al., 26 May 2025, Sun et al., 7 Apr 2025).

Examples by Domain:
- Software: Multi-role development teams (e.g., MetaGPT), code review, and debugging pipelines.
- Science: Teams of researcher-assistants and critics for literature search or hypothesis generation.
- Economics: LLM agents as buyers/sellers in data marketplaces, accurately reproducing real-world trading distributions (Sashihara et al., 17 Nov 2025).
- Engineering: Supervisory MAS for controller design, with agents specializing in retrieval, reasoning, simulation, critique, and communication (Zahedifar et al., 26 May 2025).
- Optimization: Multi-agent systems achieve a 2.88× speedup in PyTorch inference tasks by pairing exploit-heavy search with error-fixing agents (Nagaitsev et al., 21 Nov 2025).
- Creative Generation: LLM MAS surpass static and single-agent methods in knowledge, reasoning, and math, with reduced token usage (Han et al., 2 Jul 2025, Lin et al., 27 May 2025).

Domain	MAS Architecture	Metric/Task	SOTA Example	Performance
Reasoning	Blackboard	MATH, GSM8K	LbMAS	81.7% avg (Han et al., 2 Jul 2025)
Software	Layered/static	HumanEval	MetaGPT	Competitive
Data Markets	Decentralized	Trading metrics	LLM-MAS	Realistic distributions (Sashihara et al., 17 Nov 2025)
Optimization	Specialized agents	PyTorch speedup	PIKE-B+EFA	2.88× (Nagaitsev et al., 21 Nov 2025)
Engineering	Hierarchical	Control Design	LLM-Agent-Controller	83% completed (Zahedifar et al., 26 May 2025)

5. RL Optimization, Scalability, and Business Aspects

Reinforcement learning and graph-based policies support scalable, adaptive MAS; business incentives and privacy constraints shape deployment (Chen et al., 3 Jun 2025, Jia et al., 13 Mar 2025, Yang et al., 2024).

Multi-Agent RL (MARL):
- Critic-free algorithms such as MHGPO achieve higher stability and scalability than critic-based baselines (e.g., MAPPO), with group-based advantage estimation and flexible sampling (IS, FoF, RR) (Chen et al., 3 Jun 2025).
- Joint policy optimization using reward propagation, agent-specific penalties, and meta-learning enhances coordination in both search and embodied domains (Jia et al., 13 Mar 2025).
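
To make "group-based advantage estimation" concrete, the sketch below shows a generic group-relative normalization of the kind used by critic-free methods: each rollout's reward is compared with the mean of its sampling group, so no learned value network is needed. This is an assumed, simplified form and not the exact MHGPO estimator.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for three rollouts sampled for the same query.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
flat = group_relative_advantages([5.0, 5.0, 5.0])
```

Rollouts above the group mean receive positive advantages and those below receive negative ones; when all rewards are equal, every advantage is zero and the group contributes no gradient signal.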
Business/Privacy:
- The MLAS protocol supports agent monetization via Shapley-value credit allocation and an incentive formula $I_e = \alpha\,\text{DataVal}_e + \beta\,\text{TrafficRev}_e + \gamma\,\text{IntelRev}_e$, and maintains privacy through local data stores, differential privacy, and secure multi-party computation (SMPC) (Yang et al., 2024).
- Decentralized systems (e.g., AgentNet) combine RAG and evolutionary specialization in a DAG to enable privacy-preserving collaboration (Yang et al., 1 Apr 2025).
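
Shapley-value credit allocation and the incentive formula can be illustrated on a toy coalition. The two agents, the coalition values, and the weights passed to `incentive` are hypothetical; only the form of the formula comes from the text above. The exact computation enumerates all orderings, so it is practical only for small numbers of agents.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(players: List[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def incentive(data_val: float, traffic_rev: float, intel_rev: float,
              alpha: float, beta: float, gamma: float) -> float:
    # I_e = alpha * DataVal_e + beta * TrafficRev_e + gamma * IntelRev_e
    return alpha * data_val + beta * traffic_rev + gamma * intel_rev


# Hypothetical revenue generated by each possible coalition of two agents.
revenue = {
    frozenset(): 0.0,
    frozenset({"retriever"}): 2.0,
    frozenset({"solver"}): 4.0,
    frozenset({"retriever", "solver"}): 10.0,
}
credit = shapley_values(["retriever", "solver"], lambda s: revenue[s])
payout = incentive(data_val=3.0, traffic_rev=2.0, intel_rev=1.0,
                   alpha=0.5, beta=0.25, gamma=1.0)
```

Here the two agents split the joint revenue of 10 as 4 and 6: each receives its standalone value plus half of the surplus created by collaborating.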

6. Evaluation Metrics, Security, and Risk Management

Robust evaluation frameworks and emerging threat models inform the reliability and safety of LLM-based MAS (Lin et al., 27 May 2025, Zhang et al., 2024, Xie et al., 7 Jul 2025, Miao et al., 11 Aug 2025, Reid et al., 6 Aug 2025).

Creativity Metrics:
- Novelty: $1 - \max_{r\in\text{ref}} \frac{|x \cap r|}{|x \cup r|}$.
- Coherence: Mean cosine similarity of adjacent sentence embeddings.
- Human-likeness: Discriminator-predicted score $\mathrm{HL}(x)$.
- Distinct-n: Fraction of unique $n$-grams.
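
The novelty and Distinct-n metrics above can be computed directly from token lists. The sketch assumes that $x$ and each reference $r$ are treated as sets of whitespace-separated tokens, which the source does not specify; other choices (n-gram sets, for instance) would change the numbers.

```python
from typing import List, Sequence, Tuple


def novelty(x_tokens: Sequence[str], references: Sequence[Sequence[str]]) -> float:
    """1 minus the maximum Jaccard overlap between x and any reference."""
    x = set(x_tokens)
    best = 0.0
    for ref in references:
        r = set(ref)
        union = x | r
        if union:
            best = max(best, len(x & r) / len(union))
    return 1.0 - best


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(tokens: Sequence[str], n: int) -> float:
    """Fraction of n-grams in the output that are unique."""
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0


candidate = "the cat sat".split()
refs = ["the cat ran".split(), "a dog barked".split()]
nov = novelty(candidate, refs)            # closest reference shares 2 of 4 tokens
d1 = distinct_n("a b a b".split(), 1)     # 2 unique unigrams out of 4
d2 = distinct_n("a b a b".split(), 2)     # 2 unique bigrams out of 3
```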
General MAS Metrics:
- Success Rate, Task Completion Time, Collaboration Efficiency, Resource Usage (Guo et al., 2024).
- Accuracy, F1, Precision/Recall in domain-specific tasks (Härer, 12 Jun 2025).
Security:
- AgentPrune: Spatial-temporal graph pruning reduces redundant/malicious communication, cuts costs by up to 87%, and increases robustness under adversarial attack by 3.5–10.8% (Zhang et al., 2024).
- BlindGuard: Hierarchical, unsupervised anomaly detector for malicious agent propagation, effective without attack-specific labels (Miao et al., 11 Aug 2025).
- AgentXposed: HEXACO trait modeling and behavioral interrogation for intention-hiding attack detection, outperforming MBTI/Big Five baselines by 8–15 pp in F1 (Xie et al., 7 Jul 2025).
Risk Analysis:
- Failure modes: cascading reliability, inter-agent communication failures, monoculture collapse, conformity bias, deficient theory of mind, and mixed-motive dynamics are formally defined and quantified (Reid et al., 6 Aug 2025).
- Staged testing (simulation, sandbox, pilot, deployment) with convergent evidence is advocated for robust governance.
- Key metrics: reliability cascade, communication fidelity, monoculture correlation, conformity error rate, and multi-agent risk scores.

7. Open Challenges and Future Directions

Outstanding research challenges include evaluation standardization, coordination protocol adaptivity, scaling communication, bias mitigation, theoretical guarantees, and collective intelligence synthesis (Lin et al., 27 May 2025, Guo et al., 2024, Yang et al., 2024, Miao et al., 11 Aug 2025, Reid et al., 6 Aug 2025).

Benchmark Unification: Current lack of integrated benchmarks spanning text, image, and multi-modal creativity.
Bias and Ethics: Insufficient mitigation of stereotype amplification; recommendation of embedded meta-personas for bias detection/mitigation.
Scalability: $O(N^2)$ communication complexity requires topology adaptation and role assignment meta-learning.
Security: Adaptive, topology-agnostic defenses needed for evolving adversarial threats; unsupervised and contrastive anomaly detection are promising directions.
Theory: Formal limits of MAS synergy, triggers for emergent agency, and certification of “true” creativity remain open.
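
The $O(N^2)$ term noted under Scalability can be made concrete with a simple count. Assuming, for illustration, that each agent sends one message to every other agent per round in a fully connected topology, and that a star topology routes everything through a single coordinator (one message in and one out per non-coordinator agent), the per-round message counts are

$$M_{\text{full}}(N) = N(N-1), \qquad M_{\text{star}}(N) = 2(N-1).$$

For $N = 20$ agents this gives $M_{\text{full}} = 380$ versus $M_{\text{star}} = 38$, which is why topology adaptation is a natural lever for reducing communication cost, at the price of reintroducing a central point of failure.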

References

(Lin et al., 27 May 2025) Creativity in LLM-based Multi-Agent Systems: A Survey
(Guo et al., 2024) LLM based Multi-Agents: A Survey of Progress and Challenges
(Han et al., 2 Jul 2025) Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture
(Härer, 12 Jun 2025) Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications
(Nagaitsev et al., 21 Nov 2025) Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems
(Yang et al., 2024) LLM-based Multi-Agent Systems: Techniques and Business Perspectives
(Zahedifar et al., 26 May 2025) LLM-Agent-Controller: A Universal Multi-Agent LLM System as a Control Engineer
(Li et al., 5 Mar 2025) Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems
(Aratchige et al., 13 Mar 2025) LLMs Working in Harmony: A Survey on the Technological Aspects of Building Effective LLM-Based Multi Agent Systems
(Jia et al., 13 Mar 2025) Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy
(Sashihara et al., 17 Nov 2025) LLM-based Multi-Agent System for Simulating Strategic and Goal-Oriented Data Marketplaces
(Liu et al., 6 Aug 2025) LLM Collaboration With Multi-Agent Reinforcement Learning
(Chen et al., 3 Jun 2025) Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems
(Xie et al., 7 Jul 2025) Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems
(Miao et al., 11 Aug 2025) BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks
(Zhang et al., 2024) Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
(Reid et al., 6 Aug 2025) Risk Analysis Techniques for Governed LLM-based Multi-Agent Systems
(He et al., 2024) LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead