LLM-Based Multi-Agent Systems

Updated 15 December 2025
  • LLM-Based Multi-Agent Systems are computational frameworks where autonomous agents powered by large language models coordinate and collaborate to tackle complex tasks.
  • They implement structured agent taxonomies, dynamic communication protocols, and generation techniques such as iterative refinement and divergent exploration.
  • These systems are applied across domains like cybersecurity, scientific research, and economic simulation while addressing challenges in scalability, bias, and security.

LLM-Based Multi-Agent Systems (MAS) are computational frameworks in which multiple autonomous agents, each powered by an LLM, collaborate to solve complex tasks, coordinate actions, or simulate sophisticated environments. These systems are characterized by dynamic inter-agent communication, emergent division of labor, and collective problem solving, and are deployed across domains ranging from creative generation and scientific research to cybersecurity, control engineering, and economic simulation. Recent research establishes taxonomies, formal models, and practical protocols that guide the architecture, coordination, evaluation, and safety of LLM-based MAS (Lin et al., 27 May 2025, Guo et al., 2024, Yang et al., 2024).

1. Taxonomies of Agent Proactivity, Persona, and Specialization

LLM-based MAS incorporate a rich taxonomy of agent behaviors and role archetypes to support division of labor and creative workflows (Lin et al., 27 May 2025).

  • Proactivity Classes:
    • Reactive agents remain in a wait-for-instructions mode, solely responding to external stimuli without initiative.
    • Proactive agents continuously monitor contextual signals and take initiative by proposing new tasks, suggesting changes, or alerting peers to opportunities.
    • Mixed agents dynamically switch between reactive and proactive modes through policies or entropy-based thresholds.
  • Persona Archetypes:
    • Expert (high-precision domain specialist), Creative Thinker (divergent idea generator), Critic (evaluates feasibility or logic), Coordinator (mediates allocation and conflict).
  • Task Allocation & Creativity:
    • Coordinators allocate brainstorming to Creative Thinkers; Experts filter for correctness; Critics perform ranking or down-selection.
    • Persona-driven pipelines improve the balance between novelty and feasibility in generative tasks.
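
The source says mixed agents switch modes through policies or entropy-based thresholds but gives no concrete rule. The sketch below assumes one possible rule: the agent computes the Shannon entropy of its own probability distribution over candidate next actions and takes initiative only when that entropy falls below a threshold. The function names and the 1-bit default threshold are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in bits of a discrete distribution; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def choose_mode(action_probs: Sequence[float], threshold: float = 1.0) -> str:
    """Hypothetical switching rule: confident (low-entropy) beliefs -> proactive,
    uncertain (high-entropy) beliefs -> wait for instructions (reactive)."""
    if shannon_entropy(action_probs) < threshold:
        return "proactive"
    return "reactive"


if __name__ == "__main__":
    print(choose_mode([0.9, 0.05, 0.05]))         # about 0.57 bits -> "proactive"
    print(choose_mode([0.25, 0.25, 0.25, 0.25]))  # exactly 2 bits -> "reactive"
```

The direction of the rule is a design choice. A system could instead escalate to proactive behavior when uncertainty is high, for example to ask peers for help.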

Specialization and organic adaptation are observed through experience management: agents that excel at particular sub-processes see their role probability distributions updated, leading to emergent specialization and improved collective efficiency (Yang et al., 2024).
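
One way to picture the role-distribution update is a multiplicative-weights rule. The sketch assumes each agent keeps a probability per role and a recent performance score per role in [0, 1], taken for example from its experience log. It is an illustrative stand-in, not the update rule of the cited work, and `update_role_distribution` and `learning_rate` are hypothetical names.

```python
import math
from typing import Dict


def update_role_distribution(
    role_probs: Dict[str, float],
    role_scores: Dict[str, float],
    learning_rate: float = 0.5,
) -> Dict[str, float]:
    """Multiplicative-weights update over roles.

    role_scores holds recent performance per role in [0, 1]; roles with higher
    scores gain probability mass, and the result is renormalized to sum to 1.
    """
    weighted = {
        role: p * math.exp(learning_rate * role_scores.get(role, 0.0))
        for role, p in role_probs.items()
    }
    total = sum(weighted.values())
    return {role: w / total for role, w in weighted.items()}


if __name__ == "__main__":
    probs = {"coder": 1 / 3, "critic": 1 / 3, "planner": 1 / 3}
    scores = {"coder": 1.0, "critic": 0.2, "planner": 0.2}
    for _ in range(10):
        probs = update_role_distribution(probs, scores)
    print({role: round(p, 3) for role, p in probs.items()})
```

After a few rounds, most of the probability mass concentrates on the role where the agent performs best. This is the emergent specialization the paragraph describes.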

2. System Architectures and Communication Protocols

Architectural paradigms in LLM MAS include centralized, decentralized, layered, and hybrid designs, each with distinct trade-offs in scalability, robustness, and coordination overhead (Guo et al., 2024, Yang et al., 2024, Lin et al., 27 May 2025).

| Architecture | Description | Strength | Limitation |
| --- | --- | --- | --- |
| Centralized | Orchestrator agent manages state | Simplicity | Single point of failure |
| Decentralized/peer-to-peer | Direct agent-to-agent communication | Robustness | Consensus complexity |
| Layered/hierarchical | Multi-level routing and validation | Validation | Latency, complexity |
| Blackboard | Public message board with dynamic agent selection | Context sharing | Scaling with roles |

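
A blackboard design can be sketched as a shared board plus a selector that chooses the next agent from the board's current contents. The keyword-trigger selector and the stub agents below are hypothetical simplifications. In a real system each agent would be an LLM call, and selection might itself be made by an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Blackboard:
    """Public message board that every agent can read and write."""
    entries: List[Tuple[str, str]] = field(default_factory=list)  # (author, content)

    def post(self, author: str, content: str) -> None:
        self.entries.append((author, content))

    def latest(self) -> str:
        return self.entries[-1][1] if self.entries else ""


Agent = Callable[[Blackboard], str]


def run_blackboard(agents: Dict[str, Agent], triggers: Dict[str, str],
                   task: str, max_rounds: int = 5) -> Blackboard:
    board = Blackboard()
    board.post("user", task)
    for _ in range(max_rounds):
        # Dynamic selection: the first agent whose trigger keyword appears in the latest entry.
        latest = board.latest()
        chosen = next((name for name, kw in triggers.items() if kw in latest), None)
        if chosen is None:
            break  # no agent wants to act, so the episode ends
        board.post(chosen, agents[chosen](board))
    return board


if __name__ == "__main__":
    agents = {
        "planner": lambda b: "plan: split into steps",
        "solver": lambda b: "answer: 42",
        "critic": lambda b: "verdict: approved",
    }
    triggers = {"planner": "task:", "solver": "plan:", "critic": "answer:"}
    board = run_blackboard(agents, triggers, "task: compute 6*7")
    print([author for author, _ in board.entries])  # ['user', 'planner', 'solver', 'critic']
```

Every agent sees the full board, which gives the context sharing listed as a strength. It also means the board grows with each contribution, so the table's scaling caveat applies.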
  • Message Passing: Synchronous (RPC) and asynchronous (queue-based) methods support flexible coordination. Common message schemas include sender_id, receiver_id, type, payload, and timestamp (Lin et al., 27 May 2025).
  • Protocols: Dynamic agent selection, majority voting, utility-based convergence, and auctioning are used for task allocation and consensus (Lin et al., 27 May 2025, Han et al., 2 Jul 2025).
  • Examples: task auctioning in decentralized settings, and blackboard-based architectures that achieve state-of-the-art reasoning performance at lower token cost than static pipelines (Han et al., 2 Jul 2025).
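
The message schema above can be written as a small record type. The sketch below also implements two of the named protocols: majority voting, and a single-round sealed-bid task auction. The `type` values, the `answer`/`cost` payload keys, and the lowest-cost-wins rule are assumptions made for illustration.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Message:
    sender_id: str
    receiver_id: str
    type: str  # illustrative values: "vote", "bid", "result"
    payload: Dict[str, Any]
    timestamp: float = field(default_factory=time.time)


def majority_vote(votes: List[Message]) -> str:
    """Most common payload["answer"]; ties resolve to the answer seen first."""
    counts = Counter(m.payload["answer"] for m in votes)
    return counts.most_common(1)[0][0]


def run_auction(bids: List[Message]) -> str:
    """Sealed-bid auction: the agent reporting the lowest estimated cost wins the task."""
    return min(bids, key=lambda m: m.payload["cost"]).sender_id


if __name__ == "__main__":
    votes = [
        Message("a1", "coordinator", "vote", {"answer": "B"}),
        Message("a2", "coordinator", "vote", {"answer": "A"}),
        Message("a3", "coordinator", "vote", {"answer": "B"}),
    ]
    bids = [
        Message("a1", "coordinator", "bid", {"cost": 3.0}),
        Message("a2", "coordinator", "bid", {"cost": 1.5}),
    ]
    print(majority_vote(votes), run_auction(bids))  # B a2
```

The same `Message` record works over both transports. It can be returned directly from a synchronous call or placed on a queue for asynchronous delivery.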

3. Generation Techniques, Memory, and Planning Mechanisms

Creativity and collaborative effectiveness hinge on rigorous workflow design and sophisticated memory and planning modules (Lin et al., 27 May 2025, Aratchige et al., 13 Mar 2025).

  • Generation Techniques:
    • Divergent Exploration: Agents maximize output novelty through broad sampling or prompt perturbation, e.g., by raising the temperature $T$ in $p(w_t \mid w_{<t}) \propto \exp\left(\frac{\log P_{\text{LLM}}(w_t \mid w_{<t})}{T}\right)$.
    • Iterative Refinement: Multiple rounds of candidate output, scoring, and gradient-based updating of internal representations, e.g., $z^{(k+1)} = z^{(k)} + \eta \sum_j \nabla_z f_j(\text{decode}(z^{(k)}))$.
    • Collaborative Synthesis: Agents merge partial outputs, optimizing a global utility $U = \alpha N + \beta C + \gamma R$ (novelty, coherence, relevance).
  • Memory Systems:
    • Short-term: Context windows for immediate history.
    • Long-term: Vector databases, retrieval-augmented logs, or symbolic stores.
    • Shared Memory: Centralized stores or blackboards for global context (Han et al., 2 Jul 2025).
    • Self-Controlled Memory: LLM-based controllers decide what and when to store.
  • Planning Mechanisms:
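
Two of the generation techniques can be shown numerically. The first function applies the temperature rescaling from the divergent-exploration formula to a given next-token distribution. The second scores merged candidates with the collaborative-synthesis utility $U$. The weights $\alpha = 0.4$, $\beta = 0.3$, $\gamma = 0.3$ and the candidate scores are illustrative assumptions, not values from the source.

```python
import math
from typing import Dict, List, Sequence


def temper(probs: Sequence[float], temperature: float) -> List[float]:
    """Rescale p(w) to p_T(w) proportional to exp(log p(w) / T); needs all probs > 0, T > 0.

    T > 1 flattens the distribution (more divergent sampling); T < 1 sharpens it.
    """
    logits = [math.log(p) / temperature for p in probs]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def synthesis_utility(s: Dict[str, float], alpha: float = 0.4,
                      beta: float = 0.3, gamma: float = 0.3) -> float:
    """U = alpha*N + beta*C + gamma*R over novelty, coherence and relevance scores."""
    return alpha * s["novelty"] + beta * s["coherence"] + gamma * s["relevance"]


def pick_best(candidates: Dict[str, Dict[str, float]]) -> str:
    """Keep the merged candidate with the highest global utility."""
    return max(candidates, key=lambda name: synthesis_utility(candidates[name]))


if __name__ == "__main__":
    print([round(p, 3) for p in temper([0.7, 0.2, 0.1], 2.0)])
    print(pick_best({
        "draft_a": {"novelty": 0.9, "coherence": 0.5, "relevance": 0.5},
        "draft_b": {"novelty": 0.2, "coherence": 0.9, "relevance": 0.9},
    }))
```

With $T = 2$, the most likely token's share drops from 0.7 to roughly 0.52, so less likely continuations are sampled more often.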

4. Application Domains and Empirical Performance

LLM-based MAS are applied in problem solving, world simulation, data markets, control engineering, software engineering, and large-scale optimization (Guo et al., 2024, Nagaitsev et al., 21 Nov 2025, Sashihara et al., 17 Nov 2025, Zahedifar et al., 26 May 2025, Sun et al., 7 Apr 2025).

  • Examples by Domain:
    • Software: Multi-role development teams (e.g., MetaGPT), code review, and debug pipelines.
    • Science: Teams of researcher-assistants and critics for literature search or hypothesis generation.
    • Economics: LLM agents as buyers/sellers in data marketplaces, accurately reproducing real-world trading distributions (Sashihara et al., 17 Nov 2025).
    • Engineering: Supervisory MAS for controller design, with agents specializing in retrieval, reasoning, simulation, critique, and communication (Zahedifar et al., 26 May 2025).
    • Optimization: Multi-agent systems achieve a 2.88× speedup in PyTorch inference tasks by pairing exploit-heavy search with error-fixing agents (Nagaitsev et al., 21 Nov 2025).
    • Creative Generation: LLM MAS surpass static and single-agent methods in knowledge, reasoning, and math, with reduced token usage (Han et al., 2 Jul 2025, Lin et al., 27 May 2025).

| Domain | MAS Architecture | Metric/Task | Representative System | Performance |
| --- | --- | --- | --- | --- |
| Reasoning | Blackboard | MATH, GSM8K | LbMAS | 81.7% avg (Han et al., 2 Jul 2025) |
| Software | Layered/static | HumanEval | MetaGPT | Competitive |
| Data Markets | Decentralized | Trading metrics | LLM-MAS | Realistic distributions (Sashihara et al., 17 Nov 2025) |
| Optimization | Specialized agents | PyTorch speedup | PIKE-B+EFA | 2.88× (Nagaitsev et al., 21 Nov 2025) |
| Engineering | Hierarchical | Control design | LLM-Agent-Controller | 83% completed (Zahedifar et al., 26 May 2025) |

5. RL Optimization, Scalability, and Business Aspects

Reinforcement learning and graph-based policies support scalable, adaptive MAS; business incentives and privacy constraints shape deployment (Chen et al., 3 Jun 2025, Jia et al., 13 Mar 2025, Yang et al., 2024).

  • Multi-Agent RL (MARL):
    • Critic-free algorithms such as MHGPO achieve greater stability and scalability than critic-based baselines (e.g., MAPPO), using group-based advantage estimation and flexible sampling strategies (IS, FoF, RR) (Chen et al., 3 Jun 2025).
    • Joint policy optimization using reward propagation, agent-specific penalties, and meta-learning enhances coordination in both search and embodied domains (Jia et al., 13 Mar 2025).
  • Business/Privacy:
    • The MLAS protocol supports agent monetization through credit allocation (Shapley value) and an incentive formula $I_e = \alpha\,\text{DataVal}_e + \beta\,\text{TrafficRev}_e + \gamma\,\text{IntelRev}_e$, and it maintains privacy through local data stores, differential privacy, and SMPC (Yang et al., 2024).
    • Decentralized systems (e.g., AgentNet) combine RAG and evolutionary specialization in a DAG to enable privacy-preserving collaboration (Yang et al., 1 Apr 2025).
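
Two mechanisms in this subsection can be made concrete. The first, group-based advantage, assumes MHGPO normalizes rewards in a GRPO-like way: each rollout's reward is standardized against the other rollouts in its sampling group. The paper's exact estimator may differ. The second, Shapley credit, is computed exactly by enumerating agent orderings, which is feasible only for a handful of agents. The MLAS incentive weights are required arguments because the source gives no values. All function names are hypothetical.

```python
import itertools
import statistics
from typing import Callable, Dict, FrozenSet, List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Critic-free advantage: standardize each rollout's reward within its group."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def shapley_values(agents: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all n! orderings."""
    totals = {a: 0.0 for a in agents}
    orderings = list(itertools.permutations(agents))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for a in order:
            joined = coalition | {a}
            totals[a] += value(joined) - value(coalition)
            coalition = joined
    return {a: t / len(orderings) for a, t in totals.items()}


def incentive(data_val: float, traffic_rev: float, intel_rev: float,
              *, alpha: float, beta: float, gamma: float) -> float:
    """I_e = alpha*DataVal_e + beta*TrafficRev_e + gamma*IntelRev_e."""
    return alpha * data_val + beta * traffic_rev + gamma * intel_rev


def both_needed(coalition: FrozenSet[str]) -> float:
    """Toy value function: the task succeeds (value 1) only if both agents contribute."""
    return 1.0 if {"retriever", "solver"} <= coalition else 0.0


if __name__ == "__main__":
    print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # roughly [1, -1, 1, -1]
    print(shapley_values(["retriever", "solver"], both_needed))  # 0.5 each
```

Exact enumeration grows factorially. For larger agent populations, Shapley values are usually estimated by sampling orderings instead.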

6. Evaluation Metrics, Security, and Risk Management

Robust evaluation frameworks and emerging threat models inform the reliability and safety of LLM-based MAS (Lin et al., 27 May 2025, Zhang et al., 2024, Xie et al., 7 Jul 2025, Miao et al., 11 Aug 2025, Reid et al., 6 Aug 2025).

  • Creativity Metrics:
    • Novelty: $1 - \max_{r \in \text{ref}} \frac{|x \cap r|}{|x \cup r|}$, i.e., one minus the highest Jaccard similarity between the output $x$ and any reference $r$.
    • Coherence: Mean cosine similarity of adjacent sentence embeddings.
    • Human-likeness: Discriminator-predicted score $\mathrm{HL}(x)$.
    • Distinct-n: Fraction of unique $n$-grams.
  • General MAS Metrics:
    • Success Rate, Task Completion Time, Collaboration Efficiency, Resource Usage (Guo et al., 2024).
    • Accuracy, F1, Precision/Recall in domain-specific tasks (Härer, 12 Jun 2025).
  • Security:
    • AgentPrune: Spatial-temporal graph pruning reduces redundant/malicious communication, cuts costs up to 87%, and increases robustness under adversarial attack by 3.5–10.8% (Zhang et al., 2024).
    • BlindGuard: Hierarchical, unsupervised anomaly detector for malicious agent propagation, effective without attack-specific labels (Miao et al., 11 Aug 2025).
    • AgentXposed: HEXACO trait modeling and behavioral interrogation for intention-hiding attack detection, outperforming MBTI/Big Five baselines by 8–15 pp in F1 (Xie et al., 7 Jul 2025).
  • Risk Analysis:
    • Failure modes: cascading reliability failures, inter-agent communication failures, monoculture collapse, conformity bias, deficient theory of mind, and mixed-motive dynamics are formally defined and quantified (Reid et al., 6 Aug 2025).
    • Staged testing (simulation, sandbox, pilot, deployment) with convergent evidence is advocated for robust governance.
    • Key metrics: reliability cascade, communication fidelity, monoculture correlation, conformity error rate, and multi-agent risk scores.
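
Three of the creativity metrics can be computed directly. The sketch makes three assumptions the source leaves open: novelty uses whitespace-token sets as the units for the Jaccard overlap, Distinct-n uses lowercase whitespace tokens, and coherence takes sentence embeddings that are already computed (the choice of encoder is up to the caller).

```python
import math
from typing import Sequence


def novelty(candidate: str, references: Sequence[str]) -> float:
    """1 - max Jaccard similarity between the candidate's token set and each reference's."""
    x = set(candidate.lower().split())
    best = 0.0
    for ref in references:
        r = set(ref.lower().split())
        union = x | r
        if union:
            best = max(best, len(x & r) / len(union))
    return 1.0 - best


def distinct_n(text: str, n: int = 2) -> float:
    """Fraction of n-grams in the text that are unique."""
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def coherence(sentence_embeddings: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity of each sentence embedding with the next one."""
    pairs = list(zip(sentence_embeddings, sentence_embeddings[1:]))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    print(novelty("agents share a board", ["agents share memory"]))
    print(distinct_n("the cat the cat", 2))
    print(coherence([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))  # (1 + 0) / 2 = 0.5
```
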

7. Open Challenges and Future Directions

Outstanding research challenges include evaluation standardization, coordination protocol adaptivity, scaling communication, bias mitigation, theoretical guarantees, and collective intelligence synthesis (Lin et al., 27 May 2025, Guo et al., 2024, Yang et al., 2024, Miao et al., 11 Aug 2025, Reid et al., 6 Aug 2025).

  • Benchmark Unification: Current lack of integrated benchmarks spanning text, image, and multi-modal creativity.
  • Bias and Ethics: Insufficient mitigation of stereotype amplification; embedded meta-personas are recommended for bias detection and mitigation.
  • Scalability: The $O(N^2)$ communication complexity of fully connected agent topologies requires topology adaptation and role assignment meta-learning.
  • Security: Adaptive, topology-agnostic defenses needed for evolving adversarial threats; unsupervised and contrastive anomaly detection are promising directions.
  • Theory: Formal limits of MAS synergy, triggers for emergent agency, and certification of “true” creativity remain open.
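
A quick count shows why all-to-all messaging scales quadratically. $N$ fully connected agents need $N(N-1)/2$ pairwise channels, while a star topology routed through a single coordinator needs only $N - 1$. The function names below are illustrative.

```python
def full_mesh_channels(n: int) -> int:
    """Undirected pairwise links when every agent talks to every other agent."""
    return n * (n - 1) // 2


def star_channels(n: int) -> int:
    """Links when all traffic goes through one coordinator (hub plus n - 1 workers)."""
    return max(n - 1, 0)


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, full_mesh_channels(n), star_channels(n))
    # 5 10 4
    # 20 190 19
    # 100 4950 99
```

The star keeps link count linear, but it reintroduces the single point of failure of the centralized design. This is the trade-off that topology adaptation tries to navigate.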

References
