LLM-based Modular Multi-Agent Frameworks

Updated 9 March 2026

LLM-based Modular Multi-Agent Frameworks are systems that decompose complex tasks into specialized, modular components coordinated via defined messaging protocols.
They leverage dynamic task routing, memory integration, and secure tool wrappers to enhance performance across applications like automation, software engineering, and formal reasoning.
In one empirical study, combining orchestrator and agent memory raised OfficeBench task success to 58.44%, compared with 53.29% for orchestrator memory alone and 49.78% for agent memory alone.

An LLM-based modular multi-agent framework is an architectural paradigm in which multiple LLM-driven agents, each encapsulating a specialized reasoning function or toolset, interoperate via clearly defined module boundaries, messaging schemas, and orchestration mechanisms. These frameworks achieve complex reasoning, robust tool use, and adaptive coordination across agents by explicitly partitioning system responsibilities (planning, memory, tool invocation, environment interaction) into modular components, each of which can be independently implemented, extended, replaced, or optimized. Key realizations of this paradigm include LEGOMem (Han et al., 6 Oct 2025), LaMAS (Yang et al., 2024), and OMAC (Li et al., 17 May 2025); application domains range from workflow automation to defense, formal mathematics, document understanding, software engineering, and large-scale coordination.

1. Architectural Principles and System Design

LLM-based modular multi-agent frameworks are grounded in explicit decomposition of system responsibilities. Typical instantiations, such as LEGOMem, LaMAS, and ALMAS, establish an architecture consisting of the following high-level modules:

Orchestrator (or Supervisor, Orchestration Layer): Centralized decision-maker responsible for overall workflow planning, decomposition of tasks into subtasks, dynamic agent selection, and global state tracking.
Task Agents (Worker Agents): Special-purpose LLM-based modules assigned to subdomains, such as tool use (Word, Excel, Email), modal analysis (technical/semantic), code synthesis/testing, document parsing, or critique.
Tool/Environment Wrappers: Modular adapters that allow agents to interact with external APIs, simulated environments, or legacy systems, often standardizing tool-call invocation and output.
Memory Components (Procedural, Episodic, Working): Retrieval-based stores of previous trajectories, plans, results, and intermediate execution traces, specialized by agent role.
Shared Communication and Context: Message buses, blackboards, graph-based state passing, or tuple-based protocols which enable agents to share partial results, feedback, or coordination signals.

These systems are characterized by:

Role-based or capability-based agent registration and discovery (Yang et al., 2024).
Explicit, typed message schemas for agent–agent and agent–environment communication (JSON, tuple, or YAML schemas).
Decoupling of orchestration (task assignment/control flow) from core agent reasoning and execution loops (Li et al., 29 May 2025), often reflected in layered system diagrams.
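
The registration and typed-messaging ideas listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Message` fields, the `Registry` class, and the agent names are hypothetical and are not taken from LaMAS, LEGOMem, or any other cited framework.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class Message:
    """Typed envelope for agent-to-agent traffic (field names are illustrative)."""
    sender: str
    recipient: str
    kind: str       # e.g. "task", "result", "feedback"
    payload: dict

    def to_json(self) -> str:
        # Serialize with sorted keys so the wire format is deterministic.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "Message":
        data = json.loads(raw)
        expected = {"sender", "recipient", "kind", "payload"}
        # Reject anything that does not match the schema exactly.
        if set(data) != expected:
            raise ValueError(f"schema mismatch: {sorted(set(data) ^ expected)}")
        return Message(**data)


@dataclass
class Registry:
    """Capability-based registration and discovery (hypothetical helper)."""
    capabilities: dict = field(default_factory=dict)  # agent name -> set of capabilities

    def register(self, agent: str, caps: set) -> None:
        self.capabilities[agent] = set(caps)

    def discover(self, needed: str) -> list:
        # Return every registered agent that declares the requested capability.
        return sorted(a for a, caps in self.capabilities.items() if needed in caps)
```

Keeping the envelope separate from the registry mirrors the decoupling of orchestration from agent logic: an orchestrator only needs `discover` and the message schema, not the internals of any agent.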

2. Modular Procedural Memory and Experience

The memory subsystem in modular frameworks serves both to enhance agent reasoning (few-shot prompting, case retrieval) and to improve team-level adaptation across workflows. LEGOMem (Han et al., 6 Oct 2025) exemplifies such a procedural memory system:

Memory Units: Self-contained records distilled from past successful trajectories, including high-level decompositions, tool call sequences, result summaries, and explicit agent reflections.
Full-task vs. subtask granularity: Orchestrator memory stores full-task plans for workflow decomposition and delegation; agent memory caches execution-level subtasks correlated with concrete tool usage.
Retrieval Mechanism: Embedding-based indexing $\varphi(\cdot)$, cosine similarity scoring, and nearest-neighbor search support both orchestrator-level (task) and agent-level (subtask) retrieval.
Dynamic Allocation: Joint placement of memory banks maximizes system accuracy and enables strong performance even for teams composed of smaller LLMs.

A systematic study of memory placement finds that combining orchestrator and agent memory yields 58.44% success on OfficeBench, surpassing the orchestrator-only (53.29%) and agent-only (49.78%) variants by 5.15 and 8.66 percentage points, respectively (Han et al., 6 Oct 2025).
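
A rough sketch of embedding-based retrieval over separate orchestrator and agent memory banks is given below. It is not LEGOMem's implementation: the bag-of-words `embed` function is a toy stand-in for the embedding $\varphi(\cdot)$, and the `MemoryBank` class, the stored keys, and the plan strings are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for phi(.): a bag-of-words count vector (an assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryBank:
    """Stores memory units and returns the k nearest neighbours of a query."""

    def __init__(self):
        self.units = []  # list of (embedding, content)

    def add(self, key: str, content: str) -> None:
        self.units.append((embed(key), content))

    def retrieve(self, query: str, k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(self.units, key=lambda u: cosine(q, u[0]), reverse=True)
        return [content for _, content in ranked[:k]]


# Orchestrator memory holds full-task plans; agent memory holds subtask traces.
orchestrator_memory = MemoryBank()
orchestrator_memory.add("summarize sales spreadsheet and email it",
                        "plan: read spreadsheet, write summary, send email")
orchestrator_memory.add("schedule a meeting from calendar file",
                        "plan: read calendar, send invitation")

agent_memory = MemoryBank()
agent_memory.add("read totals from excel sheet", "trace: open file, read cell range")
agent_memory.add("send email with attachment", "trace: compose, attach, send")
```

In a real system the toy embedding would be replaced by a learned sentence encoder and the linear scan by an approximate nearest-neighbour index; the two-bank split is the part that corresponds to the full-task versus subtask granularity described above.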

3. Coordination, Planning, and Routing Patterns

Modular multi-agent frameworks incorporate advanced routing and planning strategies to achieve both scalability and adaptiveness:

Dynamic Task Routing & Parallelism: Orchestrators (or planners) compute confidence metrics $c_j(t)$ for agent-task pairs and compare them against an ambiguity threshold $\theta$ to balance serial assignment with parallel evaluation (Xia et al., 22 Jul 2025). Low-confidence tasks are routed in parallel to multiple agents, with an evaluator scoring the candidate outputs and selecting the best one.
Structured Feedback and Iterative Refinement: Feedback buses carry structured critique—clarification, override, revision requests—allowing agents to iteratively improve outputs. Mathematical formulations model revision steps as sequential minimization of a local loss $\mathcal{L}$ subject to feedback (Xia et al., 22 Jul 2025).
Support for Competitive and Cooperative Reasoning: Worker pools may be evaluated competitively on ambiguous tasks or cooperate via bidirectional information propagation, as formalized in graph-driven message passing algorithms (Li et al., 29 May 2025).
Planning Interfaces: Both schema-constrained multi-step plans and free-form LLM-native planning are supported; over-constrained interfaces typically reduce accuracy and increase latency (Orogat et al., 3 Feb 2026).
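
The confidence-thresholded routing pattern can be sketched as follows. This is one plausible reading, not the algorithm from the cited paper: it assumes a task is treated as ambiguous when the highest confidence $c_j(t)$ falls below $\theta$, and the function name `route`, its parameters, and the default threshold of 0.7 are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def route(task, agents, confidence, evaluate, theta=0.7):
    """Assign serially when one agent is confident, otherwise fan out in parallel.

    agents:     dict mapping agent name -> callable(task) -> output
    confidence: callable(name, task) -> float in [0, 1], playing the role of c_j(t)
    evaluate:   callable(output) -> float, the evaluator's score
    theta:      ambiguity threshold (0.7 is an arbitrary illustrative value)
    """
    scores = {name: confidence(name, task) for name in agents}
    best = max(scores, key=scores.get)
    if scores[best] >= theta:
        # Confident case: a single agent handles the task.
        return best, agents[best](task)
    # Ambiguous case: run every candidate concurrently, keep the best-scored output.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, task) for name, fn in agents.items()}
        outputs = {name: fut.result() for name, fut in futures.items()}
    winner = max(outputs, key=lambda name: evaluate(outputs[name]))
    return winner, outputs[winner]
```

Fanning out only below the threshold keeps the extra cost of parallel evaluation confined to the tasks where a single assignment is most likely to be wrong.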

4. Memory, Specialization, and Tool-Use Modules

LLM-based multi-agent frameworks treat memory and specialization as explicit, role-aware modules:

Procedural vs. Episodic Memory: Retrieval-based vector stores encode either high-level workflow plans or low-level execution traces. Modular memory supports working (accumulation), retrieval, and hybrid retrieval-accumulation architectures (Han et al., 6 Oct 2025, Orogat et al., 3 Feb 2026).
Agent Specialization: Agents declare capability vectors or specialization profiles at registration (LaMAS, ALMAS), allowing orchestrators to match subtasks to optimal agents using similarity functions (Yang et al., 2024).
Tool Integration: Tools are represented as MCP servers exposing formal APIs, permitting atomic invocation under role-aware access policies (Chao et al., 13 Oct 2025). Agents invoke tools through wrappers, which expose schemas for validation, capability negotiation, and security enforcement (Yang et al., 2024).
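
As an illustration of capability matching and role-aware tool wrappers, the sketch below uses plain Python callables. It does not use the MCP SDK or any API from the cited systems; `match_agent`, `ToolWrapper`, the capability vectors, and the role names are hypothetical, and the argument schema is simplified to a name-to-type mapping.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length capability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_agent(subtask_vec, profiles):
    """Pick the agent whose declared capability vector is closest to the subtask."""
    return max(profiles, key=lambda name: cosine(subtask_vec, profiles[name]))


class ToolWrapper:
    """Wraps a tool with an argument schema and a role-based access policy."""

    def __init__(self, name, func, schema, allowed_roles):
        self.name = name
        self.func = func
        self.schema = schema                  # argument name -> expected Python type
        self.allowed_roles = set(allowed_roles)

    def invoke(self, role, **kwargs):
        # Security enforcement: only whitelisted roles may call the tool.
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not call {self.name!r}")
        # Validation: argument names and types must match the schema exactly.
        if set(kwargs) != set(self.schema):
            raise ValueError(f"expected arguments {sorted(self.schema)}")
        for arg, expected in self.schema.items():
            if not isinstance(kwargs[arg], expected):
                raise TypeError(f"{arg!r} must be {expected.__name__}")
        # Standardized output envelope for every tool call.
        return {"tool": self.name, "ok": True, "result": self.func(**kwargs)}
```

Validating before execution means a malformed or unauthorized call fails at the wrapper boundary rather than inside the tool itself.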

5. Evaluation: Benchmarks, Empirical Findings, and Best Practices

A unified benchmarking pipeline, MAFBench (Orogat et al., 3 Feb 2026), has enabled systematic comparison across diverse multi-agent LLM frameworks. Core empirical findings include:

Metric	Graph/Role-based	GABM
Median Latency	1.3–10× baseline	>100× baseline
Planning Accuracy	up to –30pp
Coordination Success	90+% → <30%
Retrieval Memory AR	44.9 (hybrid)	19.1 (accum)

Memory-centric architectures (retrieval or hybrid) achieve the highest Accurate Retrieval and Long-Range Understanding (Orogat et al., 3 Feb 2026).
Joint orchestrator and agent memory maximizes task decomposition and execution accuracy in office automation workflows (Han et al., 6 Oct 2025).
Dynamic task routing with bidirectional feedback leads to up to +29% factual coverage and +47% coherence over static pipelines (Xia et al., 22 Jul 2025).
Contrastive optimization actors (Semantic Initializer, Contrastive Comparator) yield measurable improvements in code and reasoning accuracy (e.g., HumanEval Pass@1 from 85.74% to 89.25%) in OMAC (Li et al., 17 May 2025).
Explicit modular decomposition (as in MASA or LAMPS) facilitates agile extension and precise enforcement of security constraints.

Design principles derived from these studies (Orogat et al., 3 Feb 2026):

Prefer minimally-deep orchestration for scalability.
Architect memory to fit task semantics, not context size.
Default to permissive (LLM-native) planning over rigid schemas.
Encode specialization procedurally rather than via role-prompt labels.
Align communication topology with coordination task (local vs. global).
Engineer at the system interface—not just prompt level—for multi-agent robustness.

6. Security, Privacy, and Safety Modules

Advanced modular frameworks (e.g., Terrarium (Nakamura et al., 16 Oct 2025), AutoDefense (Zeng et al., 2024)) integrate security, privacy, and safety as first-class modules:

Blackboard and Communication Bus: Central data store in which all agent proposals, tool invocations, and observations are logged, supporting append-only, channel-based access.
Fine-grained Access Control: Encryption, authentication tokens, channel-level ACLs, and HMAC-based message validation (Nakamura et al., 16 Oct 2025).
Multi-Agent Defense Chains: Task splitting among roles (intention analysis, prompt inference, judgment), modular pluggability of classifiers (e.g., Llama Guard), and response filtering to reduce adversarial success (jailbreak attack success rate, ASR, of 7.95% vs. 55.74% for the baseline; Zeng et al., 2024).
Extensibility for Security Research: Attack modules (confidentiality, data poisoning, context overflow) plug into the same MCA schema, enabling reproducible testing and quantitative risk evaluation (Nakamura et al., 16 Oct 2025).
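
A minimal sketch of an append-only blackboard with channel-level ACLs and HMAC message validation is shown below, using only Python's standard `hmac`, `hashlib`, and `json` modules. It is not Terrarium's implementation: the `Blackboard` class, its method names, and the choice of SHA-256 over a canonical JSON payload are assumptions made for illustration.

```python
import hashlib
import hmac
import json


class Blackboard:
    """Append-only log with per-channel access control and HMAC-checked posts."""

    def __init__(self, agent_keys, channel_acl):
        self._keys = agent_keys    # agent name -> secret key (bytes)
        self._acl = channel_acl    # channel -> set of agents allowed to post and read
        self._log = []             # entries are only ever appended

    @staticmethod
    def sign(key: bytes, channel: str, body: dict) -> str:
        # Canonical JSON so sender and verifier hash identical bytes.
        payload = json.dumps({"channel": channel, "body": body}, sort_keys=True).encode()
        return hmac.new(key, payload, hashlib.sha256).hexdigest()

    def post(self, agent: str, channel: str, body: dict, tag: str) -> None:
        if agent not in self._acl.get(channel, set()):
            raise PermissionError(f"{agent!r} may not post to {channel!r}")
        expected = self.sign(self._keys[agent], channel, body)
        # Constant-time comparison avoids leaking tag prefixes through timing.
        if not hmac.compare_digest(expected, tag):
            raise ValueError("invalid HMAC tag")
        self._log.append({"agent": agent, "channel": channel, "body": body})

    def read(self, agent: str, channel: str) -> list:
        if agent not in self._acl.get(channel, set()):
            raise PermissionError(f"{agent!r} may not read {channel!r}")
        # Return copies so readers cannot mutate the log.
        return [dict(entry) for entry in self._log if entry["channel"] == channel]
```

Because every post passes through the same ACL and tag checks, an attack module (for example, one that forges messages) can be tested against the same interface that benign agents use.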

7. Extensibility, Hierarchical Modularity, and Applications

Hierarchical modular designs such as LLM × MapReduce-V3 (Chao et al., 13 Oct 2025) illustrate several extensibility patterns:

Atomic MCP servers encapsulate functional units (grouping, skeleton init, digest, refinement), which can be arbitrarily aggregated into hierarchical agents (e.g., Analysis, Skeleton, Writing agents).
Dynamic LLM-driven planners select toolchains and workflows on-the-fly based on execution history, available tools, and user feedback tokens, creating an adaptable yet auditable research pipeline.
Human-in-the-Loop Integration: User interventions are elevated to first-class actions within the same protocol as agent-to-agent calls—crucial for tasks requiring consensus or subjective input.
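
The aggregation of atomic units into hierarchical agents, with human input treated as just another step, might be sketched as below. Plain functions stand in for MCP servers here, so this does not reflect the actual LLM × MapReduce-V3 code; the unit bodies, `make_agent`, `make_human_step`, and the `state` dictionary layout are hypothetical placeholders.

```python
def grouping(state):
    """Atomic unit: group source titles by first letter (placeholder logic)."""
    groups = {}
    for title in state["sources"]:
        groups.setdefault(title[0].upper(), []).append(title)
    return {**state, "groups": groups}


def skeleton_init(state):
    """Atomic unit: create one outline section per group."""
    return {**state, "skeleton": [f"Section {k}" for k in sorted(state["groups"])]}


def make_human_step(ask):
    """Wrap a user intervention so it has the same signature as any other unit."""
    def human_review(state):
        return {**state, "skeleton": ask(state["skeleton"])}
    return human_review


def make_agent(name, units):
    """Aggregate units (or other agents) into a named agent that logs a trace."""
    def agent(state):
        for unit in units:
            state = unit(state)
            # Record each call so the pipeline stays auditable.
            state = {**state, "trace": state.get("trace", []) + [f"{name}:{unit.__name__}"]}
        return state
    agent.__name__ = name
    return agent
```

Since an agent built by `make_agent` has the same state-in, state-out signature as an atomic unit, agents can be nested to any depth, and a human review step can be placed anywhere a tool call could go.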

Applications of modular LLM-based multi-agent frameworks are diverse and include:

Workflow and office process automation via flexible agent/tool schemas (Han et al., 6 Oct 2025).
Security pipeline design for code package analysis (Zeshan et al., 17 Jan 2026).
Multi-modal data fusion for financial trading (Wu et al., 13 Jul 2025).
Automated software engineering, spanning the full SDLC (Tawosi et al., 3 Oct 2025).
Formal mathematics autoformalization with theorem-prover integration (Zhang et al., 10 Oct 2025).

References

LEGOMem: Modular Procedural Memory for Multi-agent LLM Systems for Workflow Automation (Han et al., 6 Oct 2025)
LaMAS: LLM-based Multi-Agent Systems: Techniques and Business Perspectives (Yang et al., 2024)
OMAC: A Broad Optimization Framework for LLM-Based Multi-Agent Collaboration (Li et al., 17 May 2025)
Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems (Xia et al., 22 Jul 2025)
Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies (Nakamura et al., 16 Oct 2025)
MAFBench: Understanding Multi-Agent LLM Frameworks: A Unified Benchmark and Experimental Analysis (Orogat et al., 3 Feb 2026)
LLM $\times$ MapReduce-V3: Enabling Interactive In-Depth Survey Generation through a MCP-Driven Hierarchically Modular Agent System (Chao et al., 13 Oct 2025)
ALMAS: an Autonomous LLM-based Multi-Agent Software Engineering Framework (Tawosi et al., 3 Oct 2025)
MASA: LLM-Driven Multi-Agent Systems for Autoformalization (Zhang et al., 10 Oct 2025)
Many Hands Make Light Work: An LLM-based Multi-Agent System for Detecting Malicious PyPI Packages (Zeshan et al., 17 Jan 2026)
MountainLion: A Multi-Modal LLM-Based Agent System for Interpretable and Adaptive Financial Trading (Wu et al., 13 Jul 2025)
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks (Zeng et al., 2024)
Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration (Li et al., 29 May 2025)
LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Design of Multi Active/Passive Core-Agent Architectures (Hassouna et al., 2024)