MOSAIC: Multi-Agent Orchestration
- Multi-Agent Orchestration (MOSAIC) is a framework paradigm that decomposes global objectives into modular subtasks executed by specialized autonomous agents.
- It employs formal routing via FSMs, DAG-based scheduling, and hierarchical role assignments to ensure scalable, continuous quality control.
- Empirical evaluations show MOSAIC frameworks outperform single-agent systems in reliability and efficiency across diverse domains from desktop automation to climate science.
Multi-Agent Orchestration (MOSAIC) is a paradigm and class of frameworks for structured coordination among specialized autonomous agents—human or artificial—designed to solve complex, multi-step tasks across diverse domains. MOSAIC approaches explicitly decompose global objectives into subtasks, assign those subtasks to modular agents via formal routing or game-theoretic mechanisms, and execute robust, adaptive workflows with continuous quality control, proactive replanning, and dynamic error recovery. Recent literature details MOSAIC instantiations in desktop automation (Guo et al., 14 Sep 2025), climate science (Kim et al., 25 Nov 2025), process modeling (Lin et al., 2024), scientific coding (Raghavan et al., 9 Oct 2025), resilient multi-domain networks (Chen et al., 2019), question-answering (Seabra et al., 2024), incident response (Drammeh, 19 Nov 2025), software testing (Hariharan et al., 12 Oct 2025), social simulations (Liu et al., 10 Apr 2025), meta-agent selection (Agrawal et al., 3 May 2025), and more.
1. Formal Principles and Control Mechanisms
MOSAIC frameworks are fundamentally characterized by explicit control-flow and state management, allowing predictable, auditable routing of information, decisions, and artifacts:
- Finite-State Machine (FSM)-Driven Routing: Agentic Lybic operates its entire orchestration logic as a deterministic FSM, where each state captures the goal status, subtask execution, and controller mode (e.g., REPLAN, QUALITY_CHECK, EXECUTE_ACTION) (Guo et al., 14 Sep 2025). Transitions are triggered by formally enumerated event codes. Quality control is baked into the state transitions via gating functions based on visual similarity and progress metrics, enforcing strict action validity and enabling adaptive replanning.
- Hierarchical Layering and Reasoning: Many MOSAIC systems organize agents into tiered architectures (e.g., Controller→Manager→Worker→Evaluator in Agentic Lybic; Orchestrate-Agent→Plan-Agent→Data/Coding-Agents in ClimateAgent (Kim et al., 25 Nov 2025)), enforcing separation of concerns—strategic oversight, tactical planning, specialized execution, and quality assessment. In network-of-networks settings, “games-in-games” structure agents into strategic, tactical, and mission layers, each operating its own distributed game, and compose the layers for resilience and adaptability (Chen et al., 2019).
- Dynamic Subtask Routing: Subtasks are represented as nodes in directed acyclic graphs (DAGs) or hypergraphs, with explicit dependencies. Plans are topologically sorted and dispatched to agent roles. Orchestration can adaptively invoke specialized agents—e.g., GUI Operator, System Technician, or Reasoning Analyst (Guo et al., 14 Sep 2025)—and switch modalities to optimize efficiency.
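The FSM-driven routing described above can be sketched as a deterministic transition table keyed on enumerated event codes. This is a minimal illustration of the pattern; the state names borrow the controller modes mentioned above, but the specific transitions and event codes are illustrative, not Agentic Lybic's actual specification:

```python
from enum import Enum, auto

class State(Enum):
    """Controller modes; names follow the modes cited above, illustratively."""
    REPLAN = auto()
    EXECUTE_ACTION = auto()
    QUALITY_CHECK = auto()
    DONE = auto()

# Deterministic transition table: (state, event_code) -> next state.
# Event codes are formally enumerated strings, never free-form text.
TRANSITIONS = {
    (State.REPLAN, "plan_ready"): State.EXECUTE_ACTION,
    (State.EXECUTE_ACTION, "action_complete"): State.QUALITY_CHECK,
    (State.QUALITY_CHECK, "gate_done"): State.DONE,
    (State.QUALITY_CHECK, "gate_continue"): State.EXECUTE_ACTION,
    (State.QUALITY_CHECK, "gate_fail"): State.REPLAN,
}

def step(state: State, event: str) -> State:
    """Route on an enumerated event code; an unknown (state, event)
    pair is an error, which prevents unobservable state drift."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for ({state.name}, {event!r})")
```

Because every transition is tabulated, the orchestration flow is auditable by construction: any event a state cannot handle fails loudly rather than silently corrupting the controller.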
2. Modular Agent Roles and Workflow Composition
MOSAIC explicitly defines and instantiates modular agents, each encapsulating a narrow functional persona:
- Specialization: Agent subtypes handle distinct modalities or reasoning scopes. ClimateAgent, for instance, delegates data acquisition to Data-Agents capable of dynamic API introspection, processing to Coding-Agents, and reporting to Visualization Agents (Kim et al., 25 Nov 2025). Agricultural VQA frameworks assign Retriever, Reflector, parallel Answerers, and Improver roles to handle iterative evidence gathering, bias reduction, and multi-image alignment (Ke et al., 29 Sep 2025).
- Role Assignment Protocols: Role-based task allocation adopts formal matching—plan nodes annotated with required skills are mapped to the most capable agent by capability vectors or neural selectors (Guo et al., 14 Sep 2025, Agrawal et al., 3 May 2025). In OrchVis, hierarchical goal alignment and skill-to-task matching are optimized as assignment problems over directed goal graphs (Zhou, 28 Oct 2025).
- Inter-Agent Communication: Agents exchange triggers, signals, and artifacts through well-defined schemas—event codes (FSM), structured JSON messages (LangGraph), and shared memory contexts. Multi-agent frameworks employ both synchronous (stepwise, blocking) and asynchronous (event-driven, concurrent) execution.
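The capability-vector matching described above can be sketched as a greedy assignment: each plan node carries required skills, each agent a capability vector, and nodes go to the highest-scoring agent. Agent names echo the roles cited earlier, but the scoring scheme and data shapes here are illustrative assumptions, not a published protocol:

```python
def match_roles(plan_nodes: dict, agents: dict) -> dict:
    """Greedy skill-to-agent matching.

    plan_nodes: {node_name: [required_skill, ...]}
    agents:     {agent_name: {skill: capability_score}}
    Returns {node_name: agent_name}, picking for each node the agent
    whose capability vector best covers the required skills.
    """
    assignments = {}
    for node, required in plan_nodes.items():
        best = max(
            agents,
            key=lambda name: sum(agents[name].get(s, 0.0) for s in required),
        )
        assignments[node] = best
    return assignments

# Hypothetical plan and agent pool for illustration.
plan = {"click_save_button": ["gui"], "diagnose_stack_trace": ["reasoning"]}
pool = {
    "GUI Operator":      {"gui": 0.9, "reasoning": 0.1},
    "Reasoning Analyst": {"gui": 0.1, "reasoning": 0.9},
}
```

Real systems replace the dot-product score with learned neural selectors or solve the mapping as a global assignment problem over the goal graph, as OrchVis does; the greedy per-node version above is only the simplest instance of the idea.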
3. Robust Quality Control, Recovery, and Error Handling
MOSAIC advances reliability via multi-triggered, continuously evaluated quality gates and structured error recovery:
- Continuous Quality Gates: Every critical state transition and subtask result is gated—for example, Agentic Lybic uses similarity and progress metrics to classify results as gate_done, gate_fail, gate_continue, or gate_supplement (Guo et al., 14 Sep 2025). ClimateAgent’s Coding-Agent self-correction loop uses semantic LLM validation and retries over bounded micro-iterations for robust execution (Kim et al., 25 Nov 2025).
- Structured Replanning: Upon error detection—timeout, execution, or stagnation—the orchestration core triggers partial or full replanning, often invoking Manager or Plan-Agents to adjust subtask lists, issue supplemental data requests, or ask for clarification (Guo et al., 14 Sep 2025, Yang et al., 9 Dec 2025).
- Artifact-Based Collaboration: Quality control extends to artifact mediation—for instance, the Analyst–Operator shared memory files in Agentic Lybic generalize to any domain where structured perception and reasoning must be decoupled (Guo et al., 14 Sep 2025).
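The four-way gate classification above can be sketched as a small decision function over the similarity and progress metrics. The outcome labels come from the source; the thresholds and branching order are illustrative placeholders, not the published values:

```python
def quality_gate(similarity: float, progress: float,
                 sim_done: float = 0.95, prog_min: float = 0.05) -> str:
    """Classify a subtask result into one of four gate outcomes.

    similarity: visual/semantic similarity between observed and expected state.
    progress:   measured advance toward the subtask goal since the last check.
    Thresholds are illustrative, not the framework's actual values.
    """
    if similarity >= sim_done:
        return "gate_done"        # goal state reached
    if progress >= prog_min:
        return "gate_continue"    # still advancing: keep executing
    if similarity > 0.5:
        return "gate_supplement"  # partial evidence: request more data
    return "gate_fail"            # stagnation: trigger replanning
```

The point of the fine-grained outcome set is that the controller can react proactively (supplement, continue) instead of collapsing every imperfect result into a rollback.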
4. Performance, Empirical Evaluation, and Benchmarks
Empirical evidence consistently demonstrates that MOSAIC orchestration yields significant performance, reliability, and interpretability gains over monolithic or single-agent baselines:
- Outperforming Single-Agent Systems: On the OSWorld desktop automation benchmark, Agentic Lybic achieves 57.07% success in 50 steps, setting a new record above previous methods (Guo et al., 14 Sep 2025). ClimateAgent reaches 100% task completion with a report quality score of 8.32, outperforming Copilot and GPT-5 (Kim et al., 25 Nov 2025). Multi-agent orchestration for incident response delivers 100% actionable recommendations (versus 1.7% for single-LLM copilot configurations), 80× greater specificity, and zero quality variance (Drammeh, 19 Nov 2025).
- Complex Workflow Domains: In scientific coding, MOSAIC frameworks enable auditable decomposition, iterative debugging, and robust error correction—improving main-problem solve rates and numerical precision for composite problems (Raghavan et al., 9 Oct 2025). In process modeling for BPMN, multi-agent orchestration (generation, refinement, reviewing, testing) yields outputs outperforming 52–89% of human-drawn models (Lin et al., 2024).
- Ablation Analysis: Removing key orchestration layers—e.g., Reviewer or Debugger agents, or continuous gating—significantly degrades output quality, highlighting the necessity of comprehensive multi-agent protocols (Guo et al., 14 Sep 2025, Raghavan et al., 9 Oct 2025, Kim et al., 25 Nov 2025).
5. Generalization, Domain Adaptation, and Architectural Insights
MOSAIC orchestration frameworks exhibit generalization and extensibility across diverse problem domains:
- Domain-Agnostic Patterns: ClimateAgent’s hierarchical planning, multi-candidate recovery, and metadata-driven code synthesis are directly portable to genomics and materials science workflows (Kim et al., 25 Nov 2025). Multi-source Q&A systems for Contract Management integrate router, RAG, SQL, and graph agents, with dynamic prompt engineering and per-source confidence scoring (Seabra et al., 2024).
- Multi-Layer Game-Theoretic Design: In multi-domain networked operations, blending inter-layer equilibria (strategic, tactical, mission) enables self-adaptivity, resilience, and secure-by-design properties (Chen et al., 2019). The Gestalt Nash Equilibrium formalism quantifies full-system resilience and agent coordination stability.
- Emergent Capabilities: Modular sharing and composition (MOSAIC in RL collective learning) drive sample-efficient curriculum emergence—agents selectively transfer subnetworks via cosine similarity over Wasserstein embeddings (Nath et al., 5 Jun 2025).
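The selective-transfer mechanism above can be sketched as nearest-neighbor retrieval over task embeddings: an agent picks the stored subnetwork whose task embedding is most cosine-similar to its current task. The plain vectors below are illustrative stand-ins for the Wasserstein task embeddings used in the cited work:

```python
import math

def cosine(u: list, v: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_donor(task_embedding: list, library: dict) -> str:
    """Return the name of the stored module whose task embedding is
    closest to the current task; library maps module name -> embedding."""
    return max(library, key=lambda name: cosine(task_embedding, library[name]))

# Hypothetical module library for illustration.
modules = {"navigate": [1.0, 0.0], "grasp": [0.0, 1.0]}
```

In the collective-learning setting, this retrieval step is what lets an agent skip redundant training: it imports the donor subnetwork as an initialization rather than learning the subtask from scratch.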
6. Design Guidelines and Best Practices
The literature summarizes specific design lessons for robust, scalable, generalizable multi-agent orchestration:
- Enumerate Situations and Triggers: Use FSMs or Petri nets to structure orchestration flow; explicitly enumerate all controller states and triggers to prevent unobservable state drift (Guo et al., 14 Sep 2025).
- Tiered Reasoning: Structure control flow into tiered modules—global oversight, strategic planners, specialized executors, and continuous verifiers. Modular workflows enhance separation of concerns and component upgradability (Guo et al., 14 Sep 2025, Kim et al., 25 Nov 2025).
- Multi-Triggered Quality Gates: Gating should be periodic, stagnation-triggered, and success-driven, with fine-grained outcomes (done/fail/supplement/continue) to enable proactive error handling rather than delayed rollback (Guo et al., 14 Sep 2025).
- Artifact-Mediated Coordination: Facilitate inter-agent collaboration through shared artifacts or external tool APIs (e.g., external BPMN validators, medical codebook retrievers), enabling domain extensibility (Lin et al., 2024, Yang et al., 9 Dec 2025).
- Human Oversight and Interaction Balance: Systems such as OrchVis empower users to control high-level goals and conflict resolution while delegating procedural execution to agents; strategic autonomy minimizes micromanagement (Zhou, 28 Oct 2025).
- Dynamic Agent Selection and Meta-Orchestration: Neural orchestrators use task context, agent history, and fuzzy response evaluation to select optimal agents per task, supporting extensible, interpretable, and adaptive MAS pipelines (Agrawal et al., 3 May 2025).
- Monitor Appropriateness and Heterogeneity: Theory and simulation show orchestration value arises strictly when agents differ in skill or cost; empirical estimation of appropriateness is recommended before deploying complex orchestrators (Bhatt et al., 17 Mar 2025).
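Several of these guidelines—tiered reasoning, multi-triggered gates, and structured replanning—compose into a single control loop: execute, gate the result, and on failure hand the subtask back to a planner for bounded revision. The sketch below is a generic pattern under assumed callables (`execute`, `gate`, `replan` are hypothetical interfaces, not any system's actual API):

```python
def run_with_replanning(subtasks, execute, gate, replan, max_retries=3):
    """Run subtasks through a gate/replan loop.

    execute(task) -> result        : specialized executor
    gate(task, result) -> "done" | "fail" : quality verifier
    replan(task) -> revised task   : planner invoked on gate failure
    Retries are bounded so a stagnating subtask fails loudly instead of
    looping forever.
    """
    completed = []
    for task in subtasks:
        for _ in range(max_retries):
            result = execute(task)
            if gate(task, result) == "done":
                completed.append(task)
                break
            task = replan(task)  # structured replanning on gate failure
        else:
            raise RuntimeError(f"{task!r} exhausted {max_retries} retries")
    return completed
```

Keeping the executor, gate, and planner as separate injected components mirrors the separation-of-concerns guideline: each tier can be upgraded or swapped without touching the loop.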
7. Applications and Broader Impact
MOSAIC is now established across fields including autonomous desktop automation (Guo et al., 14 Sep 2025), analytic climate workflows (Kim et al., 25 Nov 2025), process engineering (Lin et al., 2024), scientific code generation (Raghavan et al., 9 Oct 2025), collective RL (Nath et al., 5 Jun 2025), secure network-of-networks (Chen et al., 2019), social simulations (Liu et al., 10 Apr 2025), robust question-answering (Seabra et al., 2024), production incident response (Drammeh, 19 Nov 2025), and meta-agent selection (Agrawal et al., 3 May 2025). Its generalizable design patterns—modular layering, explicit state control, dynamic assignment, continuous verification, and artifact-based mediation—yield measurable advances in reliability, performance, interpretability, and domain adaptation. The orchestration layer is frequently the decisive factor separating state-of-the-art, reproducible multi-agent systems from brittle or drifting single-model baselines, and is increasingly recognized as an operational requirement for scale, safety, and auditability.