Manager-RAG: Manager-Driven RAG Systems

Updated 22 November 2025

Manager-RAG denotes a class of multi-agent RAG systems that separate planning, control, and subtask delegation from retrieval and generation, improving interpretability and trade-off management.
These systems dynamically coordinate specialized worker agents, as in SIRAG, MA-RAG, and mRAG, to improve multi-hop reasoning and QA performance.
They integrate process-level intermediate supervision and SLA-driven optimization to support scalable resource management and transparent decision-making.

Manager-RAG (Manager-Driven Retrieval-Augmented Generation) frameworks are a class of multi-agent systems that explicitly separate planning, control, and sub-module delegation from core retrieval and generation in Retrieval-Augmented Generation (RAG) pipelines. These approaches leverage a “manager” or “orchestrator” agent to coordinate specialized worker agents and enforce process-level supervision, hierarchical control, or system-level resource optimization. This explicit control structure enables fine-grained reasoning, improved interpretability, and robust trade-off management across a range of RAG applications, from open-domain QA and dialogue to cyber security, large-scale enterprise deployment, long-horizon automation, and domain-specific NER.

1. Core Principles and Architectural Patterns

Manager-RAG frameworks introduce an explicit separation between a central manager agent and one or more worker agents, each optimized for distinct subtasks (retrieval proposal, evidence filtering, answer generation, validation, or arbitration). The manager's core roles include:

Query decomposition (e.g., high-level planning, subtask sequencing)
Routing and task assignment (dynamic agent invocation)
Process supervision (choice of when to continue, halt, or reformulate retrieval or generation)
System-level monitoring, feedback control, and optimization

Notable instantiations:

SIRAG (“process-supervised multi-agent framework”): introduces a Decision Maker (manager) and Knowledge Selector (worker), supervised through LLM-as-judge scores at each step, with independence from the underlying retriever or generator (Wang et al., 17 Sep 2025).
MA-RAG: features a Planner (manager) that produces chain-of-thought structured plans and coordinates Step Definer, Extractor, and QA agents; essential for multi-hop reasoning and ambiguity resolution in information-seeking tasks (Nguyen et al., 26 May 2025).
mRAG: employs a Coordinator (manager) agent driving six specialized “worker” agents and controlling module invocation order, with manager policy learned via reward-guided self-training (Salemi et al., 12 Jun 2025).
CyberRAG: central manager LLM orchestrates a pool of attack-family classifiers, iterative retrieval agents, and report generators for robust cyber attack classification and explanation (Blefari et al., 3 Jul 2025).
MME-RAG: hierarchical “multi-manager-expert” architecture in which lightweight managers (“type-level judges”) determine type presence and delegate span extraction to experts, with responsibility for aggregation and high-level routing (Xue et al., 15 Nov 2025).
Mobile-Agent-RAG: employs Manager-RAG for high-level plan grounding using human-validated exemplars, coordinating with an Operator-RAG for execution-level guidance in hierarchical mobile automation (Zhou et al., 15 Nov 2025).

This architectural decomposition increases modularity, supports plug-and-play agent updates, and provides a natural locus for policy learning or optimization (Wang et al., 17 Sep 2025, Salemi et al., 12 Jun 2025).

2. Process-Level Supervision and Learning

One core advance in Manager-RAG methods is process-level intermediate supervision: rather than only crediting the final answer, manager decisions and agent actions receive fine-grained assessments.

In SIRAG, each Decision Maker (DM) or Knowledge Selector (KS) step is evaluated by a fixed LLM judge (e.g., GPT-4), which assigns a process-level score $r_{\rm proc}^i(s_t^i, a_t^i) \in [0, 1]$. This score is combined with the task reward $R_{\rm sys}$: $r_t^i = \alpha R_{\rm sys} + \beta r_{\rm proc}^i$ (Wang et al., 17 Sep 2025).
Tree-structured rollouts enable exploration of diverse reasoning pathways, with rewards backpropagated via PPO and generalized advantage estimation (GAE), yielding more stable and interpretable convergence.
mRAG employs a reward-guided trajectory sampling paradigm: joint trajectories $(a_1, ..., a_n)$ are sampled, assessed on correctness (nugget-based recall) and faithfulness (retrieval-groundedness), with manager and other agent policies fine-tuned via supervised learning on highest-reward trajectories (Salemi et al., 12 Jun 2025).
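The SIRAG reward combination and the advantage estimate it feeds can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default values of alpha, beta, gamma, and lam, and the single linear rollout (rather than a tree of rollouts) are all choices made for the example.

```python
from typing import List


def combined_rewards(r_sys: float, proc_scores: List[float],
                     alpha: float = 1.0, beta: float = 0.5) -> List[float]:
    """Per-step reward r_t = alpha * R_sys + beta * r_proc_t.

    The alpha and beta defaults are illustrative assumptions.
    """
    for score in proc_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("process-level scores must lie in [0, 1]")
    return [alpha * r_sys + beta * score for score in proc_scores]


def gae(rewards: List[float], values: List[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one linear rollout.

    `values` holds one value estimate per state plus a final bootstrap
    value, so len(values) == len(rewards) + 1.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted future advantage.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    step_rewards = combined_rewards(r_sys=1.0, proc_scores=[0.2, 0.8])
    print(step_rewards)
    print(gae(step_rewards, values=[0.0, 0.0, 0.0]))
```

Because every step receives its own judge score, two steps in the same rollout can obtain different rewards even though they share the same final task reward.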

The result is higher sample efficiency, reduced gradient variance, and direct credit assignment to intermediate choices—a fundamental improvement over pure RL with sparse terminal rewards.
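The reward-guided trajectory selection described for mRAG can be illustrated with a small sketch. The `Trajectory` class, the equal weighting of correctness and faithfulness, and the `top_k` cutoff are hypothetical; the source does not specify how the two signals are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    actions: List[str]     # sequence of agent calls (a_1, ..., a_n)
    correctness: float     # e.g. nugget-based recall, assumed in [0, 1]
    faithfulness: float    # retrieval-groundedness, assumed in [0, 1]


def trajectory_reward(traj: Trajectory,
                      w_correct: float = 0.5,
                      w_faithful: float = 0.5) -> float:
    """Scalar reward; the weighted sum is an assumption for illustration."""
    return w_correct * traj.correctness + w_faithful * traj.faithfulness


def select_for_finetuning(trajectories: List[Trajectory],
                          top_k: int = 1) -> List[Trajectory]:
    """Keep the highest-reward sampled trajectories as supervised targets."""
    ranked = sorted(trajectories, key=trajectory_reward, reverse=True)
    return ranked[:top_k]
```

The selected trajectories would then serve as training examples for the manager and worker policies, in place of a policy-gradient update.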

3. Dynamic Agent Coordination and Task Decomposition

Manager-RAG systems facilitate explicit, dynamic decomposition of the overall task pipeline, supporting on-demand invocation of worker agents, tractable multi-hop or ambiguous reasoning, and flexible orchestration logic.

In MA-RAG, the Planner decomposes the user’s query $q$ into subtasks $P = \{s_1, \dots, s_n\}$ and orchestrates sequential or parallel sub-steps (Step Definer, Retrieval, Extractor, QA) (Nguyen et al., 26 May 2025).
The mRAG Coordinator iteratively examines state, selects the next agent (planner, searcher, reasoner, etc.), dispatches requests, collects structured replies, and determines termination based on aggregated outputs (Salemi et al., 12 Jun 2025).
CyberRAG’s manager LLM uses a selection policy $f_{\max}(c_1,\dots,c_N)$ to determine which classifier outputs are sufficiently credible for downstream action (threshold $\tau=0.5$), loops through iterative retrieval until the evidence converges, and logs a rationale for each action (Blefari et al., 3 Jul 2025).
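A thresholded max-confidence selection of the kind described for CyberRAG might look like the sketch below. The function name, the dictionary input, and the choice of `>=` for the threshold comparison are assumptions; only the threshold value 0.5 comes from the text.

```python
from typing import Dict, Optional, Tuple


def select_attack_family(confidences: Dict[str, float],
                         tau: float = 0.5) -> Tuple[Optional[str], str]:
    """Pick the most confident classifier if it clears the threshold tau.

    Returns (label or None, human-readable rationale for the audit log).
    """
    if not confidences:
        raise ValueError("at least one classifier output is required")
    label, score = max(confidences.items(), key=lambda item: item[1])
    if score >= tau:
        return label, f"Chose {label} because c_{label}={score:.2f}"
    return None, f"No classifier reached tau={tau:.2f} (best: {label}={score:.2f})"
```

Returning the rationale alongside the decision keeps the logging step inseparable from the choice itself, which is one simple way to make each action traceable.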

This explicit control avoids unnecessary computation and makes intermediate trajectories interpretable. An ablation on MA-RAG shows that the Planner (manager) is crucial for multi-hop QA performance: removing it causes a large drop in accuracy (HotpotQA EM: $50.7\to36.2$) (Nguyen et al., 26 May 2025).
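The iterative select, dispatch, and collect cycle described for a coordinator-style manager can be sketched as a short loop. Everything here is hypothetical scaffolding: the `Agent` and `Policy` signatures, the `"STOP"` sentinel, and the scripted policy stand in for a learned manager and LLM-backed workers.

```python
from typing import Callable, Dict, List, Tuple

Trace = List[Tuple[str, str]]              # (agent name, reply) pairs
Agent = Callable[[str, List[str]], str]    # (query, prior replies) -> reply
Policy = Callable[[str, Trace], str]       # (query, trace) -> next agent or "STOP"


def run_manager(query: str, agents: Dict[str, Agent],
                choose_next: Policy, max_steps: int = 8) -> Trace:
    """Let the manager pick one worker per step until it decides to stop."""
    trace: Trace = []
    for _ in range(max_steps):
        name = choose_next(query, trace)
        if name == "STOP":
            break
        reply = agents[name](query, [r for _, r in trace])
        trace.append((name, reply))
    return trace


def scripted_policy(query: str, trace: Trace) -> str:
    """Toy stand-in for a learned manager: fixed order, then stop."""
    order = ["planner", "searcher", "reasoner"]
    return order[len(trace)] if len(trace) < len(order) else "STOP"


TOY_AGENTS: Dict[str, Agent] = {
    "planner": lambda q, prior: f"plan for: {q}",
    "searcher": lambda q, prior: f"documents for: {prior[-1]}",
    "reasoner": lambda q, prior: f"answer from {len(prior)} prior replies",
}
```

The returned trace doubles as an interpretable record of which agent was invoked at each step and what it produced.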

4. Optimizing for SLA and System Constraints

For deployment in real-world environments, Manager-RAG architectures serve as natural points for enforcing system-level Service Level Agreements (SLAs), quality metrics, and operations management.

Manager-RAG frameworks in SLA-driven QA systems map service-level objectives (SLOs) for latency, cost, and quality into a configuration vector $x$ (number of agents, sources, thresholds). The orchestrator solves a constrained optimization (e.g., knapsack) to minimize a weighted sum $f(x) = \alpha C(x) + \beta L(x) - \gamma Q(x)$, subject to $Q(x)\ge Q_{\rm min}$, $L(x)\le L_{\rm max}$, $C(x)\le C_{\rm max}$ (Iannelli et al., 7 Dec 2024).
Dynamic reconfiguration, per query or in batch, with live monitoring and feedback enables real-time mediation of cost-quality-latency trade-offs. Empirical results show that ensemble size $N$ can be tuned to reach a target $F_1$ (e.g., $F_1=0.648\to0.688$ for $N=1\to5$ agents), but with diminishing returns and a linear increase in cost (Iannelli et al., 7 Dec 2024).
Concrete production strategies encompass auto-scaling via K8s, feedback loops for online quality estimation, and adaptive data-source selection under operational constraints.
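The constrained objective above can be made concrete with a small search over candidate configurations. This sketch substitutes exhaustive enumeration for the knapsack-style solver mentioned in the text, and the two-element configuration, the toy cost, latency, and quality models, and all numeric values are invented for illustration.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Config = Tuple[int, int]   # hypothetical x = (number of agents, number of sources)
Metric = Callable[[Config], float]


def best_config(candidates: Iterable[Config],
                cost: Metric, latency: Metric, quality: Metric,
                alpha: float, beta: float, gamma: float,
                q_min: float, l_max: float, c_max: float) -> Optional[Config]:
    """Minimize f(x) = alpha*C(x) + beta*L(x) - gamma*Q(x) over feasible x."""
    best, best_f = None, float("inf")
    for x in candidates:
        c, l, q = cost(x), latency(x), quality(x)
        if q < q_min or l > l_max or c > c_max:
            continue  # violates at least one SLA constraint
        f = alpha * c + beta * l - gamma * q
        if f < best_f:
            best, best_f = x, f
    return best


# Toy models (assumptions): linear cost, latency growing with sources,
# quality with diminishing returns in the number of agents.
def toy_cost(x: Config) -> float:
    return float(x[0])


def toy_latency(x: Config) -> float:
    return 1.0 + 0.5 * x[1]


def toy_quality(x: Config) -> float:
    return 1.0 - 0.5 ** x[0]


GRID = list(product(range(1, 5), range(1, 3)))
```

Returning `None` when no configuration is feasible gives the orchestrator an explicit signal that the SLA cannot be met and that constraints must be relaxed or the request rejected.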

This capability renders Manager-RAG pipelines highly suitable for regulated or high-demand environments (e.g., finance, health, customer support), where explicit resource governance and monitoring are mission-critical.

5. Robustness, Interpretability, and Human Alignment

Manager-RAG frameworks offer increased robustness, interpretability, and controllability over classical RAG.

Plug-and-play agent modularity: most Manager-RAG designs (e.g., SIRAG, CyberRAG, MME-RAG) require no modification to underlying retrievers or generators; managers operate as thin control wrappers, facilitating deployment and incremental extension (Wang et al., 17 Sep 2025, Blefari et al., 3 Jul 2025).
Human-aligned process transparency: manager agents produce explicit rationales (as in CyberRAG, “Chose SQLi because $c_{\rm SQLi}=0.99$”) and expose intermediate state transitions, retrieval rounds, or arbitration logic, allowing traceability and audit (Blefari et al., 3 Jul 2025, Wang et al., 17 Sep 2025).
Strategic hallucination mitigation: mobile automation frameworks such as Mobile-Agent-RAG constrain high-level LLM reasoning by grounding managers in human-validated task plan exemplars, reducing the space of plausible but unsupported strategies (Zhou et al., 15 Nov 2025).
Hierarchical decomposition: for NER and dialogue, MME-RAG’s multi-manager design allows parallel type-level judgments and selective expert invocation, raising F1 and retrieval precision (e.g., MME-RAG outperforms IF-WRANER and flat RAG; customer-service F1 improves from $93.44\to95.11$) (Xue et al., 15 Nov 2025).
Empirical ablation studies confirm that removal of manager policies consistently degrades multi-step accuracy, coherence, or planning effectiveness (Nguyen et al., 26 May 2025, Zhou et al., 15 Nov 2025, Xue et al., 15 Nov 2025).
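The manager-expert delegation pattern in the hierarchical decomposition item can be sketched with toy components. The keyword heuristics and regular expressions below are stand-ins for the lightweight type-level judges and span-extraction experts; the entity types, names, and the sequential loop (rather than parallel judgments) are assumptions made for the example.

```python
import re
from typing import Callable, Dict, List

Judge = Callable[[str], bool]          # manager: is this entity type present?
Expert = Callable[[str], List[str]]    # expert: extract spans of that type


def extract_entities(text: str, managers: Dict[str, Judge],
                     experts: Dict[str, Expert]) -> Dict[str, List[str]]:
    """Invoke an expert only when its type-level manager reports presence."""
    results: Dict[str, List[str]] = {}
    for entity_type, judge in managers.items():
        if judge(text):
            results[entity_type] = experts[entity_type](text)
    return results


# Toy managers and experts (hypothetical stand-ins for model-backed agents).
MANAGERS: Dict[str, Judge] = {
    "EMAIL": lambda text: "@" in text,
    "PHONE": lambda text: any(ch.isdigit() for ch in text),
}
EXPERTS: Dict[str, Expert] = {
    "EMAIL": lambda text: re.findall(r"[\w.]+@[\w.]+", text),
    "PHONE": lambda text: re.findall(r"\d[\d-]{5,}\d", text),
}
```

Skipping experts whose type is judged absent is what saves computation relative to a flat design that runs every extractor on every input.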

6. Operations, Observability, and Best Practices

Manager-RAG architectures align naturally with emerging RAGOps and LLMOps paradigms for lifecycle management, observability, and quality control.

The 4+1 model—logical, process, development, physical, and scenario views—gives a systematic foundation for Manager-RAG: retriever, generator, and manager agents as microservices; modular deployment; and interleaving of pipeline and DataOps lifecycles (Xu et al., 3 Jun 2025).
Integrated observability: tracing all manager and agent decisions, storing DAGs of tool calls per query, and enforcing comprehensive audit logs for regulatory compliance (GDPR, AI Act).
Metrics and tradeoffs: real-time balancing of precision@k, recall@k, fluency, faithfulness, hallucination rate vs. LLM cost and response time. SLAs can be encoded as utility functions for manager optimization of retrieval depth $k$ and scoring weights (Xu et al., 3 Jun 2025, Iannelli et al., 7 Dec 2024).
Resilience patterns: dynamic fallback (ensuring minimal service under LLM/API outages), hot caching of frequent plans and query responses, and uncertainty-aware escalation to human agents.
Human-in-the-loop: manager agents may conditionally invoke expert review or hybrid bandit policies when uncertainty exceeds a tunable threshold (Xu et al., 3 Jun 2025).
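The fallback and escalation patterns listed above can be combined in a single routing function. This is a hedged sketch: the `primary` and `fallback` callables, the use of `ConnectionError` to model an outage, and the 0.3 uncertainty threshold are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Primary = Callable[[str], Tuple[str, float]]   # query -> (answer, uncertainty)
Fallback = Callable[[str], str]                # query -> degraded answer


def answer_with_escalation(query: str, primary: Primary, fallback: Fallback,
                           max_uncertainty: float = 0.3) -> Dict[str, str]:
    """Route a query: automatic answer, human review, or degraded fallback."""
    try:
        answer, uncertainty = primary(query)
    except ConnectionError:
        # Primary LLM/API unavailable: serve a minimal fallback answer.
        return {"route": "fallback", "answer": fallback(query)}
    if uncertainty > max_uncertainty:
        # Too uncertain to release automatically: queue for expert review.
        return {"route": "human_review", "answer": answer}
    return {"route": "auto", "answer": answer}
```

Recording the chosen route with each answer also feeds the audit log, so escalation rates can be monitored over time.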

Best practices from deployed Manager-RAG systems emphasize modular orchestration, all-stage observability, continuous drift detection, and explicit test coverage alignment (e.g., via coverage clustering/fallback expansion) (Xu et al., 3 Jun 2025). Use cases in taxation, scientific data cataloging, and cyberdefense validate the framework’s versatility.

7. Applications and Empirical Validation

Manager-RAG is applicable across QA, cyberdefense, long-horizon automation, NER/dialogue, and systems operation:

Domain/Framework	Manager Agent Role	Key Outcomes/Performance
SIRAG (Wang et al., 17 Sep 2025)	DM & KS, PPO-trained w/ LLM judge	More stable learning, interpretable
MA-RAG (Nguyen et al., 26 May 2025)	Planner (dynamic CoT)	Top QA on multi-hop, ablations show Planner essential
mRAG (Salemi et al., 12 Jun 2025)	Coordinator (reward self-trained)	Improved correctness/faithfulness
CyberRAG (Blefari et al., 3 Jul 2025)	Control classifiers/retrieval/report	Accuracy +10%, explanations 4.9/5
MME-RAG (Xue et al., 15 Nov 2025)	Multi-manager (type judge)	+1–2 F1 vs. flat, better transfer
Mobile-Agent-RAG (Zhou et al., 15 Nov 2025)	Plan-level RAG (human KB)	+17.4% CR, +10.2% step efficiency
SLA-Orchestrated (Iannelli et al., 7 Dec 2024)	SLA-driven Orchestrator	F1 scaling, cost/latency managed
RAGOps (Xu et al., 3 Jun 2025)	Scenario/integration management	Scalable, traceable QA pipelines

Across these frameworks, the cited ablation and production studies report that manager-driven decomposition and control improve interpretability, training stability, and system-level optimization. The manager paradigm has become a recurring design pattern for high-performance, robust, and observable RAG systems.