Multi-Agent Query Understanding System

Updated 1 February 2026

A Multi-Agent Query Understanding System is a modular framework that decomposes query processing into coordinated specialized agents for enhanced interpretation and evidence aggregation.
It employs reinforcement learning to orchestrate XML-based function calls across sub-agents, achieving accuracy improvements of up to 30 percentage points over single-agent methods.
The system integrates dynamic topology, hierarchical memory, and domain-specialized agents to ensure robust performance even under adversarial conditions.

A Multi-Agent Query Understanding System (MAS) is a software architecture that decomposes the process of analyzing, interpreting, and responding to user queries into a coordinated set of specialized agents. Agents are orchestrated by a policy that dynamically assigns roles and designs data flows for robust and adaptive query processing. This paradigm leverages the collaborative strength and modularity of multiple agent types, especially when implemented with LLMs, to outperform traditional single-agent architectures in settings where queries are complex, ambiguous, domain-specific, adversarial, or require parallel data access and evidence aggregation (Ke et al., 21 Jan 2026).

1. Holistic Orchestration via Function-Calling Reinforcement Learning

MAS-Orchestra establishes orchestration as a single-step function-calling reinforcement learning (RL) problem. The orchestrator receives an input query $x$ and a Degree of MAS (DoM) $m \in \{\mathrm{Low}, \mathrm{High}\}$, then emits an XML plan $a$. This plan defines both the agent pool (agent roles, parameters) and the data-flow edges linking their outputs and inputs.
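As a rough illustration of what such a plan could look like, the sketch below reads a small XML document into an agent pool and a list of data-flow edges. The tag and attribute names (plan, agent, edge, id, role, src, dst, dom) are assumptions made for this example; the source does not specify the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan schema: every tag and attribute name here is illustrative.
PLAN_XML = """
<plan dom="High">
  <agent id="parse" role="CoTAgent" />
  <agent id="retrieve" role="WebSearchAgent" />
  <agent id="answer" role="CoTAgent" />
  <edge src="parse" dst="retrieve" />
  <edge src="parse" dst="answer" />
  <edge src="retrieve" dst="answer" />
</plan>
"""


def parse_plan(xml_text):
    """Return (agents, edges): agent id -> role, and a list of (src, dst) pairs."""
    root = ET.fromstring(xml_text)
    agents = {a.get("id"): a.get("role") for a in root.findall("agent")}
    edges = [(e.get("src"), e.get("dst")) for e in root.findall("edge")]
    # Reject edges that point at agents the plan never declared.
    for src, dst in edges:
        if src not in agents or dst not in agents:
            raise ValueError(f"edge {src}->{dst} references an unknown agent")
    return agents, edges


agents, edges = parse_plan(PLAN_XML)
```

Validating edges at parse time means a malformed plan fails before any agent is called, which is one plausible way to keep the parser deterministic.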

The execution is mediated by a deterministic parser $f$ that instantiates the agents as callable Python functions and wires up their communication graph. The RL objective maximizes the expected reward over query-answer pairs: $\max_{\theta}\;\mathbb{E}_{(x,y)\sim\mathcal D}\;\mathbb{E}_{a\sim\pi_\theta(\cdot\mid x,m)} \bigl[R\bigl(x,y,f(x,a)\bigr)\bigr].$ The reward $R(x, y, \hat{y})$ is a binary correctness signal. Optimization uses Group-Relative Policy Optimization (GRPO), a PPO-style update clipped and normalized groupwise for stability [(Ke et al., 21 Jan 2026), Appendix C]. The policy $\pi_{\theta}(a\mid x,m)$ is a prompt-tuned Transformer decoder, such as Qwen-7B, trained to emit valid XML orchestration in a single forward pass.
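The groupwise normalization and clipping can be sketched as follows. This is the commonly described form of a group-relative advantage with a PPO-style clipped surrogate, not a reproduction of the paper's Appendix C; the function names, the clip range of 0.2, and the epsilon term are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one sampled group to zero mean and unit scale."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps (an assumed value) avoids division by zero when all rewards agree.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip=0.2):
    """PPO-style clipped objective for one sample; clip=0.2 is an assumption."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Four plans sampled for the same query, scored with binary correctness.
advantages = group_relative_advantages([1, 0, 0, 1])
```

With binary rewards, a group in which every sampled plan is correct (or every plan is wrong) yields all-zero advantages, so only groups with mixed outcomes contribute a learning signal in this formulation.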

Sub-agents are abstracted as callable functions (e.g., def QueryParser(query: str) -> ParseTreeJSON) and are described to the orchestrator only through JSON/XML signatures, which hides their internal control flow. This yields strict boundaries, modularity, and ease of role assignment.
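One way to derive such an external description from a Python callable is shown below. It is a minimal sketch: the describe helper and the shape of the resulting dictionary are hypothetical, and a plain dict return type stands in for the ParseTreeJSON type named in the text.

```python
import inspect
import json


def QueryParser(query: str) -> dict:
    """Toy stand-in for a parsing agent: split the query into tokens."""
    return {"tokens": query.split()}


def describe(fn):
    """Build a signature-only description of a callable (hypothetical format)."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "params": {name: p.annotation.__name__ for name, p in sig.parameters.items()},
        "returns": sig.return_annotation.__name__,
    }


spec = describe(QueryParser)
print(json.dumps(spec))
```

The description exposes names and types only, so an orchestrator that sees it learns nothing about how the agent computes its result.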

2. Controlled Task Characterization: The MASBENCH Benchmark

The performance of a MAS for query understanding is not universal but conditionally depends on task structure and system configuration. MASBENCH provides a rigorous benchmark suite parameterized along five critical axes:

Depth ($D$): Maximum path length in the agent sub-task graph.
Horizon ($H$): Number of intermediate results that must be retained and recalled.
Breadth ($B$): Maximum in-degree (fan-in) of any node in the agent graph.
Parallelism ($P$): Number of subgraphs solvable in parallel.
Robustness ($R$): Number of sub-tasks corrupted by adversarial “poison notes.”

Explicit control over these axes enables systematic in-distribution and out-of-distribution generalization evaluation, and surfaces the precise conditions (e.g., high Breadth or Robustness) where MAS designs offer unique advantages over monolithic single-agent systems [(Ke et al., 21 Jan 2026), Table 2].
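Two of the axes above, Depth and Breadth, can be read directly off a sub-task graph. The sketch below assumes the graph is a directed acyclic graph and that path length is counted in edges; both are assumptions of this example, as is the function name.

```python
from collections import defaultdict


def depth_and_breadth(nodes, edges):
    """Return (longest path length in edges, maximum in-degree) for a DAG."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    memo = {}

    def longest_path_to(node):
        # Length of the longest path ending at this node, memoized per node.
        if node not in memo:
            memo[node] = max((longest_path_to(p) + 1 for p in preds[node]), default=0)
        return memo[node]

    depth = max((longest_path_to(n) for n in nodes), default=0)
    breadth = max((len(preds[n]) for n in nodes), default=0)
    return depth, breadth


# Two independent sub-tasks feed an aggregation step, which feeds a final step.
nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "c"), ("c", "d")]
depth, breadth = depth_and_breadth(nodes, edges)
```

In this small graph the aggregation node has fan-in 2, so raising Breadth corresponds to adding more independent sub-tasks that feed the same node.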

3. Empirical Performance and System Robustness

MAS-Orchestra demonstrates substantial gains over single-agent and fixed-pipeline baselines. On Breadth/Parallel tasks, accuracy improvements of up to +30 percentage points were observed compared to strong single-agent systems (SAS). Robustness to adversarial/poisoned tasks is a persistent benefit, even as sub-agent LLMs increase in capability: under 10 poisoned subtasks, single agents collapse to near 0% accuracy, while MAS-Orchestra retains >50% [(Ke et al., 21 Jan 2026), Figs. 3, 5, 6].

On public datasets including AIME24/25, HotpotQA, and GPQA, MAS-Orchestra outperforms leading alternatives by 3.75–12.12 percentage points, highlighting generalizability and cross-domain transfer [(Ke et al., 21 Jan 2026), Table 3].

Ablation studies reveal: (i) the advantage of instruction-tuned orchestrators for schema and behavior adherence; (ii) diminishing returns from additional MAS structure as sub-agents strengthen, except under Robustness stressors; and (iii) the necessity of including adversarial examples in training to achieve reliable OOD robustness [(Ke et al., 21 Jan 2026), Figs. 4, 8].

4. System Blueprint: Practical MAS Design for Query Understanding

A state-of-the-art MAS for query understanding (MAQUS) comprises the following agent roles and orchestration principles:

SyntaxParser (CoTAgent): Tokenizes and grammaticalizes input queries.
SemanticParser (DebateAgent): Produces logical forms via inter-agent debate (e.g., Formalist vs. Pragmatist).
EntityLinker (Self-Consistency Agent): Employs multiple LLM chains to robustly disambiguate entities.
IntentClassifier (ReflexionAgent): Iteratively refines user intent using confidence feedback.
Retriever (WebSearchAgent): Executes multi-turn external knowledge retrieval.
AnswerSynthesizer (CoTAgent): Aggregates evidence and logical forms to generate final answers.

The design is formalized as an XML data-flow graph with a high DoM, allowing arbitrary fan-in and fan-out. The orchestrator policy is trained on user-relevance signals (clicks, human labels), adversarial data is included in training to target robustness, and a “Verifier” agent critiques answers and reroutes them when it detects errors (Ke et al., 21 Jan 2026). This structure enables parallel evidence aggregation, modular semantic decomposition, error detection, and iterative refinement.
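The roles listed above can be wired into a data-flow graph and executed in dependency order, as in the sketch below. The specific edges in FLOW are assumptions chosen for illustration (the source names the roles but not their connections), and the stub agents merely echo which upstream outputs they received.

```python
from graphlib import TopologicalSorter

# Each agent maps to the agents whose outputs it consumes (assumed wiring).
FLOW = {
    "SyntaxParser": [],
    "SemanticParser": ["SyntaxParser"],
    "EntityLinker": ["SyntaxParser"],
    "IntentClassifier": ["SemanticParser"],
    "Retriever": ["EntityLinker", "IntentClassifier"],
    "AnswerSynthesizer": ["SemanticParser", "Retriever"],
}


def run_flow(query, flow, agents):
    """Call every agent once, after all of its upstream agents have produced output."""
    outputs = {}
    for name in TopologicalSorter(flow).static_order():
        inputs = {dep: outputs[dep] for dep in flow[name]}
        outputs[name] = agents[name](query, inputs)
    return outputs


def make_stub(name):
    """Stand-in for an LLM-backed agent: report which inputs were received."""
    def stub(query, inputs):
        return f"{name}({', '.join(sorted(inputs)) or 'query'})"
    return stub


agents = {name: make_stub(name) for name in FLOW}
result = run_flow("who wrote Dune?", FLOW, agents)
```

Here AnswerSynthesizer has fan-in 2 and SyntaxParser has fan-out 2, a small instance of the arbitrary fan-in and fan-out that a high DoM permits. A Verifier step would add a further node and a conditional re-run, which this sketch omits.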

5. Hierarchical Memory and Adaptive Architectures

To further advance system adaptation and knowledge retention, hierarchical memory architectures such as G-Memory embed MAS query processing within a three-tier graph: interaction trajectory (fine-grained turn-level logs), query graph (cross-trial index), and insight graph (abstracted lessons) (Zhang et al., 9 Jun 2025). Query-time retrieval combines high-level insights and relevant past trajectories, and agent-specific memory is synthesized for each role.
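The three tiers can be pictured with a small in-memory structure. The sketch below is a simplified illustration and not the G-Memory implementation: the class, its field names, and the one-hop expansion used at retrieval time are assumptions, and finding which past queries are similar to a new one is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    trajectories: dict = field(default_factory=dict)  # query id -> turn-level log
    query_links: dict = field(default_factory=dict)   # query id -> related query ids
    insights: dict = field(default_factory=dict)      # lesson text -> supporting query ids

    def record(self, query_id, turns, related=(), lessons=()):
        """Store one finished trial in all three tiers."""
        self.trajectories[query_id] = list(turns)
        self.query_links.setdefault(query_id, set()).update(related)
        for other in related:
            self.query_links.setdefault(other, set()).add(query_id)
        for lesson in lessons:
            self.insights.setdefault(lesson, set()).add(query_id)

    def retrieve(self, similar_ids):
        """Return (lessons, trajectories) for similar queries and their one-hop links."""
        hits = set(similar_ids)
        for qid in similar_ids:
            hits |= self.query_links.get(qid, set())
        lessons = sorted(text for text, qs in self.insights.items() if qs & hits)
        past = {q: self.trajectories[q] for q in sorted(hits) if q in self.trajectories}
        return lessons, past


memory = HierarchicalMemory()
memory.record("q1", ["turn a"], lessons=["check units"])
memory.record("q2", ["turn b"], related=["q1"], lessons=["cite sources"])
lessons, past = memory.retrieve(["q1"])
```

Retrieval returns both abstracted lessons and raw trajectories, mirroring the combination of high-level insights and relevant past trajectories described above.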

Accuracy improvements of up to +10.12 percentage points on QA benchmarks support the inclusion of hierarchical, bi-directional memory in MAS. Such memory enables effective cross-trial learning, condensed context, and continual evolution of agent behaviors while maintaining low token cost per query (Zhang et al., 9 Jun 2025).

6. Dynamic Topology and Meta-Optimization

Adaptive topology methods represent the next frontier. One-for-all frameworks (OFA-MAS) use autoregressive graph generative models (TAGSE + Mixture-of-Experts) to map user queries to optimal MAS collaboration graphs in one pass, removing the need for task-specific model retraining and offering superior performance across benchmarks (Li et al., 19 Jan 2026).

Dynamic graph design, as in AMAS, further advances this by (i) storing the top-$k$ RL-trained graphs, and (ii) learning a lightweight LLM-based selector to pick the graph best suited to each query (Leong et al., 2 Oct 2025). This approach consistently outperforms both static graphs and single-agent baselines, especially for cross-domain workloads and diverse reasoning tasks.
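The two steps, keeping a small pool of high-reward graphs and then choosing one per query, can be sketched as below. The keyword-overlap scorer is only a stand-in for the LLM-based selector, and the candidate names, tags, and reward values are made up for illustration.

```python
def keep_top_k(candidates, k):
    """Retain the k candidates with the highest training reward."""
    return sorted(candidates, key=lambda c: c["reward"], reverse=True)[:k]


def keyword_selector(query, candidate):
    """Stand-in scorer: count query words that match the candidate's tags."""
    words = set(query.lower().split())
    return len(words & set(candidate["tags"]))


def select_graph(query, pool, score_fn=keyword_selector):
    """Pick the pooled graph that the scorer rates highest for this query."""
    return max(pool, key=lambda c: score_fn(query, c))


# Illustrative candidates; the rewards are invented numbers, not reported results.
candidates = [
    {"name": "chain", "reward": 0.61, "tags": ["math", "proof"]},
    {"name": "star", "reward": 0.72, "tags": ["search", "facts"]},
    {"name": "debate", "reward": 0.68, "tags": ["law", "ethics"]},
    {"name": "solo", "reward": 0.40, "tags": ["math"]},
]
pool = keep_top_k(candidates, k=3)
chosen = select_graph("check this math proof", pool)
```

Separating the pool from the selector means the selector can be retrained or swapped without re-running the RL search that produced the graphs.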

7. Specialization, Heterogeneity, and Domain Adaptation

Heterogeneous MAS architectures enable specialization by assigning task- and domain-optimal LLMs to each sub-agent, yielding performance improvements (5–8 percentage points in query understanding) relative to single-LLM baselines (Ye et al., 22 May 2025). MAS-GPT reframes MAS generation as program synthesis: given a query, a fine-tuned LLM emits executable Python code that instantiates the MAS and orchestrates role allocation and control flow (Ye et al., 5 Mar 2025). This pipeline achieves both efficiency (low call count) and adaptability across domains with minimal engineering overhead.

In domain-specialized applications (e.g., statute retrieval), multi-agent query interpretation with reinforcement learning (GRPO) enables iterative, multi-perspective reformulation and retrieval, outperforming RAG-type and dense-retrieval baselines on challenging in-distribution and out-of-distribution datasets (Li et al., 25 Jan 2026). Combined with zero-shot LLM rerankers, this architecture achieves Recall@10 gains up to +8.09 percentage points and closes the gap between oracle and realized recall.
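For reference, Recall@k is the fraction of relevant items that appear among the top k retrieved results. The sketch below uses the standard definition; the statute identifiers are invented placeholders.

```python
def recall_at_k(ranked, relevant, k=10):
    """Fraction of relevant items found in the first k ranked results."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Placeholder identifiers: three relevant statutes, four retrieved in ranked order.
ranked = ["s3", "s1", "s9", "s4"]
relevant = {"s1", "s4", "s7"}
recall_top3 = recall_at_k(ranked, relevant, k=3)
recall_top10 = recall_at_k(ranked, relevant, k=10)
```

A gain reported in percentage points is the difference between two such values multiplied by 100, so moving from a recall of 0.50 to 0.58 is a gain of 8 percentage points.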

Fundamentally, Multi-Agent Query Understanding Systems advance beyond fixed, monolithic LLM-based approaches by leveraging explicit modularization, adaptive reasoning topologies, controlled evaluation, robust training, hierarchical memory, and heterogeneous specialization. The result is a robust, scalable, and extensible solution for complex real-world query understanding tasks (Ke et al., 21 Jan 2026, Zhang et al., 9 Jun 2025, Li et al., 19 Jan 2026, Leong et al., 2 Oct 2025, Ye et al., 5 Mar 2025, Ye et al., 22 May 2025, Li et al., 25 Jan 2026).