Query Agent: Autonomous Query Rewriting
- Query Agent is a system that transforms ambiguous queries into precise, semantically rich reformulations for optimized retrieval.
- It employs multiple specialized agents to decompose, rewrite, and analyze queries, significantly improving recall and accuracy.
- Reinforcement learning and zero-shot LLM reranking drive its performance, achieving notable gains over traditional retrieval methods.
A Query Agent is a specialized autonomous module or multi-agent ensemble whose explicit purpose is to transform an information need, often articulated in natural language or ambiguous form, into a concrete, semantically robust, and coverage-maximizing set of queries for subsequent retrieval or downstream processing. Query Agents employ various forms of decomposition, reformulation, collaborative reasoning, and policy optimization to maximize task-specific retrieval utility—such as Recall, Mean Reciprocal Rank, or end-to-end accuracy—under domain or interactional constraints. The following sections provide a technical overview of Query Agent design, with emphasis on advanced multi-agent query understanding as typified by LegalMALR (Li et al., 25 Jan 2026).
1. Multi-Agent Query Understanding Architecture
LegalMALR exemplifies a modern Query Agent as a Multi-Agent Query Understanding System (MAS). Rather than relying on a single “one-shot” embedding or lexical reformulation, MAS decomposes the query-understanding process across a set of interacting LLM-driven agents:
- Planner Agent: Dynamically selects which specialized agent to invoke at each turn, monitors coverage of statutory space, and decides on termination.
- Rewrite/Analysis Agents: Six distinct agents execute roles such as Single-Element Rewrite (colloquial→statutory term), Supplementary-Element Rewrite (make implicit legal conditions explicit), Multi-Element Decomposition (divide multi-issue queries), Supportive-Law Rewrite (retrieve auxiliary statutes), and Semantic-Abnormality Analyzer/Rewriter (detect/repair doctrinal inconsistencies).
- Retrieval Subsystem: Each query reformulation is submitted to a dense retriever (Qwen3-Embedding-4B), yielding top-30 candidates, and pruned to top-10 by a lightweight reranker.
- Candidate Aggregation: All retrieved candidates are pooled and deduplicated per iteration.
The overall pipeline is iterative. The Planner observes both the query and the accumulation of retrieved candidates, adaptively firing agents to produce semantically diverse reformulations until statutory coverage plateaus or a termination criterion is met. On average, ≈14 distinct statute candidates per query are produced after 1–4 rounds of rewrite+retrieve (Li et al., 25 Jan 2026).
2. Query Reformulation via Prompt-Parameterized Agents
Each specialized agent operates over the shared LLM backbone but is governed by a unique system prompt that induces role-specific behavior and output format. For example:
- Single-Element Rewrite: “You are a legal-language specialist. Rewrite the query into precise statutory terms. Output one reformulation.”
- Supplementary-Element Rewrite: “You are a criminal-law expert. Add any decisive but implicit conditions. Output one enriched reformulation.”
- Multi-Element Decomposition: “Decompose into focused sub-queries, each targeting a legal issue. Output multiple reformulations.”
This design ensures that, while using the same underlying LLM (Qwen3-4B-Instruct in LegalMALR), the system produces a broad spectrum of legally-informed query rewrites, surfacing different interpretations and resolutions of underspecified or implicit issues in the original user input. Adaptive agent firing by the Planner maximizes legal coverage while minimizing redundant or irrelevant reformulations (Li et al., 25 Jan 2026).
3. Reinforcement Policy Optimization for Agent Coordination
Naïve prompt engineering cannot adequately control the high stochasticity and variance of LLM-driven agent output. To address this, LegalMALR employs Generalized Reinforcement Policy Optimization (GRPO), training a single agent policy π_θ across the Planner and all rewrite agents.
The reward structure for a trajectory τ is:
- Terminal reward:
- Step penalty: per iteration
- Hit reward (shaping): , where is the number of new gold statutes retrieved at step t
- Fallback penalty: if MAS exits before retrieval
Group normalization across K=8 trajectories per query sharpens advantage estimates:
with objective
This objective is optimized via a REINFORCE estimator, with LoRA adapters on a frozen backbone, stabilizing multi-step agentic behaviors and greatly reducing variance compared to standard PPO or a fixed policy. MAS deployments use a single rollout per query at inference, with Planner temperature and rewrite temperature for consistency and coverage (Li et al., 25 Jan 2026).
4. LLM-Based Zero-Shot Reranking
After multi-agent candidate generation, final reranking is performed by a zero-shot LLM (Qwen-Max) in a listwise fashion. The reranker prompt requests analysis based on:
- Factual alignment of statute and query
- Satisfaction of statutory elements
- Exception and condition structure
- Doctrinal coherence
The model ingests the user query, a list of candidate statutes (each with initial retrieval score), and emits a JSON array of top-K statute indices, balancing score-based and reasoning-based relevance. This zero-shot reranking consistently outperforms lightweight submodel rerankers (by 10–15pp in Mean Reciprocal Rank and Recall on both in-distribution and out-of-distribution benchmarks) (Li et al., 25 Jan 2026).
5. End-to-End Retrieval Pipeline and Empirical Results
The complete pipeline:
- User query input.
- MAS multi-agent iterative reformulation and retrieval, generating ≈14 candidates.
- Zero-shot LLM reranker orders the pool.
- Top-10 statutes returned.
Evaluation is conducted on CSAID (a curated set of 118 diverse, difficult legal queries) and the public STARD test set. Metrics include Recall@10 and MRR@10.
| Dataset | Method | Recall@10 | MRR@10 |
|---|---|---|---|
| STARD | Qwen3-Original RAG | 0.7579 | 0.6736 |
| Qwen3-SFT | 0.7690 | 0.7043 | |
| LegalMALR | 0.8195 | 0.7367 | |
| CSAID | Qwen3-Original | 0.6032 | 0.8720 |
| Qwen3-SFT | 0.6323 | 0.8663 | |
| LegalMALR | 0.6841 | 0.9161 |
The MAS query agent with GRPO and LLM reranking achieves +6.2pp and +8.1pp recall improvement, and +6.3pp/+4.4pp MRR improvement on STARD/CSAID, respectively, over standard RAG and finetuned baselines. This establishes that (1) multi-perspective query expansion, (2) agent-level reinforcement, and (3) end-to-end LLM reranking drive the upper bound for recall and legal applicability in statute retrieval (Li et al., 25 Jan 2026).
6. Technical Implementation and Extensibility
Critical implementation points for reproduction or extension include:
- All agents operate as prompt-engineered wrappers atop the same LLM; specificity emerges from prompt granularity, not model weights.
- MAS architecture is implemented as an iterative loop that supports up to four rounds of reformulation and retrieval per query, pooling deduplicated candidate statutes.
- GRPO training freezes the base LLM, optimizing only LoRA adapters for policy stability and compute efficiency.
- Zero-shot LLM reranking is prompt-driven and agnostic to the underlying retriever, allowing modular swap-in of larger or domain-specific models for reranking.
- CSAID, the associated dataset, is constructed with multiple statutory labels per query to facilitate robust evaluation of recall and coverage.
MAS and LegalMALR’s approaches directly address the challenge of implicit, multi-issue, or underspecified legal queries and demonstrate that multi-agent query understanding and LLM-enhanced reranking are synergistic for robust, high-recall statute retrieval (Li et al., 25 Jan 2026).
7. Extensions and Broader Impact
The MAS-based Query Agent framework in LegalMALR generalizes to domains beyond statute retrieval wherever complex, multi-faceted, or underspecified queries challenge conventional RAG systems. Mapping the agent roles to domain-specific concept expansion, explicit context addition, and decomposition tasks allows adaptation to biomedical, technical, or regulatory question-answering. Reinforcement-based agent policy learning via group-normalized rewards provides a stable path for optimizing multi-agent behaviors within larger orchestrated workflows in complex information-seeking settings (Li et al., 25 Jan 2026).
References:
- LegalMALR: Multi-Agent Query Understanding and LLM-Based Reranking for Chinese Statute Retrieval (Li et al., 25 Jan 2026)