Towards a Modular, Multi-Agent AI Search Paradigm
The paper "Towards AI Search Paradigm" (Li et al., 20 Jun 2025) presents a comprehensive architectural and methodological framework for next-generation search systems, advancing beyond both classical information retrieval (IR) and current retrieval-augmented generation (RAG) approaches. The authors propose a modular, multi-agent system that emulates human-like information seeking and decision-making, with explicit mechanisms for dynamic task decomposition, tool orchestration, robust execution, and context-aware answer synthesis.
Motivation and Context
Traditional IR systems, including lexical and learning-to-rank (LTR) models, are limited by their reliance on static document retrieval and ranking, often requiring users to synthesize information manually. RAG systems, while enabling direct answer generation, are typically single-shot and struggle with complex, multi-hop, or tool-dependent queries. The paper identifies a critical gap: current systems lack the cognitive flexibility and multi-stage reasoning necessary for complex information needs, such as those involving evidence aggregation, tool use, and dynamic planning.
Multi-Agent Architecture
The proposed AI Search Paradigm is structured around four specialized, LLM-powered agents:
- Master Agent: Analyzes query complexity and intent, dynamically assembles agent teams, and oversees execution with reflective re-planning.
- Planner Agent: Decomposes complex queries into a directed acyclic graph (DAG) of sub-tasks, selects tools via a Model-Context Protocol (MCP) platform, and adapts the system's capability boundary.
- Executor Agent: Executes sub-tasks, invoking external tools as needed, and incorporates fallback mechanisms for tool failures.
- Writer Agent: Synthesizes results from all sub-tasks, performing disambiguation, filtering, and multi-perspective answer generation.
This modular design enables the system to adaptively configure workflows based on query complexity, supporting three execution modes: Writer-only (simple queries), Executor-inclusive (moderately complex queries), and Planner-enhanced (complex, multi-step queries).
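The Master agent's choice among the three execution modes can be sketched as a simple routing rule. This is a minimal illustration, assuming the Master exposes binary judgments about whether a query needs tools and whether it needs decomposition; the paper's actual classifier is LLM-driven and richer than this.

```python
from enum import Enum, auto

class Mode(Enum):
    WRITER_ONLY = auto()         # simple query: Writer answers directly
    EXECUTOR_INCLUSIVE = auto()  # moderate query: tool calls, no planning
    PLANNER_ENHANCED = auto()    # complex query: full DAG decomposition

def route(needs_tools: bool, needs_decomposition: bool) -> Mode:
    """Hypothetical Master-agent routing rule: assemble the lightest
    agent team that can still handle the query."""
    if needs_decomposition:
        return Mode.PLANNER_ENHANCED
    if needs_tools:
        return Mode.EXECUTOR_INCLUSIVE
    return Mode.WRITER_ONLY
```

The point of the escalating modes is cost control: simple queries never pay for planning or tool invocation.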
Methodological Innovations
1. Dynamic Task Planning and Tool Integration
The Planner agent leverages a dynamic capability boundary, selecting a relevant subset of tools for each query. Tool documentation is iteratively refined using a self-driven framework (DRAFT), which simulates tool use, analyzes feedback, and rewrites documentation to optimize LLM interpretability. Tools are clustered semantically to enable robust fallback and redundancy.
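The DRAFT-style refinement loop described above (simulate tool use, analyze feedback, rewrite documentation) can be sketched as follows. The `simulate`, `critique`, and `rewrite` callables are hypothetical stand-ins for the LLM-driven steps; this shows only the control flow, not the paper's implementation.

```python
def refine_tool_doc(doc: str, simulate, critique, rewrite,
                    max_rounds: int = 3) -> str:
    """DRAFT-style loop (sketch): simulate tool calls guided by the
    current documentation, critique the outcome, and rewrite the doc
    until simulated use succeeds or the round budget is exhausted."""
    for _ in range(max_rounds):
        trace = simulate(doc)            # attempt tool calls using doc
        feedback = critique(doc, trace)  # None means no issues found
        if feedback is None:
            return doc
        doc = rewrite(doc, feedback)     # revise doc from feedback
    return doc
```

Terminating on a clean critique (rather than always running all rounds) keeps refinement cost proportional to how broken the documentation actually is.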
A dual-tower retrieval model, enhanced with collaborative learning (COLT), ensures that the Planner retrieves a complete and functionally diverse set of tools for each task, addressing the common issue of incomplete tool selection in dense retrieval settings.
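The dual-tower idea can be illustrated with cosine scoring over query and tool embeddings, plus a greedy MMR-style diversity penalty as a simple stand-in for COLT's collaborative objective of retrieving a functionally complete, non-redundant tool set. The vectors, tool names, and the MMR substitution are all illustrative assumptions, not the paper's method.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_tools(query_vec, tool_vecs: dict, k: int = 2,
                   diversity: float = 0.5):
    """Dual-tower retrieval sketch: score each tool embedding against
    the query embedding, then greedily re-rank with a redundancy
    penalty so the selected set covers distinct functions."""
    selected = []
    candidates = dict(tool_vecs)
    while candidates and len(selected) < k:
        def mmr(name):
            rel = cosine(query_vec, candidates[name])
            red = max((cosine(candidates[name], tool_vecs[s])
                       for s in selected), default=0.0)
            return rel - diversity * red
        best = max(candidates, key=mmr)
        selected.append(best)
        del candidates[best]
    return selected
```

With `diversity=0`, this degenerates to plain top-k dense retrieval, which is exactly the setting where incomplete tool selection arises.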
2. DAG-Based Reasoning and Execution
Complex queries are decomposed into a DAG of atomic sub-tasks, each with explicit dependencies and tool bindings. The Executor traverses the DAG in topological order, executing sub-tasks in parallel where possible. The Master agent monitors execution, triggering localized re-planning upon failures or incomplete results, thus enhancing robustness and efficiency.
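The topological traversal with parallel execution of independent sub-tasks can be sketched directly with the standard library. `run_task` is a hypothetical stand-in for a tool-invoking sub-task executor; re-planning on failure is omitted for brevity.

```python
from graphlib import TopologicalSorter
from concurrent.futures import ThreadPoolExecutor

def execute_dag(deps: dict, run_task):
    """Executor sketch: traverse the sub-task DAG in topological
    order, running all currently-ready (mutually independent)
    sub-tasks in parallel. `deps` maps each task to the set of
    tasks it depends on."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    results = {}
    with ThreadPoolExecutor() as pool:
        while ts.is_active():
            ready = list(ts.get_ready())  # tasks with all deps done
            for task, out in zip(ready, pool.map(run_task, ready)):
                results[task] = out
                ts.done(task)
    return results
```

For a query like "compare the populations of two cities", the two lookups sit at the same DAG level and run concurrently, while the comparison step waits on both.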
3. Reinforcement Learning for Multi-Agent Optimization
The system employs Group Relative Policy Optimization (GRPO) to jointly optimize agent policies, using a composite reward function that incorporates final answer correctness, user feedback, formatting, and intermediate execution success. This multi-agent RL approach aligns the objectives of all agents with the overall goal of high-quality answer generation.
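Two pieces of this setup can be sketched concretely: a weighted composite reward over the signals the paper names, and GRPO's defining trick of computing advantages relative to a group of sampled trajectories instead of a learned value critic. The weights here are illustrative assumptions, not the paper's values.

```python
def composite_reward(correct: float, feedback: float,
                     well_formatted: bool, steps_ok: float,
                     w=(0.6, 0.2, 0.1, 0.1)) -> float:
    """Composite reward sketch: final-answer correctness, user
    feedback, formatting, and intermediate execution success,
    combined with assumed (illustrative) weights."""
    return (w[0] * correct + w[1] * feedback
            + w[2] * float(well_formatted) + w[3] * steps_ok)

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled trajectory's
    reward by its group's mean and standard deviation, so no value
    critic is needed."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = max(var ** 0.5, 1e-8)  # guard against an all-equal group
    return [(r - mean) / std for r in rewards]
```

Group-relative normalization means a trajectory is reinforced only for beating its sibling samples on the same query, which stabilizes training across queries of very different difficulty.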
4. Robust and Aligned Generation
The Writer agent is optimized for robustness against noisy or adversarially perturbed retrievals using adversarial tuning (ATM), and for alignment with RAG-specific requirements (PA-RAG), including informativeness, robustness, and citation quality. User feedback, both explicit and implicit, is incorporated via RL with human behaviors (RLHB), directly aligning generation with real-world user preferences.
A multi-agent PPO framework (MMOA-RAG) is introduced for end-to-end joint optimization of the Planner, Executor, and Writer, using a shared reward signal (e.g., F1 score) and agent-specific penalties to ensure cooperative behavior and training stability.
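A token-level F1 score of the kind used as the shared reward can be sketched as follows. Real pipelines normalize more carefully (punctuation, articles, tokenization); this is the standard overlap-based formula, not MMOA-RAG's exact implementation.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a generated answer and a reference:
    harmonic mean of token precision and recall, with multiset
    overlap so repeated tokens are counted correctly."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because every agent receives this same terminal signal (plus its own penalties), the Planner and Executor are credited for upstream decisions that ultimately improved the Writer's answer.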
5. Efficient LLM Inference
The paper details both algorithmic and infrastructure-level optimizations for LLM inference, including:
- Local attention and structured pruning for parameter and compute reduction.
- Output length reduction via prompt engineering and training-based methods.
- Semantic caching and prefill-decode separation for infrastructure efficiency.
- Quantization and speculative decoding for further acceleration.
These techniques are critical for deploying the system at web scale, where latency and cost constraints are paramount.
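Of the infrastructure techniques above, semantic caching is the most self-contained to illustrate: cache answers keyed by query embeddings and serve a hit when a new query is close enough to a cached one. `embed` is a hypothetical stand-in for a real sentence encoder, and a production system would use an approximate-nearest-neighbor index rather than this linear scan.

```python
import math

class SemanticCache:
    """Semantic cache sketch: store (embedding, answer) pairs and
    return a cached answer when a new query's embedding exceeds a
    cosine-similarity threshold against a stored one."""

    def __init__(self, embed, threshold: float = 0.9):
        self.embed = embed          # query -> vector (assumed encoder)
        self.threshold = threshold
        self.entries = []           # list of (vector, answer)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def get(self, query: str):
        v = self.embed(query)
        for vec, answer in self.entries:
            if self._cosine(v, vec) >= self.threshold:
                return answer       # cache hit: skip LLM inference
        return None                 # cache miss: fall through to agents

    def put(self, query: str, answer: str):
        self.entries.append((self.embed(query), answer))
```

Unlike an exact-string cache, paraphrases of a previously answered query ("capital of France" vs. "what is the capital of France") land on the same entry, which is what makes the technique effective at web scale.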
Empirical Evaluation
The system is evaluated in both controlled and real-world settings (Baidu Search). Key findings include:
- For complex queries, the AI Search system achieves a 13% relative improvement in normalized win rate (NWR) over the legacy web search system, with statistically significant gains.
- Online A/B testing demonstrates reductions in change query rate (CQR) and increases in page views, daily active users, and dwell time, indicating improved user engagement and satisfaction.
- Case studies illustrate the system's ability to handle multi-step reasoning and tool orchestration, outperforming traditional systems on queries requiring evidence synthesis and computation.
Implications and Future Directions
The AI Search Paradigm represents a significant shift towards agentic, tool-augmented, and reasoning-centric search architectures. By externalizing reasoning into explicit, modular workflows and integrating dynamic tool use, the system addresses the limitations of both classical IR and current RAG models. The multi-agent design, with explicit planning, execution, and synthesis stages, provides a blueprint for scalable, trustworthy, and adaptive information-seeking systems.
Theoretically, this work bridges cognitive models of human information foraging with practical, LLM-driven architectures. Practically, it demonstrates the feasibility of deploying such systems at scale, with robust performance on complex, real-world queries.
Future research directions include:
- Extending the agentic framework to support more diverse tool types (e.g., multimodal, interactive, or domain-specific tools).
- Enhancing the Planner's reasoning capabilities with more advanced planning algorithms and meta-cognitive strategies.
- Investigating more sophisticated reward shaping and credit assignment in multi-agent RL for improved sample efficiency and stability.
- Exploring privacy-preserving and federated approaches for user feedback integration and semantic caching.
The modular, multi-agent paradigm outlined in this work is likely to inform the next generation of AI-powered search and decision-support systems, with broad applicability across domains requiring complex, tool-mediated reasoning and synthesis.