
Towards AI Search Paradigm (2506.17188v1)

Published 20 Jun 2025 in cs.CL, cs.AI, and cs.IR

Abstract: In this paper, we introduce the AI Search Paradigm, a comprehensive blueprint for next-generation search systems capable of emulating human information processing and decision-making. The paradigm employs a modular architecture of four LLM-powered agents (Master, Planner, Executor and Writer) that dynamically adapt to the full spectrum of information needs, from simple factual queries to complex multi-stage reasoning tasks. These agents collaborate dynamically through coordinated workflows to evaluate query complexity, decompose problems into executable plans, and orchestrate tool usage, task execution, and content synthesis. We systematically present key methodologies for realizing this paradigm, including task planning and tool integration, execution strategies, aligned and robust retrieval-augmented generation, and efficient LLM inference, spanning both algorithmic techniques and infrastructure-level optimizations. By providing an in-depth guide to these foundational components, this work aims to inform the development of trustworthy, adaptive, and scalable AI search systems.

Towards a Modular, Multi-Agent AI Search Paradigm

The paper "Towards AI Search Paradigm" (Li et al., 20 Jun 2025) presents a comprehensive architectural and methodological framework for next-generation search systems, advancing beyond both classical information retrieval (IR) and current retrieval-augmented generation (RAG) approaches. The authors propose a modular, multi-agent system that emulates human-like information seeking and decision-making, with explicit mechanisms for dynamic task decomposition, tool orchestration, robust execution, and context-aware answer synthesis.

Motivation and Context

Traditional IR systems, including lexical and learning-to-rank (LTR) models, are limited by their reliance on static document retrieval and ranking, often requiring users to synthesize information manually. RAG systems, while enabling direct answer generation, are typically single-shot and struggle with complex, multi-hop, or tool-requiring queries. The paper identifies a critical gap: current systems lack the cognitive flexibility and multi-stage reasoning necessary for complex information needs, such as those involving evidence aggregation, tool use, and dynamic planning.

Multi-Agent Architecture

The proposed AI Search Paradigm is structured around four specialized, LLM-powered agents:

  • Master Agent: Analyzes query complexity and intent, dynamically assembles agent teams, and oversees execution with reflective re-planning.
  • Planner Agent: Decomposes complex queries into a directed acyclic graph (DAG) of sub-tasks, selects tools via a Model-Context Protocol (MCP) platform, and adapts the system's capability boundary.
  • Executor Agent: Executes sub-tasks, invoking external tools as needed, and incorporates fallback mechanisms for tool failures.
  • Writer Agent: Synthesizes results from all sub-tasks, performing disambiguation, filtering, and multi-perspective answer generation.

This modular design enables the system to adaptively configure workflows based on query complexity, supporting three execution modes: Writer-only (simple queries), Executor-inclusive (moderately complex queries), and Planner-enhanced (complex, multi-step queries).
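The mode-selection logic above can be sketched as a simple routing rule. This is an illustrative reconstruction, not the paper's implementation: the complexity score and thresholds are assumptions standing in for the Master agent's LLM-based judgment.

```python
from enum import Enum

class Mode(Enum):
    WRITER_ONLY = "writer_only"          # simple factual queries
    EXECUTOR_INCLUSIVE = "executor"      # moderately complex, tool-requiring
    PLANNER_ENHANCED = "planner"         # complex, multi-step queries

def route(complexity: float, needs_tools: bool) -> Mode:
    """Hypothetical Master-agent routing rule; thresholds are assumed."""
    if complexity >= 0.7:
        return Mode.PLANNER_ENHANCED
    if needs_tools or complexity >= 0.3:
        return Mode.EXECUTOR_INCLUSIVE
    return Mode.WRITER_ONLY
```

In the real system the Master would also re-enter this decision reflectively, escalating a query to a heavier mode if execution stalls.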

Methodological Innovations

1. Dynamic Task Planning and Tool Integration

The Planner agent leverages a dynamic capability boundary, selecting a relevant subset of tools for each query. Tool documentation is iteratively refined using a self-driven framework (DRAFT), which simulates tool use, analyzes feedback, and rewrites documentation to optimize LLM interpretability. Tools are clustered semantically to enable robust fallback and redundancy.
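The DRAFT-style refinement loop can be sketched as follows. The three callables are placeholders for LLM-driven components (trial invocation, feedback analysis, documentation rewriting); their names and the convergence criterion are assumptions for illustration.

```python
def refine_documentation(doc, simulate_call, analyze, rewrite, max_rounds=3):
    """Sketch of a DRAFT-style loop: trial tool use -> feedback -> doc rewrite.
    All three callables are hypothetical stand-ins for LLM components."""
    for _ in range(max_rounds):
        trace = simulate_call(doc)       # exploratory tool invocation
        feedback = analyze(doc, trace)   # critique interpretability gaps
        if not feedback:                 # no remaining issues: converged
            break
        doc = rewrite(doc, feedback)     # revise the documentation
    return doc
```

The bounded round count mirrors the self-driven framing: the loop stops either on convergence or after a fixed refinement budget.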

A dual-tower retrieval model, enhanced with collaborative learning (COLT), ensures that the Planner retrieves a complete and functionally diverse set of tools for each task, addressing the common issue of incomplete tool selection in dense retrieval settings.
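A minimal sketch of the dense retrieval step: query and tool descriptions are embedded by the two towers, and tools are ranked by cosine similarity. COLT's collaborative re-scoring (which promotes functionally complementary tools so the retrieved set is complete, not just individually similar) is omitted here; vectors and names are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_tools(query_vec, tool_vecs, k=2):
    """Top-k dense tool retrieval over precomputed tower embeddings.
    A COLT-style step would re-score candidates jointly for completeness."""
    ranked = sorted(tool_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```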

2. DAG-Based Reasoning and Execution

Complex queries are decomposed into a DAG of atomic sub-tasks, each with explicit dependencies and tool bindings. The Executor traverses the DAG in topological order, executing sub-tasks in parallel where possible. The Master agent monitors execution, triggering localized re-planning upon failures or incomplete results, thus enhancing robustness and efficiency.
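The traversal described above can be sketched as Kahn-style layering: each "wave" contains sub-tasks whose dependencies are all satisfied, so everything in a wave can run in parallel. The wave grouping is an illustrative scheduling view, not the paper's exact executor.

```python
def execution_waves(deps):
    """Group DAG sub-tasks into parallelizable waves.
    `deps` maps each task to the set of tasks it depends on."""
    indeg = {t: len(d) for t, d in deps.items()}
    children = {t: [] for t in deps}
    for task, prereqs in deps.items():
        for p in prereqs:
            children[p].append(task)
    wave = [t for t, n in indeg.items() if n == 0]
    waves = []
    while wave:
        waves.append(sorted(wave))          # all ready tasks run together
        nxt = []
        for t in wave:
            for c in children[t]:
                indeg[c] -= 1               # dependency satisfied
                if indeg[c] == 0:
                    nxt.append(c)
        wave = nxt
    return waves
```

On failure of a task mid-wave, the Master's localized re-planning would only need to recompute the sub-graph downstream of the failed node.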

3. Reinforcement Learning for Multi-Agent Optimization

The system employs Group Relative Policy Optimization (GRPO) to jointly optimize agent policies, using a composite reward function that incorporates final answer correctness, user feedback, formatting, and intermediate execution success. This multi-agent RL approach aligns the objectives of all agents with the overall goal of high-quality answer generation.
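The core of GRPO is that each sampled trajectory's reward is normalized against its own group's statistics, replacing a learned value baseline. A minimal sketch of that advantage computation (the composite reward terms would be summed into `rewards` beforehand):

```python
def grpo_advantages(rewards):
    """Group-relative advantages: standardize each trajectory's reward
    against the mean and std of its sampling group, so no separate
    value function is needed as a baseline."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero on uniform groups
    return [(r - mean) / std for r in rewards]
```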

4. Robust and Aligned Generation

The Writer agent is optimized for robustness against noisy or adversarially perturbed retrievals using adversarial tuning (ATM), and for alignment with RAG-specific requirements (PA-RAG), including informativeness, robustness, and citation quality. User feedback, both explicit and implicit, is incorporated via RL with human behaviors (RLHB), directly aligning generation with real-world user preferences.

A multi-agent PPO framework (MMOA-RAG) is introduced for end-to-end joint optimization of the Planner, Executor, and Writer, using a shared reward signal (e.g., F1 score) and agent-specific penalties to ensure cooperative behavior and training stability.
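The shared reward mentioned above can be illustrated with a token-overlap F1 between the generated answer and a reference, the kind of signal a joint training scheme could broadcast to Planner, Executor, and Writer alike. Simple whitespace tokenization is an assumption; real setups normalize text first.

```python
def f1_reward(prediction, reference):
    """Token-overlap F1 between a generated answer and a reference,
    usable as a shared reward signal across cooperating agents."""
    pred, ref = prediction.split(), reference.split()
    remaining = list(ref)
    common = 0
    for tok in pred:
        if tok in remaining:        # count each reference token at most once
            remaining.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```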

5. Efficient LLM Inference

The paper details both algorithmic and infrastructure-level optimizations for LLM inference, including:

  • Local attention and structured pruning for parameter and compute reduction.
  • Output length reduction via prompt engineering and training-based methods.
  • Semantic caching and prefill-decode separation for infrastructure efficiency.
  • Quantization and speculative decoding for further acceleration.

These techniques are critical for deploying the system at web scale, where latency and cost constraints are paramount.
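Of the optimizations listed, semantic caching is the most self-contained to sketch: a new query hits the cache if any stored query's embedding is within a similarity threshold. The `embed` function is a placeholder for a real sentence encoder, and the linear scan stands in for an ANN index; both are assumptions for illustration.

```python
import math

class SemanticCache:
    """Minimal semantic cache: near-duplicate queries reuse a stored answer."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed            # placeholder for a sentence encoder
        self.threshold = threshold    # cosine-similarity hit threshold
        self.entries = []             # (embedding, answer) pairs

    def get(self, query):
        q = self.embed(query)
        for emb, answer in self.entries:
            dot = sum(x * y for x, y in zip(q, emb))
            norm = (math.sqrt(sum(x * x for x in q))
                    * math.sqrt(sum(x * x for x in emb)))
            if norm and dot / norm >= self.threshold:
                return answer         # cache hit: skip generation entirely
        return None

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))
```

A hit bypasses the full agent pipeline, which is where the latency and cost savings at web scale come from.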

Empirical Evaluation

The system is evaluated in both controlled and real-world settings (Baidu Search). Key findings include:

  • For complex queries, the AI Search system achieves a 13% relative improvement in normalized win rate (NWR) over the legacy web search system, with statistically significant gains.
  • Online A/B testing demonstrates reductions in change query rate (CQR) and increases in page views, daily active users, and dwell time, indicating improved user engagement and satisfaction.
  • Case studies illustrate the system's ability to handle multi-step reasoning and tool orchestration, outperforming traditional systems on queries requiring evidence synthesis and computation.

Implications and Future Directions

The AI Search Paradigm represents a significant shift towards agentic, tool-augmented, and reasoning-centric search architectures. By externalizing reasoning into explicit, modular workflows and integrating dynamic tool use, the system addresses the limitations of both classical IR and current RAG models. The multi-agent design, with explicit planning, execution, and synthesis stages, provides a blueprint for scalable, trustworthy, and adaptive information-seeking systems.

Theoretically, this work bridges cognitive models of human information foraging with practical, LLM-driven architectures. Practically, it demonstrates the feasibility of deploying such systems at scale, with robust performance on complex, real-world queries.

Future research directions include:

  • Extending the agentic framework to support more diverse tool types (e.g., multimodal, interactive, or domain-specific tools).
  • Enhancing the Planner's reasoning capabilities with more advanced planning algorithms and meta-cognitive strategies.
  • Investigating more sophisticated reward shaping and credit assignment in multi-agent RL for improved sample efficiency and stability.
  • Exploring privacy-preserving and federated approaches for user feedback integration and semantic caching.

The modular, multi-agent paradigm outlined in this work is likely to inform the next generation of AI-powered search and decision-support systems, with broad applicability across domains requiring complex, tool-mediated reasoning and synthesis.

Authors (21)
  1. Yuchen Li
  2. Hengyi Cai
  3. Rui Kong
  4. Xinran Chen
  5. Jiamin Chen
  6. Jun Yang
  7. Haojie Zhang
  8. Jiayi Li
  9. Jiayi Wu
  10. Yiqun Chen
  11. Changle Qu
  12. Keyi Kong
  13. Wenwen Ye
  14. Lixin Su
  15. Xinyu Ma
  16. Long Xia
  17. Daiting Shi
  18. Jiashu Zhao
  19. Haoyi Xiong
  20. Shuaiqiang Wang