Multi-Agent Agentic Search

Updated 19 March 2026

Multi-agent agentic search is the study of systems where multiple autonomous agents coordinate, compete, or interact to explore complex and dynamic spaces.
It integrates methodologies such as evolutionary computation, reinforcement learning (RL), and LLM-powered orchestration to achieve scalability, robustness, and improved task performance.
Applications span scientific discovery, legal analysis, and environmental monitoring, demonstrating significant performance gains over traditional single-agent approaches.

Multi-agent agentic search is the study and engineering of systems in which multiple autonomous agents coordinate, compete, or otherwise interact to conduct search and reasoning over complex spaces—ranging from combinatorial landscapes and open-ended information domains to high-dimensional continuous environments. In contrast to single-agent search, these systems leverage both distributed intelligence and structured interaction to optimize for scalability, robustness, adaptability, and task-specific performance. Recent advances integrate methodologies ranging from evolutionary computation and probabilistic planning to LLM-powered agent orchestration, with rigorous evaluation across scientific, engineering, and real-world information-seeking domains.

1. Formal Models and Problem Definitions

A unifying aspect of multi-agent agentic search is the explicitly multi-agent formulation of the search process, instantiated in various mathematical frameworks depending on the task:

Competitive Multi-Agent Search (CMAS): Formalizes agentic search on NK fitness landscapes. Each agent searches a discrete space of N-bit strings. Agents interact through a fitness landscape that changes dynamically with agent visitation and depends on the memory protocol (public vs. private caches) (Bahceci et al., 2023). Fitness perturbations encode both boosting (increased reward for novel discovery) and crowding (diminishing returns for revisiting crowded states), which makes the search environment highly nonstationary.
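Multi-Agent Decision Processes and Dec-POMDPs: Multi-agent agentic search tasks are formally modeled as decentralized partially observable Markov decision processes (Dec-POMDPs). In this model each agent receives only a partial view of the state, maintains private memory, and coordinates through shared communication or message passing (Wei et al., 18 Jan 2026). The objective is typically to maximize a group return, such as the discounted team-average reward

$J(\theta) = \mathbb{E} \left[ \sum_t \gamma^t \frac{1}{N} \sum_i r^i_t \right],$

where $N$ is the number of agents (not the NK string length), $r^i_t$ is the reward of agent $i$ at step $t$, $\gamma \in [0, 1)$ is the discount factor, and the expectation is taken over trajectories generated by the joint policy with parameters $\theta$.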

Agentic Supernet Architecture: Workflows are defined as directed acyclic graphs of agentic operators (LLM calls plus tools), with a probabilistic supernet (distribution over operator choices and depths), enabling dynamic, query-conditioned system assembly (Zhang et al., 6 Feb 2025).
Asynchronous, Decentralized Active Search: Agents occupy a discretized environment, each independently updating a local belief (posterior) and making asynchronous Thompson-sampling-based action decisions to efficiently cover sparse target spaces. Coordination emerges from the stochasticity of the agents' choices and eventual data sharing (Ghods et al., 2020, Banerjee et al., 2022).
Multi-Agent Grammar Search: Linear or compositional workflows are generated via context-free grammar derivations over human-interpretable modules (step-wise reasoner, critic, voter, debate) (Singh et al., 16 Dec 2025).
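
The CMAS and Dec-POMDP formulations can be made concrete with a toy simulation. The sketch below is a minimal Python illustration under stated assumptions and does not reproduce Bahceci et al.'s setup. It builds a random NK landscape and applies a hypothetical perturbation rule on a visit cache: a `boost` multiplier for the first visit to a string and a `crowd` decay for every earlier visit. Agents hill-climb greedily, and the code computes a single-rollout estimate of the team-average discounted return $J(\theta)$ defined above. The function names (`make_nk_landscape`, `perceived_fitness`, `greedy_step`, `team_return`, `simulate`) and all parameter values are illustrative assumptions.

```python
import random


def make_nk_landscape(n, k, seed=0):
    """Random NK landscape: locus i contributes a value depending on bit i and the next k bits (cyclic)."""
    rng = random.Random(seed)
    # One random lookup table per locus, indexed by the (k+1)-bit neighbourhood pattern.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(bits):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in range(k + 1):
                idx = (idx << 1) | bits[(i + j) % n]
            total += tables[i][idx]
        return total / n  # mean contribution, so fitness lies in [0, 1)

    return fitness


def perceived_fitness(bits, base_fitness, visits, boost=0.5, crowd=0.5):
    """Hypothetical perturbation: a first visit is boosted, and each earlier visit shrinks the payoff."""
    count = visits.get(tuple(bits), 0)
    f = base_fitness(bits)
    return f * (1 + boost) if count == 0 else f * crowd ** count


def greedy_step(bits, base_fitness, visits):
    """Move to the best one-bit-flip neighbour under the perceived fitness and record the visit."""
    best, best_val = None, float("-inf")
    for i in range(len(bits)):
        cand = list(bits)
        cand[i] ^= 1
        val = perceived_fitness(cand, base_fitness, visits)
        if val > best_val:
            best, best_val = cand, val
    visits[tuple(best)] = visits.get(tuple(best), 0) + 1
    return best, best_val


def team_return(rewards, gamma=0.95):
    """Single-rollout estimate of J: sum_t gamma^t * (1/N) * sum_i r_t^i, with rewards[t][i]."""
    return sum(gamma ** t * sum(r_t) / len(r_t) for t, r_t in enumerate(rewards))


def simulate(n=10, k=2, n_agents=2, steps=5, public=True, seed=0):
    rng = random.Random(seed)
    fitness = make_nk_landscape(n, k, seed)
    shared = {}
    # Public cache: all agents update one visit table. Private: each agent sees only its own visits.
    caches = [shared] * n_agents if public else [{} for _ in range(n_agents)]
    states = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_agents)]
    rewards = []
    for _ in range(steps):
        step_rewards = []
        for a in range(n_agents):
            states[a], r = greedy_step(states[a], fitness, caches[a])
            step_rewards.append(r)
        rewards.append(step_rewards)
    return rewards, team_return(rewards)
```

Calling `simulate(public=True)` and `simulate(public=False)` contrasts the two memory protocols. In the public-cache run, one agent's visits crowd out states for everyone. In the private run, agents can unknowingly crowd into the same region without being penalized for each other's visits.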

2. Architectures, Interaction Protocols, and Roles

Recent systems instantiate multi-agent agentic search in both static and learned architectures, distinguished by interaction and orchestration modalities:

Fixed-role Pipelines: Architectures such as AutoGen, CAMEL, or L-MARS implement a centralized-planning, decentralized-execution pattern. A manager agent decomposes queries and assigns subtasks to executor agents (retrievers, analysts), and verifier agents check and gate the aggregated results (Wei et al., 18 Jan 2026, Wang et al., 31 Aug 2025).
Dynamic/Adaptive Graphs: Frameworks such as MaAS and GPTSwarm utilize a supernet over possible agentic topologies. A controller samples query-specific subgraph workflows, allocating both inference resources and role compositions based on task complexity (Zhang et al., 6 Feb 2025, Wei et al., 18 Jan 2026).
Decentralized, Fully Asynchronous Teams: In domains such as active search for targets or environmental monitoring, agents follow decentralized, interruptible policies—often Thompson-sampling or MCTS-based—with only intermittent or delayed data sharing. Collision, overlap, and redundancy are mitigated probabilistically or through optional event-driven communication (Ghods et al., 2020, Bakshi et al., 2023).
Role Specialization for Robustness: Architectures such as M-ASK factorize search into Search Behavior Agents (query planning, action) and Knowledge Management Agents (context pruning, filtering), orchestrated via turn-based activation and shared knowledge state (Chen et al., 8 Jan 2026).
LLM-native Multi-Agent Workflows: ARM and Grammar MAS approaches define reasoning modules as code objects (Python classes or functions) or as context-free grammar derivations. These modules are discovered automatically by evolutionary or search algorithms and support recursive, parallel, or compositional reasoning (Yao et al., 7 Oct 2025, Singh et al., 16 Dec 2025).
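
The decentralized, Thompson-sampling-based teams described above can be illustrated with a deliberately simplified model. The sketch below is not SPATS or CAST, and it rests on several assumptions:

- Each agent keeps an independent Beta posterior per grid cell over the rate of positive sensor readings.
- Each round, an agent senses the open cell with the largest posterior draw.
- A cell is declared once it has `min_obs` readings: a target if its posterior mean is at least 0.5, otherwise empty.
- Agents broadcast buffered readings only every `share_every` rounds. These synchronous rounds with delayed sharing stand in for true asynchrony.

The noise model, thresholds, and the function name `decentralized_ts_search` are all hypothetical.

```python
import random


def decentralized_ts_search(n_cells=40, targets=(3, 17, 29), n_agents=3,
                            noise=0.1, min_obs=3, share_every=4,
                            max_steps=1000, seed=0):
    """Toy multi-agent active search with per-agent Thompson sampling.

    Returns the first round at which every true target has been declared by
    some agent, or None if that does not happen within max_steps rounds.
    """
    rng = random.Random(seed)
    truth = set(targets)
    # Per-agent Beta(alpha, beta) posterior over each cell's positive-reading rate.
    beliefs = [[[1.0, 1.0] for _ in range(n_cells)] for _ in range(n_agents)]
    counts = [[0] * n_cells for _ in range(n_agents)]
    declared = [{} for _ in range(n_agents)]  # cell -> True (target) or False (empty)
    outbox = [[] for _ in range(n_agents)]    # readings not yet shared with teammates
    found = set()

    def absorb(a, cell, y):
        """Conjugate update of agent a's belief; declare the cell once it has min_obs readings."""
        beliefs[a][cell][0 if y else 1] += 1
        counts[a][cell] += 1
        if cell not in declared[a] and counts[a][cell] >= min_obs:
            alpha, beta = beliefs[a][cell]
            declared[a][cell] = alpha / (alpha + beta) >= 0.5
            if declared[a][cell]:
                found.add(cell)

    for step in range(1, max_steps + 1):
        for a in range(n_agents):
            open_cells = [c for c in range(n_cells) if c not in declared[a]]
            if not open_cells:
                continue
            # Thompson sampling: one posterior draw per open cell, then sense the argmax.
            cell = max(open_cells, key=lambda c: rng.betavariate(*beliefs[a][c]))
            y = rng.random() < (1 - noise if cell in truth else noise)
            absorb(a, cell, y)
            outbox[a].append((cell, y))
        if step % share_every == 0:
            # Intermittent all-to-all broadcast of buffered readings (delayed sharing).
            for a in range(n_agents):
                for b in range(n_agents):
                    if b != a:
                        for cell, y in outbox[a]:
                            absorb(b, cell, y)
            outbox = [[] for _ in range(n_agents)]
        if truth <= found:
            return step
    return None
```

With `noise=0.0`, every cell needs at most `min_obs` readings per agent, so the search is guaranteed to finish within `n_cells * min_obs` rounds. Each agent draws its own random samples, so agents holding identical beliefs still tend to sense different cells. That randomness is the source of the implicit coordination described in the text.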

3. Algorithms and Optimization Methods

Optimization in multi-agent agentic search spans several computational paradigms:

Evolutionary Computation in Competitive Spaces: CMAS leverages neuroevolution—specifically NEAT-evolved CPPNs—to encode stochastic policies over discrete state/action tables. Evolution optimizes specialized or generalist strategies under fixed or heterogeneous opponent pools, measured by mean performance over many simulated competitive runs (Bahceci et al., 2023).
RL-based Policy Optimization: Group-relative advantage estimates, as in GRPO (Group Relative Policy Optimization), normalize each rollout's reward by the mean and standard deviation of a group of rollouts for the same query. This stabilizes multi-agent PPO-style updates for collaborative retrieval and reasoning (Wei et al., 18 Jan 2026, Chen et al., 8 Jan 2026).
Lookahead Planning and Pareto Optimization: In cost-aware or stealthy search, agents jointly plan over myopic and risk-adjusted reward using MCTS, Thompson sampling, and Pareto-front selection (trade-off between discovery and cost/stealth penalty) (Banerjee et al., 2022, Bakshi et al., 2023).
Agentic Architecture Search: The agentic supernet is optimized via Monte Carlo policy gradients over a surrogate loss comprising task performance and resource cost, with operator-level “textual gradient” LLM code edits for self-improvement (Zhang et al., 6 Feb 2025). Grammar search leverages forced sampling over component usage for statistically robust construction of performant, interpretable workflows (Singh et al., 16 Dec 2025).
Reflection-Guided Module Discovery: ARM uses tree search with LLM-powered Critic/Designer modules to mutate and optimize agentic reasoning modules through scaffolded execution trace analysis. The resulting modules serve as both reasoning primitives and orchestrator subroutines (Yao et al., 7 Oct 2025).
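
Group-relative advantages are simple to compute. The minimal sketch below illustrates the idea and is not the cited systems' training code. It scores a group of rollouts for the same query and standardizes each rollout's reward against the group mean and standard deviation. The resulting advantages then feed the standard PPO clipped surrogate. For a multi-agent team, one assumption is that every agent's actions within a rollout share that rollout's team-level advantage. The function names and `clip_eps=0.2` are illustrative. Implementations differ on whether they use the population or the sample standard deviation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each rollout's reward against the other rollouts for the same query."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """PPO clipped objective (to be maximized), averaged over samples.

    ratios[j] = pi_new(a_j | s_j) / pi_old(a_j | s_j).
    """
    total = 0.0
    for rho, adv in zip(ratios, advantages):
        clipped = min(max(rho, 1 - clip_eps), 1 + clip_eps)
        total += min(rho * adv, clipped * adv)
    return total / len(ratios)


# Four team rollouts for one query: two succeed (reward 1), two fail (reward 0).
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
objective = clipped_surrogate([1.5, 0.5, 1.0, 1.0], adv)
```

Here the advantages come out at roughly [1, -1, -1, 1]. The clipping caps the first rollout's contribution at 1.2 instead of 1.5. For the second rollout, the pessimistic minimum keeps -0.8 rather than -0.5.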
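
The Pareto-front selection mentioned for cost-aware or stealthy search can be sketched as a non-dominated filter over candidate actions. The example below makes a few assumptions:

- Some planner has already scored each candidate.
- Each score is a hypothetical expected-discovery value (to maximize) paired with a cost or stealth penalty (to minimize).
- The `pick` rule is one hypothetical way to choose a single action from the front.

It does not reproduce how the cited planners compute those scores or choose among front members.

```python
def pareto_front(candidates):
    """Return candidates not dominated on (higher expected reward, lower cost).

    candidates: list of (name, expected_reward, cost) tuples.
    """
    front = []
    for name, r, c in candidates:
        dominated = any(
            r2 >= r and c2 <= c and (r2 > r or c2 < c)
            for _, r2, c2 in candidates
        )
        if not dominated:
            front.append((name, r, c))
    return front


def pick(front, cost_weight):
    """Hypothetical scalarization along the front: maximize reward - cost_weight * cost."""
    return max(front, key=lambda t: t[1] - cost_weight * t[2])


cands = [("a", 1.0, 1.0), ("b", 2.0, 2.0), ("c", 1.0, 2.0), ("d", 0.5, 0.5)]
front = pareto_front(cands)  # "c" is dropped: "a" has the same reward at lower cost
```

Raising `cost_weight` shifts the choice along the front, from the high-discovery action "b" toward the cheap, low-exposure action "d".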

4. Empirical Findings and Performance

Multi-agent agentic search frameworks achieve state-of-the-art performance across diverse benchmarks:

Information-seeking tasks: In WideSearch and DeepWideSearch benchmarks, A-MapReduce delivers 5–17% item F1 improvements and 45.8% lower runtime versus baseline multi-agent systems, owing to explicit horizontal task-decomposition and memory-driven batching (Chen et al., 1 Feb 2026).
Scientific Discovery and Operator Learning: AgenticSciML demonstrates orders-of-magnitude improvements (10×–11,000× error reduction) over both single-agent baselines and expert-designed methods, driven by ensembles of debater, proposer, critic, and retriever agents cycling in evolutionary search (Jiang et al., 10 Nov 2025).
Legal and Complex QA: L-MARS achieves 98% accuracy and a U-Score of 0.39 on LegalSearchQA. Relative to pure LLM baselines, this is a 9–12 pp gain in accuracy and a 30% improvement in U-Score (uncertainty), obtained by enforcing judge-verified, criterion-driven search and synthesis (Wang et al., 31 Aug 2025).
Cost-aware Active Search: CAST and SPATS achieve near-linear scaling in time-to-full-recovery as the number of agents increases, with robust performance under asynchronous, decentralized execution and stringent cost regimes (Banerjee et al., 2022, Ghods et al., 2020).
Automated Reasoning and Mathematics: Grammar search MAS and ARM meta-orchestrators systematically surpass hand-crafted multi-agent pipelines on MATH, AIME, GPQA, and MMLU-Pro, with up to +2.4 pp average accuracy gain and zero invalid generation rate (Singh et al., 16 Dec 2025, Yao et al., 7 Oct 2025).

5. Dynamic Landscapes, Visualization, and Mechanistic Insights

Rich mechanistic behaviors and emergent phenomena arise in multi-agent agentic search:

Fitness Landscape Surfing: In CMAS, agents learn to “ride the wave” of fitness boosting, exploiting temporary gradients before crowding erodes the reward. Spherical projections of the NK landscape make these trajectories visible and show how agents time local exploitation and disruptive exploration (Bahceci et al., 2023).
Dynamic Workflow Adaptation: MaAS demonstrates adaptive depth allocation and early exit in its workflows, shaping the structure of the agentic system according to task difficulty. This feature is critical for resource-efficient scaling (Zhang et al., 6 Feb 2025).
Reflection and Verification Loops: In ARM and M-ASK, the modularity of multi-agent modules (head-to-head critique, weighted voting, verification) supports robustness under noise and variable information quality, with ablations confirming the crucial nature of dense credit assignment and intermediate filtering (Yao et al., 7 Oct 2025, Chen et al., 8 Jan 2026).
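
Adaptive depth with early exit, combined with weighted voting, can be illustrated with a hand-written confidence rule. The sketch below is a hypothetical stand-in for a learned controller such as MaAS's and is not its actual mechanism. Each layer of operators contributes weighted candidate answers, and votes accumulate across layers. Execution stops once the leading answer holds at least `threshold` of the total vote weight, so easy queries exit after fewer layers than hard ones. The function names, the threshold, and the requirement that weights be positive and layers non-empty are assumptions.

```python
from collections import defaultdict


def weighted_vote(answers):
    """answers: list of (answer, weight) pairs with positive weights.

    Returns the answer with the largest total weight and its share of all weight.
    """
    totals = defaultdict(float)
    for answer, weight in answers:
        totals[answer] += weight
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


def run_with_early_exit(layers, threshold=0.7):
    """layers[d] is a non-empty list of (answer, weight) pairs from the operators at depth d + 1.

    Returns (answer, depth_used).
    """
    if not layers:
        raise ValueError("need at least one layer")
    pool = []
    for depth, layer in enumerate(layers, start=1):
        pool.extend(layer)
        winner, share = weighted_vote(pool)
        if share >= threshold:
            return winner, depth  # confident enough: skip the remaining layers
    return winner, len(layers)
```

For example, a query whose first layer already agrees unanimously exits at depth 1. A query whose first layer splits 0.6/0.4 needs a second layer before the leading answer clears the 0.7 share.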

6. Challenges and Open Directions

Major research challenges span scalability, coordination, and governance:

World Modeling: Explicit, jointly trained world models for agent teams remain an open challenge, limiting true open-ended reasoning and simulation capabilities (Wei et al., 18 Jan 2026).
Distributed Memory and Knowledge Management: Persistent, decentralized memory systems for storing, querying, and aligning search traces and knowledge bases over long horizons remain unsolved for complex agent teams (Wei et al., 18 Jan 2026, Chen et al., 8 Jan 2026).
Credit Assignment and Emergent Role Adaptation: Algorithmic advances such as group-relative PPO partially mitigate credit assignment ambiguity. However, robust and scalable mechanisms for attributing outcome success, or for dynamically re-allocating roles in large, heterogeneous agent collectives, are still lacking (Wei et al., 18 Jan 2026).
Safety and Governance: Multi-agent agentic search in real-world web or tool environments introduces new governance risks, including coordinated misuse, feedback loops, and data poisoning. Auditable protocols, “guard” agents, and systematic oversight are critical future research areas (Wei et al., 18 Jan 2026).

7. Impact and Outlook

Multi-agent agentic search constitutes a foundational paradigm shift: distributing reasoning, search, and execution across autonomously interacting agents enables fundamental advances in scalability, adaptability, and solution quality across scientific discovery, information retrieval, planning, and collaborative problem-solving. By integrating components from evolutionary computation, RL, probabilistic planning, and LLM orchestration, these systems provide a template for robust, interpretable, and resource-efficient solutions to open-ended complex search tasks. Ongoing work continues to generalize these frameworks to new modalities, domains, and coordination regimes, with active research into automated system discovery, long-horizon memory management, and safe, auditable deployment in high-stakes settings (Bahceci et al., 2023, Wei et al., 18 Jan 2026, Zhang et al., 6 Feb 2025, Chen et al., 1 Feb 2026, Singh et al., 16 Dec 2025).