- The paper presents TaS, a framework that reformulates long-horizon agentic information seeking as a table completion task to enhance state management.
- It leverages a structured tabular schema to decouple planning from execution, enabling precise candidate filtering and efficient sub-agent orchestration.
- Empirical results demonstrate TaS’s superiority over baseline models in Deep, Wide, and DeepWide search scenarios, showing significant gains in accuracy and efficiency.
Motivation and Problem Statement
Long-horizon agentic information seeking tasks involve sequential, multi-step reasoning and large-scale retrieval across the web. Current agent frameworks, such as ReAct, maintain search state and planning information within plain-text contexts, leading to context fragility, state dilution, and frequent hallucinations in extended interactions. As task complexity and search horizon expand, agents display a pronounced "lost-in-the-middle" phenomenon, where critical information cannot be reliably tracked or synthesized.
TaS Framework: Design and Architecture
The Table-as-Search (TaS) framework addresses context fragility by introducing tabular, externally structured state management for InfoSeeking agents. TaS transforms queries into a structured schema, where rows correspond to candidate entities and columns represent constraints and required information. Table cells are used to record search history and results, while empty cells denote explicit plan items pending completion.
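To make the "empty cell = pending plan item" idea concrete, here is a minimal in-memory sketch. It assumes a simple dictionary-of-rows layout; the class `SearchTable`, its methods, and the example column names are hypothetical illustrations, not the paper's data model. The point is that the explicit plan falls out of the table itself: whatever cells are still `None` are the remaining work.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchTable:
    """Hypothetical TaS-style table (illustrative only): rows are candidate
    entities, columns are constraints plus required information. A cell value
    of None marks a pending plan item; a filled cell records a search result."""
    columns: list[str]
    rows: dict[str, dict[str, Optional[str]]] = field(default_factory=dict)

    def add_candidate(self, entity: str) -> None:
        # A new row starts with every cell empty, so every cell is a pending task.
        self.rows.setdefault(entity, {col: None for col in self.columns})

    def fill(self, entity: str, column: str, value: str) -> None:
        # Recording a result in a cell removes that item from the plan.
        self.rows[entity][column] = value

    def pending_cells(self) -> list[tuple[str, str]]:
        # The explicit plan is simply the set of empty cells, in row/column order.
        return [(e, c) for e, cells in self.rows.items()
                for c, v in cells.items() if v is None]


table = SearchTable(columns=["founded_before_2000", "headquarters"])
table.add_candidate("Company A")
table.add_candidate("Company B")
table.fill("Company A", "headquarters", "Berlin")
print(table.pending_cells())
# [('Company A', 'founded_before_2000'), ('Company B', 'founded_before_2000'), ('Company B', 'headquarters')]
```

Because state lives in the table rather than in a growing transcript, the remaining work can be read off directly instead of being reconstructed from context.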
Figure 1: TaS reformulates InfoSeeking as a Table Completion task, explicitly managing search state and supporting Deep, Wide, and DeepWide paradigms.
The tabular schema unifies three InfoSeeking paradigms:
- Deep Search: Precise candidate filtering with multi-constraint verification.
- Wide Search: Broad candidate aggregation with minimal constraints.
- DeepWide Search: Simultaneous breadth-oriented exploration and depth-oriented verification and information collection.
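Under this schema the paradigms differ mainly in table shape: Deep Search has few rows and many constraint columns, Wide Search has many rows and few constraints, and DeepWide has both. The sketch below illustrates the multi-constraint filtering step of Deep Search. It assumes constraint cells hold `True`/`False` once verified and `None` while still pending; the functions `prune` and `confirmed` are hypothetical names, not part of TaS.

```python
from typing import Optional

# Hypothetical row layout: constraint cells are True/False once verified, None while pending.
Row = dict[str, Optional[bool]]


def prune(rows: dict[str, Row], constraints: list[str]) -> dict[str, Row]:
    """Deep-search style filtering (a sketch): drop a candidate as soon as any
    verified constraint fails; keep rows whose remaining constraints are pending."""
    return {
        entity: cells for entity, cells in rows.items()
        if all(cells.get(c) is not False for c in constraints)
    }


def confirmed(rows: dict[str, Row], constraints: list[str]) -> list[str]:
    """Candidates whose every constraint has been verified True."""
    return [e for e, cells in rows.items()
            if all(cells.get(c) is True for c in constraints)]


rows = {
    "X": {"c1": True, "c2": True},
    "Y": {"c1": True, "c2": False},   # fails c2, so it is pruned
    "Z": {"c1": None, "c2": True},    # c1 still pending, so it is kept
}
kept = prune(rows, ["c1", "c2"])
print(sorted(kept))                   # ['X', 'Z']
print(confirmed(kept, ["c1", "c2"]))  # ['X']
```

In this sketch a pruned row needs no further cell population, which is one way structured filtering can save search calls on candidates that have already failed a constraint.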
TaS is implemented as a multi-agent system in which a planner Main-Agent orchestrates specialized Sub-Agents. The planner initializes the schema, manages row expansion, and coordinates parallel cell population, enabling efficient search and deep attribute extraction. All state is offloaded to a persistent external database, overcoming the limitations of an unstructured context window.
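The orchestration pattern can be sketched with the standard library: state lives in a database, and the planner fans out one sub-agent call per empty cell. This is a minimal sketch under stated assumptions: an in-memory `sqlite3` database stands in for the persistent store, the `cells(entity, attribute, value)` key-value layout is hypothetical, and `fake_sub_agent` is a placeholder for a real search Sub-Agent. Sub-agents only return values; the planner thread is the single writer, which avoids sharing one SQLite connection across threads.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fake_sub_agent(entity: str, attribute: str) -> str:
    # Hypothetical stand-in for a search Sub-Agent; a real one would call tools.
    return f"{attribute} of {entity}"


def populate_cells(conn: sqlite3.Connection, max_workers: int = 4) -> int:
    """Dispatch one sub-agent per empty cell in parallel, then persist results.
    Returns the number of cells that were filled."""
    pending = conn.execute(
        "SELECT entity, attribute FROM cells WHERE value IS NULL").fetchall()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda ea: fake_sub_agent(*ea), pending))
    # Only the planner (this thread) writes to the database.
    conn.executemany(
        "UPDATE cells SET value = ? WHERE entity = ? AND attribute = ?",
        [(v, e, a) for (e, a), v in zip(pending, results)])
    conn.commit()
    return len(pending)


conn = sqlite3.connect(":memory:")  # stand-in for a persistent external database
conn.execute("CREATE TABLE cells (entity TEXT, attribute TEXT, value TEXT, "
             "PRIMARY KEY (entity, attribute))")
conn.executemany("INSERT INTO cells VALUES (?, ?, ?)",
                 [("A", "ceo", None), ("A", "hq", "Berlin"), ("B", "ceo", None)])
print(populate_cells(conn))  # 2
```

Swapping `":memory:"` for a file path, or SQLite for another database, would not change the planner's logic, which only reads empty cells and writes results back.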
Formally, a query $q$ is mapped by $\phi(q) \to S$ to a schema $S = \langle K, C, I \rangle$, where $K$ denotes the candidates, $C$ the constraints, and $I$ the required information. The agent policy is then $\pi(\cdot \mid q, \tau_t, T_t)$, conditioned on the structured table $T_t$ and the trajectory $\tau_t$ rather than on unstructured text.
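A small sketch of what $S = \langle K, C, I \rangle$ and the table state $T_t$ might look like in code. Here the output of $\phi$ is written by hand (in TaS the planner constructs the schema from the query), and `render_table` shows one hypothetical way to serialize $T_t$ for the policy's input, printing empty cells as `?` so pending work stays visible. The `Schema` class, the rendering format, and the example names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    K: list[str]  # candidate entities (rows); may start empty and grow
    C: list[str]  # constraints (columns to verify)
    I: list[str]  # required information (columns to collect)


def render_table(schema: Schema, cells: dict[tuple[str, str], str]) -> str:
    """Serialize T_t as a compact grid for the planner's input.
    Empty cells print as '?'. The format is an assumption, not the paper's."""
    cols = schema.C + schema.I
    lines = [" | ".join(["entity"] + cols)]
    for k in schema.K:
        lines.append(" | ".join([k] + [cells.get((k, c), "?") for c in cols]))
    return "\n".join(lines)


# Hand-written stand-in for phi(q): placeholder entities and columns.
schema = Schema(K=["City A", "City B"], C=["founded_before_1400"], I=["population"])
cells = {("City A", "founded_before_1400"): "yes"}
print(render_table(schema, cells))
# entity | founded_before_1400 | population
# City A | yes | ?
# City B | ? | ?
```

Conditioning on a rendering like this, rather than on the full interaction history, is the practical meaning of replacing unstructured text with $T_t$ in the policy.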
The execution pipeline follows three phases:
- Table Initialization: Parsing the query, constructing the schema, and initializing the database table.
- Dynamic Orchestration: The planner selects between row expansion (candidate discovery) and cell population (attribute filling) based on the table state, dispatching Sub-Agents in parallel.
- Answer Synthesis: Synthesizing the final response from the completed structured evidence.
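The three phases can be read as a simple control loop. The sketch below is an assumption-laden toy: in TaS the planner is a model that decides between row expansion and cell population from the table state, whereas here a fixed, hypothetical rule (expand until `target_rows` candidates exist, then fill empty cells) makes the loop deterministic. `discover` and `fill` are hypothetical callables standing in for search Sub-Agents, and cells are filled sequentially rather than in parallel.

```python
from typing import Callable, Optional


def run_pipeline(columns: list[str],
                 discover: Callable[[int], list[str]],
                 fill: Callable[[str, str], str],
                 target_rows: int,
                 max_steps: int = 20) -> list[str]:
    """Toy three-phase loop; the branching rule below is hypothetical."""
    # Phase 1: table initialization. The schema (columns) is given; rows start empty.
    table: dict[str, dict[str, Optional[str]]] = {}
    for _ in range(max_steps):
        # Phase 2: dynamic orchestration. Hypothetical rule: prefer row expansion
        # while fewer than target_rows candidates exist and discovery finds new ones.
        expanded = False
        if len(table) < target_rows:
            for entity in discover(len(table)):
                if entity not in table:
                    table[entity] = {c: None for c in columns}
                    expanded = True
        if expanded:
            continue
        # Otherwise populate empty cells (TaS dispatches these in parallel).
        pending = [(e, c) for e, row in table.items()
                   for c, v in row.items() if v is None]
        if not pending:
            break
        for entity, column in pending:
            table[entity][column] = fill(entity, column)
    # Phase 3: answer synthesis from fully populated rows only.
    return [f"{e}: " + ", ".join(f"{c}={v}" for c, v in row.items())
            for e, row in table.items()
            if all(v is not None for v in row.values())]


pool = ["Alpha", "Beta", "Gamma"]
answer = run_pipeline(
    columns=["hq"],
    discover=lambda n: pool[n:n + 2],  # hypothetical search: up to 2 new names per call
    fill=lambda e, c: f"{e}-{c}",      # hypothetical sub-agent
    target_rows=3,
)
print(answer)  # ['Alpha: hq=Alpha-hq', 'Beta: hq=Beta-hq', 'Gamma: hq=Gamma-hq']
```

The `max_steps` bound is a safety cap for the toy; termination in the loop itself comes from the table having no empty cells left.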
TaS supports "plug-and-play" integration of advanced search models as Sub-Agents and of persistent, scalable storage, giving it high flexibility and architectural modularity.
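One common way to get this kind of plug-and-play behavior in code is to have the planner depend only on a narrow interface. The sketch below uses Python's `typing.Protocol` for structural typing; the `SubAgent` interface, its `fill_cell` method, and both agent classes are hypothetical placeholders, not the paper's API.

```python
from typing import Protocol


class SubAgent(Protocol):
    """Hypothetical interface: anything that can fill one cell can act as a Sub-Agent."""
    def fill_cell(self, entity: str, attribute: str) -> str: ...


class WebSearchAgent:
    def fill_cell(self, entity: str, attribute: str) -> str:
        return f"[web] {attribute} of {entity}"  # placeholder for real tool calls


class SpecializedAgent:
    def fill_cell(self, entity: str, attribute: str) -> str:
        return f"[specialized] {attribute} of {entity}"  # placeholder for a smaller tuned model


def populate(agent: SubAgent, cells: list[tuple[str, str]]) -> dict[tuple[str, str], str]:
    # The planner depends only on the protocol, so agents swap without planner changes.
    return {(e, a): agent.fill_cell(e, a) for e, a in cells}


cells = [("Acme", "ceo")]
print(populate(WebSearchAgent(), cells))    # {('Acme', 'ceo'): '[web] ceo of Acme'}
print(populate(SpecializedAgent(), cells))  # {('Acme', 'ceo'): '[specialized] ceo of Acme'}
```

Neither agent class inherits from `SubAgent`; structural typing only requires a matching `fill_cell` method, which keeps new search models easy to drop in.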
Empirical Evaluation and Results
Extensive experiments are conducted across Deep Search (GAIA, BrowseComp-ZH), Wide Search (WideSearch), and DeepWide Search (new BD benchmark) scenarios. TaS is compared against single- and multi-agent ReAct baselines, compute-scaled variants, commercial systems (e.g., Gemini DeepResearch), and specialized agentic RL-trained models.
Deep Search: TaS consistently achieves superior accuracy, particularly with cost-efficient models. For example, Gemini-2.5-Flash under TaS outperformed larger Multi-Agent ReAct baselines by +14% on GAIA. When a task does not require external search, the overhead of table management can cause a minor regression, confirming that TaS is specialized for open-world InfoSeeking.
Wide Search: TaS demonstrates holistic superiority, achieving higher Success Rates with smaller models and outperforming computation-heavy baselines in maximum recall. Precision and recall are both improved, with structured table constraints effectively filtering noise during large-scale aggregation.
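The precision side of this claim can be illustrated with a toy set-overlap metric. This is a simplification under stated assumptions: entities are compared by exact string match, and the benchmark's own scoring (normalization, per-cell matching, success criteria) may differ. The entity names and the function `set_precision_recall` are hypothetical.

```python
def set_precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Entity-level precision/recall under exact string matching (a simplification)."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


gold = {"A", "B", "C", "D"}
noisy = {"A", "B", "C", "X", "Y"}  # unfiltered aggregation picks up noise
filtered = {"A", "B", "C"}         # constraint columns reject X and Y
print(set_precision_recall(noisy, gold))     # (0.6, 0.75)
print(set_precision_recall(filtered, gold))  # (1.0, 0.75)
```

Filtering alone leaves recall unchanged in this toy; any recall gain has to come from discovering more correct rows (here, `D`) during row expansion.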
Figure 2: Robustness analysis reveals TaS's superiority as task complexity increases in both BrowseComp-ZH and WideSearch.
DeepWide Search: On real-world BD cases, TaS outperforms both ReAct and Gemini DeepResearch, with gains of +4.7% in entity accuracy and +5.1% in information precision. By decoupling planning and execution, TaS allows efficient sub-agent replacement (e.g., fine-tuned 32B models), demonstrating architectural scalability.
Efficiency and Scaling: TaS consistently achieves higher performance at comparable or lower tool usage volumes. Test-time scaling experiments on BrowseComp-ZH and WideSearch illustrate that TaS benefits more from increased compute allocation than baseline methods.
Figure 3: Gemini-2.5-Flash shows higher search efficiency under TaS on both Deep and Wide benchmarks.
Figure 4: Test-time scaling analysis confirms TaS's effectiveness in leveraging expanded compute, amplifying performance margins.
Analytical Insights and Ablation Studies
TaS's robustness is validated: performance gaps over baselines widen as complexity increases due to precise state tracking and explicit planning. Efficiency analyses demonstrate that performance stems from planning quality, not brute-force search scaling. Ablation studies further reveal that the main planner's reasoning capability is critical, while sub-agents may be efficiently replaced with specialized or smaller models without significant degradation.
Qualitative Case Analyses
TaS prevents failure modes endemic to unstructured agents.
Practical and Theoretical Implications
TaS enables decoupling of control and execution in agentic InfoSeeking, improving scalability, robustness, and query fidelity for real-world applications. By externalizing state, TaS overcomes inherent limitations of context compression, paving the way for high-density retrieval and stable performance in industrial-scale tasks. The plug-and-play compatibility with state-of-the-art search models ensures forward-compatibility and operational flexibility. Architecturally, TaS can serve as a canonical framework for explicit state management and planning in AGI-scale research systems.
Limitations and Future Directions
TaS's structured approach introduces rigidity for tasks better served by free-form reasoning. Adaptive mechanisms for switching between tabular and text-centric modalities are warranted. Performance is bounded by planner model strength; future optimization via agentic RL may further unlock TaS's potential. While context compression strategies are orthogonal to TaS and can be integrated, TaS fundamentally distinguishes itself by persistent externalization of state.
Scalable evaluation, particularly on DeepWide benchmarks, remains contingent on human-in-the-loop verification because of the tasks' open-ended nature. Iterative ground-truth maintenance offers partial mitigation but does not fully resolve reproducibility constraints.
Conclusion
Table-as-Search (TaS) redefines agentic InfoSeeking by shifting from brittle, unstructured context management to explicit, tabular state completion. Robust empirical results confirm TaS's effectiveness across depth, width, and hybrid paradigms, with demonstrated gains in robustness, efficiency, and scalability. TaS thus provides a viable path forward for large-scale, long-horizon agent architectures, offering a blueprint for structured planning in future AI research systems (2602.06724).