OpenSeeker-v2: Data-Centric LLM Search Agent

Updated 7 May 2026
  • The paper introduces OpenSeeker-v2, a 30-billion parameter model trained exclusively with supervised fine-tuning on synthetic, multi-hop search trajectories.
  • It employs the ReAct framework, interleaving planning and diverse tool actions within a single autoregressive decoder that supports a 256k-token context window.
  • The system outperforms prior models on multiple benchmarks, suggesting that rigorous data synthesis can drive state-of-the-art performance with modest compute.

OpenSeeker-v2 is a 30-billion-parameter LLM-based search agent developed using only supervised fine-tuning (SFT) on a dataset of high-difficulty, informative trajectories. Built around the ReAct paradigm and reporting state-of-the-art (SOTA) performance across multiple benchmarks, OpenSeeker-v2 demonstrates that rigorous data synthesis, rather than multi-stage industrial training pipelines, can yield frontier web search agents under a modest compute and data regime (Du et al., 5 May 2026).

1. Model Architecture and Paradigm

OpenSeeker-v2 implements the ReAct framework, where the agent alternates between textual reasoning traces (r_t, e.g., “Thought:” or “Plan:” statements), tool call actions (a_t), and tool-generated observations (o_t) until a final answer (y) is produced. This agent is realized as a single autoregressive decoder (Qwen3-30B-A3B-Thinking) supporting a 256k-token context window, enabling substantial intermixed planning and action sequences. Each inference session may involve up to T_{max} = 200 tool calls, with each step representing the composition:

r_t \rightarrow a_t \rightarrow o_t

The overall multi-step trajectory is formalized as:

\tau = (r_1, a_1, o_1, r_2, a_2, o_2, \ldots, r_T, a_T, o_T, r_{T+1}, y)
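The reasoning, action, observation loop can be sketched as follows. This is only an illustration of the generic ReAct control flow under a tool-call budget, not the authors' implementation: the `Step`, `Trajectory`, `run_react`, `toy_policy`, and `toy_tools` names are hypothetical, and the policy is a stand-in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

T_MAX = 200  # per-session tool-call budget stated in the text

Action = Tuple[str, str]  # (tool_name, argument); a hypothetical encoding of a_t


@dataclass
class Step:
    reasoning: str    # r_t
    action: Action    # a_t
    observation: str  # o_t


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)
    final_reasoning: str = ""     # r_{T+1}
    answer: Optional[str] = None  # y


# A policy maps the history to (reasoning, action or None, answer or None).
Policy = Callable[[List[Step]], Tuple[str, Optional[Action], Optional[str]]]


def run_react(policy: Policy,
              tools: Dict[str, Callable[[str], str]],
              t_max: int = T_MAX) -> Trajectory:
    """Alternate r_t -> a_t -> o_t until the policy answers or the budget ends."""
    traj = Trajectory()
    while len(traj.steps) < t_max:
        reasoning, action, answer = policy(traj.steps)
        if action is None:  # the policy chose to stop and answer
            traj.final_reasoning, traj.answer = reasoning, answer
            return traj
        name, arg = action
        observation = tools[name](arg)  # execute the tool call
        traj.steps.append(Step(reasoning, action, observation))
    # Budget exhausted: take one last reasoning step without calling a tool.
    # The answer stays None if the policy still wanted to act.
    reasoning, _, answer = policy(traj.steps)
    traj.final_reasoning, traj.answer = reasoning, answer
    return traj


def toy_policy(history: List[Step]):
    """Stand-in for the LLM: search once, then answer with the observation."""
    if not history:
        return ("Plan: look up the capital.", ("search", "capital of France"), None)
    return ("Thought: the observation answers the question.", None,
            history[-1].observation)


toy_tools = {"search": lambda query: "Paris"}
traj = run_react(toy_policy, toy_tools)
print(len(traj.steps), traj.answer)
```

In this toy run the trajectory has T = 1 tool call followed by the final reasoning step and the answer, mirroring the shape of \tau above.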

The training objective is standard autoregressive cross-entropy over teacher-forced trajectories:

\mathcal{L}_{CE} = -\sum_t \log P(\mathrm{token}^*_t \mid \mathrm{context}_{<t})

This supervised fine-tuning protocol tightly integrates planning ("where next to look") and execution ("which tool to invoke") within a single LLM backbone (Du et al., 5 May 2026).
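As a small numerical illustration of the cross-entropy objective, the sketch below sums negative log-probabilities over target tokens. It assumes the per-token probabilities of the reference tokens are already available; the `cross_entropy` helper and the example values are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def cross_entropy(target_probs: Sequence[float]) -> float:
    """Return -sum_t log P(token*_t | context_<t).

    `target_probs[t]` is the probability the model assigned to the reference
    token at position t under teacher forcing.
    """
    if any(p <= 0.0 or p > 1.0 for p in target_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in target_probs)


# Three target tokens predicted with probability 0.5, 0.25, and 1.0.
loss = cross_entropy([0.5, 0.25, 1.0])
print(round(loss, 4))  # equals 3 * ln(2), about 2.0794
```

A perfectly predicted token (probability 1.0) contributes nothing to the loss, while less confident predictions contribute more.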

2. Data Synthesis Pipeline and Modifications

OpenSeeker-v2's core advance lies in the generation of challenging synthetic multi-turn search trajectories, produced via three distinct interventions:

2.1. Knowledge-Graph Scaling

Given a global web graph G = (V, E), each synthetic scenario is generated by expanding a local subgraph from a seed node v_{seed}. The previous version used a small expansion radius; v2 increases the expansion budget, enlarging the subgraph's node and edge counts by a scaling factor.

The synthetic question is then drawn from a distribution over the expanded subgraph, ensuring that questions require deeper, multi-hop reasoning (i.e., evidence aggregation).
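One way to picture budgeted subgraph expansion is a breadth-first walk from the seed that stops once a node budget is reached. The sketch below is an assumption-laden illustration: `expand_subgraph`, the adjacency-list graph, the toy data, and the interpretation of the budget as a node count are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # adjacency list: node -> outgoing neighbours


def expand_subgraph(graph: Graph, seed: str,
                    budget: int) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Breadth-first expansion from `seed`, keeping at most `budget` nodes."""
    if budget < 1:
        raise ValueError("budget must allow at least the seed node")
    nodes = {seed}
    queue = deque([seed])
    while queue and len(nodes) < budget:
        current = queue.popleft()
        for neighbour in graph.get(current, []):
            if neighbour not in nodes and len(nodes) < budget:
                nodes.add(neighbour)
                queue.append(neighbour)
    # Keep only edges whose endpoints both landed inside the subgraph.
    edges = {(u, v) for u in nodes for v in graph.get(u, []) if v in nodes}
    return nodes, edges


toy_graph: Graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}
small_nodes, small_edges = expand_subgraph(toy_graph, "a", budget=3)
large_nodes, large_edges = expand_subgraph(toy_graph, "a", budget=5)
print(len(small_nodes), len(small_edges))
print(len(large_nodes), len(large_edges))
```

With the larger budget the subgraph reaches nodes several hops from the seed, which is the property that lets questions require longer evidence chains.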

2.2. Expanded Tool-Set

The tool set is augmented beyond the original with new complementary primitives (e.g., "scroll," "extract_table," "summarize_page").

This forces the agent to learn richer, more diverse action sequences through exposure to a broader range of tool interactions.

2.3. Strict Low-Step Filtering

To enforce higher task difficulty, trajectories \tau whose tool-call length T falls below a minimum threshold are discarded.

This filtering excludes straightforward one- or two-step searches, elevating the minimum reasoning horizon and preventing overfitting to shallow tasks.
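The filtering rule can be sketched as a simple length check. The trajectory representation, the `filter_low_step` helper, and the threshold of 3 used in the example are hypothetical choices for illustration; the threshold actually used by the authors is not reproduced here.

```python
from typing import Dict, List

# Hypothetical record shape: each trajectory lists its tool calls in order.
TrajectoryRecord = Dict[str, List[str]]


def filter_low_step(trajectories: List[TrajectoryRecord],
                    min_tool_calls: int) -> List[TrajectoryRecord]:
    """Keep trajectories with at least `min_tool_calls` tool calls."""
    return [t for t in trajectories if len(t["tool_calls"]) >= min_tool_calls]


candidates: List[TrajectoryRecord] = [
    {"tool_calls": ["search"]},
    {"tool_calls": ["search", "visit"]},
    {"tool_calls": ["search", "visit", "scroll", "extract_table"]},
]
kept = filter_low_step(candidates, min_tool_calls=3)  # illustrative threshold
print(len(kept))
```

Only the four-call trajectory survives in this example; the one- and two-step searches are dropped.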

Collectively, these modifications yield a compact dataset of 10.6k synthesized question–trajectory pairs, each requiring sustained, multi-step reasoning (Du et al., 5 May 2026).

3. Training Protocol

OpenSeeker-v2 is trained exclusively via SFT on the aforementioned dataset. No continual pre-training (CPT), reinforcement learning (RL), or data augmentation beyond the default Qwen3 SFT settings is used. Specific hyperparameters—such as batch size, learning rate schedule, and the number of epochs—are not detailed and inherit their values from the Qwen3 SFT regime.

This minimalist training approach demonstrates that with high-difficulty, multi-hop trajectories, SFT alone can drive specialized LLM agents to SOTA capabilities (Du et al., 5 May 2026).

4. Benchmarks and Evaluation

OpenSeeker-v2 is evaluated on four deep-research agentic benchmarks, each employing exact-match accuracy as the primary metric:

Benchmark | Language | Description | Metric
--- | --- | --- | ---
BrowseComp | English | Multi-hop, open-ended web browsing | Exact-match accuracy
BrowseComp-ZH | Chinese | As above, in Chinese | Exact-match accuracy
Humanity’s Last Exam | English | Long-horizon, multi-stage “exam” queries | Exact-match (whole question)
xbench-DeepSearch | Mixed | Heterogeneous set: table extraction, synthesis | Macro-averaged exact-match

Accuracy is calculated as the percentage of total test queries for which the model produces an exactly correct answer (Du et al., 5 May 2026).
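The accuracy computation described above can be sketched in a few lines. The `exact_match_accuracy` helper is hypothetical, and the normalization (trimming whitespace and ignoring case) is an assumption; the benchmarks' own answer-matching rules may differ.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str],
                         references: Sequence[str]) -> float:
    """Percentage of queries whose prediction exactly matches the reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty sequences of equal length")

    def normalize(text: str) -> str:
        # Assumed normalization: trim whitespace and ignore case.
        return text.strip().casefold()

    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


accuracy = exact_match_accuracy(["Paris", " paris ", "Lyon"],
                                ["Paris", "Paris", "Nice"])
print(round(accuracy, 1))
```

Here two of three answers match after normalization, so the score is two thirds of 100.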

5. Experimental Results and Comparisons

The following summarizes OpenSeeker-v2’s empirical results versus the previous SOTA Tongyi DeepResearch system (both are 30B ReAct agents):

Model (Training Strategy) | BrowseComp | BrowseComp-ZH | HLE | xbench
--- | --- | --- | --- | ---
Tongyi DeepResearch (CPT+SFT+RL) | 43.4 | 46.7 | 32.9 | 75.0
OpenSeeker-v2 (10.6k SFT) | 46.0 | 58.1 | 34.6 | 78.0

OpenSeeker-v2 outperforms Tongyi DeepResearch by +2.6 points (BrowseComp), +11.4 (BrowseComp-ZH), +1.7 (HLE), and +3.0 (xbench) despite using only SFT and a significantly smaller training dataset, with no CPT or RL (Du et al., 5 May 2026). The report does not present formal statistical tests or confidence intervals, and isolated ablations for each data modification are not provided. Compared with OpenSeeker-v1, which used 11.7k samples, v2 gains +16.5, +9.7, and +4.0 points on BrowseComp, BrowseComp-ZH, and xbench, respectively.
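The point differences against Tongyi DeepResearch follow directly from the reported scores, as this short check shows. The dictionary layout and variable names are illustrative; the numbers are the ones in the comparison above.

```python
# Reported accuracy (%) per benchmark, copied from the comparison above.
tongyi = {"BrowseComp": 43.4, "BrowseComp-ZH": 46.7, "HLE": 32.9, "xbench": 75.0}
openseeker_v2 = {"BrowseComp": 46.0, "BrowseComp-ZH": 58.1, "HLE": 34.6, "xbench": 78.0}

# Round to one decimal to match the precision of the reported scores.
deltas = {name: round(openseeker_v2[name] - tongyi[name], 1) for name in tongyi}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
```

The largest gap is on BrowseComp-ZH, while the HLE margin is the smallest of the four.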

6. Theoretical and System Design Aspects

OpenSeeker-v2’s modular design can be contextualized within the framework of near-decomposability as articulated in the InfoSeeker blueprint (Lee et al., 3 Apr 2026). Complex agentic systems can be partitioned into semi-autonomous modules, allowing short-run independence among "Worker" entities executing atomic tool calls and long-run dependence via a "Host" coordinating high-level planning. The InfoSeeker-inspired architecture involves three tiers:

  • Host: Maintains overall context, issues subqueries, and aggregates step outputs.
  • Managers: Domain-specialist components responsible for decomposing tasks, reflecting on incomplete subtasks, and aggregating subtask results.
  • Workers: Perform atomic tool-based operations in parallel, isolated from the Host and each other except via their subtask output.

Manager-level aggregation, reflection, and strict context isolation enforce error boundaries and prevent cascading failures. Parallel execution among Workers reduces overall latency by exploiting concurrency, with empirical speed-ups over baselines reported in related frameworks (Lee et al., 3 Apr 2026).
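The three-tier pattern can be sketched with a thread pool. This is a toy illustration of the Host/Manager/Worker idea only, not code from InfoSeeker or OpenSeeker-v2: the `worker`, `run_isolated`, `manager`, `host`, and `split_in_two` functions and their string outputs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def worker(subtask: str) -> str:
    """Hypothetical atomic tool call; fails on subtasks starting with 'bad'."""
    if subtask.startswith("bad"):
        raise ValueError("tool error")
    return f"result({subtask})"


def run_isolated(subtask: str) -> str:
    """Error boundary: a failing Worker yields a marker instead of raising."""
    try:
        return worker(subtask)
    except Exception:
        return f"failed({subtask})"


def manager(task: str, decompose: Callable[[str], List[str]],
            max_workers: int = 4) -> str:
    """Decompose a task, run Workers in parallel, and aggregate their outputs."""
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        results = list(pool.map(run_isolated, subtasks))
    return "; ".join(results)


def split_in_two(task: str) -> List[str]:
    """Hypothetical decomposition rule used only for this example."""
    return [f"{task}#1", f"{task}#2"]


def host(query: str, subqueries: List[str]) -> str:
    """Issue subqueries to Managers and aggregate their summaries."""
    summaries = [manager(subquery, split_in_two) for subquery in subqueries]
    return f"{query} -> " + " | ".join(summaries)


report = host("q", ["a", "b"])
print(report)
print(manager("x", lambda task: ["bad", "ok"]))
```

The Host only ever sees aggregated Manager summaries, and a failing Worker is contained by `run_isolated` rather than aborting its siblings, which is the error-boundary behaviour described above.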

7. Implications, Limitations, and Future Directions

OpenSeeker-v2 serves as evidence that SFT on highly informative, multi-hop synthetic trajectories is sufficient for training competitive frontier search agents at the 30B-parameter scale, without recourse to continual pre-training or reinforcement learning. The system’s performance gains indicate that careful design of data generation pipelines (specifically, knowledge-graph scaling, toolset enrichment, and stringent difficulty filtering) can substitute for expensive multi-stage industrial training pipelines.

A plausible implication is the significant democratization of SOTA search-agent research. Academic and open-source teams with limited compute resources can now obtain comparable results by investing in data-centric rather than model-centric engineering. Notably, OpenSeeker-v2 represents the first state-of-the-art search agent at this scale and paradigm developed by a purely academic team using SFT only (Du et al., 5 May 2026).

Limitations include the lack of formal statistical assessment in the reported results and the absence of modular ablation studies isolating the effect of each intervention. Full adoption of the near-decomposable architectural stack (Host/Manager/Worker), dynamic worker pool allocation, adaptive subtask decomposition, and cost-aware scheduling as proposed in the InfoSeeker blueprint constitute promising directions for both scalability and robustness (Lee et al., 3 Apr 2026).

OpenSeeker-v2’s open-sourced model weights and the documented data synthesis approach provide a foundation for future research in data-centric and modular agent design.