Search-o1: Agentic Search-Enhanced Large Reasoning Models
(2501.05366v1)
Published 9 Jan 2025 in cs.AI, cs.CL, and cs.IR
Abstract: Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems. The code is available at https://github.com/sunnynexus/Search-o1.
The paper introduces Search-o1, a framework that equips large reasoning models (LRMs) with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module to address knowledge insufficiency in extended reasoning processes. The authors observe that LRMs often encounter knowledge gaps during complex reasoning, leading to uncertainties and potential errors. Search-o1 integrates an agentic search workflow that dynamically retrieves external knowledge whenever the LRM faces uncertain knowledge points, and its Reason-in-Documents module refines the retrieved documents, minimizing noise and preserving a coherent reasoning flow.
The paper's core contributions are:
Proposing Search-o1, which integrates an agentic search workflow into the reasoning process of LRMs for autonomous knowledge supplementation.
Combining the reasoning process with an agentic RAG mechanism and a knowledge refinement module.
Demonstrating the strong performance of Search-o1 on complex reasoning tasks in science, mathematics, and coding, as well as on six open-domain question answering (QA) benchmarks.
The method section details the problem formulation, Search-o1 framework, agentic retrieval-augmented generation mechanism, knowledge refinement via Reason-in-Documents, and Search-o1 inference process.
The problem is formalized as generating a comprehensive solution for a question $q$, consisting of a logical reasoning chain $\mathcal{R}$ and a final answer $a$, based on the task instruction $I$, the question $q$, and externally retrieved documents $D$. The objective is expressed as the mapping $(I, q, D) \rightarrow (\mathcal{R}, a)$, with the generation process defined as:

$$P(\mathcal{R}, a \mid I, q) = \prod_{t=1}^{T_r} P(\mathcal{R}_t \mid I, q, \mathcal{R}_{<t}, D_{\leq t}) \cdot \prod_{t=1}^{T_a} P(a_t \mid I, q, \mathcal{R}, a_{<t}, D)$$

where:
$T_r$ is the number of tokens in the reasoning sequence $\mathcal{R}$
$\mathcal{R}_t$ is the token at position $t$
$\mathcal{R}_{<t}$ represents all tokens generated before position $t$
$D_{\leq t}$ represents all documents retrieved up to token $t$ in the reasoning chain
$T_a$ is the length of the answer sequence $a$
$a_t$ is the token at position $t$
$a_{<t}$ indicates all generated answer tokens before position $t$
The agentic RAG mechanism empowers the reasoning model to autonomously determine when to retrieve external knowledge during the reasoning process. During the generation of the reasoning chain $\mathcal{R}$, the model may emit search queries $q_{\text{search}}^{(i)}$ encapsulated between special symbols. The generation of each search query is expressed as:

$$P\big(q_{\text{search}}^{(i)} \mid I, q, \mathcal{R}^{(i-1)}\big) = \prod_{t} P\big(q_{\text{search},t}^{(i)} \mid I, q, \mathcal{R}^{(i-1)}, q_{\text{search},<t}^{(i)}\big)$$

where:
$q_{\text{search},t}^{(i)}$ denotes the token generated at step $t$ of the $i$-th search query
$\mathcal{R}^{(i-1)}$ represents all the reasoning steps prior to the $i$-th search step, including both search queries and search results
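To make this concrete, the sketch below shows how a decoding loop might detect a completed search query delimited by special symbols. This is a minimal illustration under assumptions: the delimiter strings and the `extract_search_query` helper are hypothetical stand-ins for the special symbols the framework actually defines.

```python
import re

# Hypothetical delimiters for agentic search queries; the framework
# defines its own special symbols, these names are for illustration.
BEGIN_QUERY = "<|begin_search_query|>"
END_QUERY = "<|end_search_query|>"

QUERY_PATTERN = re.compile(
    re.escape(BEGIN_QUERY) + r"(.*?)" + re.escape(END_QUERY), re.DOTALL
)

def extract_search_query(reasoning_so_far: str) -> str | None:
    """Return the most recently completed search query, if any."""
    matches = QUERY_PATTERN.findall(reasoning_so_far)
    return matches[-1].strip() if matches else None
```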
The knowledge refinement module, Reason-in-Documents, selectively integrates relevant and concise information into the reasoning chain. For each search step $i$, given the previous reasoning steps $\mathcal{R}^{(<i)}$, the current search query $q_{\text{search}}^{(i)}$, and the retrieved documents $D^{(i)}$, the knowledge refinement process operates in two stages: generating an intermediate reasoning sequence $r_{\text{docs}}^{(i)}$ and producing the refined knowledge $r_{\text{final}}^{(i)}$.
The generation of the intermediate reasoning sequence $r_{\text{docs}}^{(i)}$ is expressed as:

$$P\big(r_{\text{docs}}^{(i)} \mid \mathcal{R}^{(<i)}, q_{\text{search}}^{(i)}, D^{(i)}\big) = \prod_{t} P\big(r_{\text{docs},t}^{(i)} \mid \mathcal{R}^{(<i)}, q_{\text{search}}^{(i)}, D^{(i)}, r_{\text{docs},<t}^{(i)}\big)$$

and the refined knowledge is then generated conditioned on this intermediate analysis:

$$P\big(r_{\text{final}}^{(i)} \mid \mathcal{R}^{(<i)}, q_{\text{search}}^{(i)}, D^{(i)}, r_{\text{docs}}^{(i)}\big) = \prod_{t=1}^{T_r^{(i)}} P\big(r_{\text{final},t}^{(i)} \mid \mathcal{R}^{(<i)}, q_{\text{search}}^{(i)}, D^{(i)}, r_{\text{docs}}^{(i)}, r_{\text{final},<t}^{(i)}\big)$$

where:
$T_r^{(i)}$ is the length of the refined knowledge sequence
$r_{\text{final},t}^{(i)}$ denotes the token at step $t$
The refined knowledge $r_{\text{final}}^{(i)}$ is then incorporated into the reasoning chain $\mathcal{R}^{(i)}$.
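As a rough sketch, the two-stage refinement can be viewed as two chained LLM calls: one that reasons over the raw documents and one that distills that analysis into injectable knowledge. The `generate` callable and the prompt wording below are hypothetical stand-ins, not the paper's actual prompts.

```python
from typing import Callable

def reason_in_documents(
    generate: Callable[[str], str],  # any LLM completion call (assumed)
    instruction_docs: str,           # the Reason-in-Documents instruction
    prev_reasoning: str,             # previous reasoning steps R^(<i)
    query: str,                      # current search query q_search^(i)
    documents: list[str],            # retrieved documents D^(i)
) -> str:
    docs_text = "\n\n".join(documents)
    # Stage 1: generate the intermediate reasoning sequence r_docs^(i)
    # by analyzing the documents against the query and prior reasoning.
    r_docs = generate(
        f"{instruction_docs}\n\nPrevious reasoning:\n{prev_reasoning}\n\n"
        f"Search query: {query}\n\nDocuments:\n{docs_text}"
    )
    # Stage 2: distill the analysis into concise refined knowledge
    # r_final^(i) suitable for injection into the reasoning chain.
    r_final = generate(
        f"{instruction_docs}\n\nDocument analysis:\n{r_docs}\n\n"
        f"Distill only the knowledge needed for the query: {query}"
    )
    return r_final
```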
The Search-o1 inference process begins by initializing the reasoning sequence with the task instruction $I$ concatenated with the specific question $q$. Whenever the reasoning model $\mathcal{M}$ produces a search query, the retrieval function is triggered to obtain relevant external documents $D$. These retrieved documents, along with the Reason-in-Documents instruction and the current reasoning sequence $\mathcal{R}$, are then processed by the Reason-in-Documents module, whose output is injected back into the reasoning chain before generation resumes. A batch inference mechanism is also employed to efficiently handle multiple questions simultaneously.
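Putting the pieces together, a single-question version of the inference loop might look like the following sketch. It reuses the delimiters and `reason_in_documents` helper from the sketches above; `model_generate` and `web_search` are assumed interfaces (an LLM completion call honoring stop strings, and a search wrapper such as one around the Bing Web Search API), and the batch mechanism is omitted for brevity.

```python
BEGIN_RESULT = "<|begin_search_result|>"  # hypothetical result delimiters
END_RESULT = "<|end_search_result|>"

def search_o1_inference(
    model_generate,            # (prompt, stop=None) -> text (assumed)
    web_search,                # query -> list[str] of documents (assumed)
    instruction: str,          # task instruction I
    instruction_docs: str,     # Reason-in-Documents instruction
    question: str,             # question q
    max_search_steps: int = 10,
) -> str:
    sequence = f"{instruction}\n{question}\n"
    for _ in range(max_search_steps):
        # Generate until the model finishes or closes a search query.
        chunk = model_generate(sequence, stop=[END_QUERY])
        sequence += chunk
        if BEGIN_QUERY not in chunk:
            break                                # no retrieval requested
        query = chunk.rsplit(BEGIN_QUERY, 1)[-1].strip()
        documents = web_search(query)            # retrieve D^(i)
        refined = reason_in_documents(
            model_generate, instruction_docs, sequence, query, documents
        )
        # Close the query, inject the refined knowledge, and resume.
        sequence += f"{END_QUERY}\n{BEGIN_RESULT}{refined}{END_RESULT}\n"
    return sequence
```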
The experimental setup involves evaluations on challenging reasoning tasks, including GPQA, MATH500, AMC2023, AIME2024, and LiveCodeBench, as well as open-domain QA tasks such as Natural Questions (NQ), TriviaQA, HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. The baselines include direct reasoning methods using models like Qwen2.5-32B-Instruct, Qwen2.5-Coder-32B-Instruct, QwQ-32B-Preview, Qwen2.5-72B-Instruct, Llama3.3-70B-Instruct, DeepSeek-R1-Lite-Preview, OpenAI GPT-4o, and o1-preview, as well as retrieval-augmented reasoning methods like standard RAG and RAG Agent (RAgent).
The implementation details specify the use of QwQ-32B-Preview as the backbone LRM, with specific generation settings and the Bing Web Search API for retrieval. A back-off strategy is applied when a final answer is not provided.
Key results include:
QwQ-32B-Preview consistently shows superior performance compared to traditional instruction-tuned LLMs.
RAgent-QwQ-32B surpasses both standard RAG-based models and direct reasoning QwQ-32B in most tasks.
Search-o1 outperforms RAgent-QwQ-32B in most tasks, demonstrating the effectiveness of the Reason-in-Documents strategy.
Scaling analysis demonstrates that Search-o1 can effectively leverage an increasing number of retrieved documents.
Comparison with human experts on the GPQA extended set shows that Search-o1 outperforms human experts in overall performance, as well as in both physics and biology.
On open-domain QA tasks, retrieval significantly improves performance for both reasoning and non-reasoning models across all tasks.
Search-o1 generally outperforms all baselines on multi-hop tasks.