Deep Researcher with Test-Time Diffusion (2507.16075v1)

Published 21 Jul 2025 in cs.CL

Abstract: Deep research agents, powered by LLMs, are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TTD-DR). This novel framework conceptualizes research report generation as a diffusion process. TTD-DR initiates this process with a preliminary draft, an updatable skeleton that serves as an evolving foundation to guide the research direction. The draft is then iteratively refined through a "denoising" process, which is dynamically informed by a retrieval mechanism that incorporates external information at each step. The core process is further enhanced by a self-evolutionary algorithm applied to each component of the agentic workflow, ensuring the generation of high-quality context for the diffusion process. This draft-centric design makes the report writing process more timely and coherent while reducing information loss during the iterative search process. We demonstrate that our TTD-DR achieves state-of-the-art results on a wide array of benchmarks that require intensive search and multi-hop reasoning, significantly outperforming existing deep research agents.

Summary

The paper introduces the TTD-DR framework that refines draft research reports through iterative denoising with external retrieval and self-evolution.
It reports win rates of 69.1% and 74.5% against OpenAI Deep Research on the LongForm Research and DeepConsult benchmarks, respectively.
The method enhances components like research plan generation, question synthesis, and report production by mimicking the iterative process of human research.

Test-Time Diffusion Deep Researcher: An In-Depth Analysis

The paper "Deep Researcher with Test-Time Diffusion" (2507.16075) introduces a novel framework called Test-Time Diffusion Deep Researcher (TTD-DR) for generating complex, long-form research reports. The TTD-DR framework addresses the limitations of existing deep research agents by drawing inspiration from the iterative nature of human research, which involves cycles of planning, drafting, searching, and revision.

Core Components of TTD-DR

The TTD-DR framework consists of two core mechanisms:

Report-Level Refinement via Denoising with Retrieval: This mechanism involves iteratively refining an initial, noisy draft report by "denoising" it with the help of information retrieved from external sources. The draft is progressively updated based on the research plan.
Component-wise Optimization via Self-Evolution: This mechanism enhances the quality of each component within the agentic workflow, such as plan generation, question generation, answer synthesis, and report generation, through a self-evolutionary algorithm.

Figure 1: Illustration of our Test-Time Diffusion Deep Researcher (TTD-DR) framework, designed to mimic the iterative nature of human research through a draft. A user query initiates both a preliminary draft and a research plan. This evolving draft, along with the research plan, dynamically informs the generation of search questions and subsequent information retrieval to be timely and coherent, while reducing information loss. The retrieved information is then leveraged to denoise and refine the initial draft in a continuous feedback loop. The entire workflow is further optimized by a self-evolutionary algorithm to enhance the quality of the research plan, generated questions, answers, and the final report, demonstrating the synergistic power of diffusion and self-evolution in achieving superior research outcomes.

The authors draw an analogy between the report generation process and the sampling process in a diffusion model augmented by retrieval (ReDi). In this analogy, the initial draft report is analogous to a noisy image, and the denoising module, aided by retrieval tools, refines the draft into a higher-quality output.

Backbone Deep Research Agent

The backbone deep research agent consists of three major stages:

Research Plan Generation: A dedicated LLM agent generates a structured research plan based on the user query.
Iterative Search and Synthesis: This stage alternates between two sub-steps: generating the next search question from the research plan and prior context (Stage 2a), and synthesizing a precise answer from the retrieved documents using a RAG-like system (Stage 2b).
Final Report Generation: A unit LLM agent synthesizes all the gathered information to produce a comprehensive and coherent final report.
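
The three stages can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `BackboneAgent`, its callable fields, and the `max_steps` budget are hypothetical, and each callable stands in for an LLM or retrieval call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (search question, synthesized answer) pairs


@dataclass
class BackboneAgent:
    """Hypothetical three-stage agent; every callable is a stand-in for an LLM call."""
    plan_fn: Callable[[str], str]                              # Stage 1: query -> plan
    question_fn: Callable[[str, str, History], Optional[str]]  # Stage 2a: next question
    answer_fn: Callable[[str], str]                            # Stage 2b: question -> answer
    report_fn: Callable[[str, str, History], str]              # Stage 3: final report
    max_steps: int = 5                                         # assumed search budget

    def run(self, query: str) -> str:
        plan = self.plan_fn(query)
        history: History = []
        for _ in range(self.max_steps):
            question = self.question_fn(query, plan, history)
            if question is None:  # the question generator signals that search is done
                break
            history.append((question, self.answer_fn(question)))
        return self.report_fn(query, plan, history)


# Toy stand-ins so the sketch runs without any model or search backend.
agent = BackboneAgent(
    plan_fn=lambda q: f"plan for {q}",
    question_fn=lambda q, p, h: f"question {len(h) + 1}" if len(h) < 2 else None,
    answer_fn=lambda question: f"answer to {question}",
    report_fn=lambda q, p, h: f"{q}: " + "; ".join(a for _, a in h),
)
report = agent.run("topic")
print(report)
```

Passing the stages in as callables keeps the control flow visible and lets each stage be swapped or wrapped independently.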

Component-wise Self-Evolution Details

The self-evolutionary algorithm is applied to each component of the agentic workflow to improve its performance. For example, in the case of search answer generation, the algorithm works as follows:

Multiple diverse variants of an output (e.g., several possible answers to a search query) are generated.
Each answer variant is assessed by an LLM-as-a-judge, which provides a fitness score and textual critiques.
Each variant undergoes a revision step based on the feedback received; this assess-and-revise cycle repeats until a maximum number of iterations is reached.
Multiple revised variants are merged into a single, high-quality output.

Figure 2: Illustration of the component-wise Self-Evolution applied to Search Answer (Stage 2b of the backbone DR agent). The process starts with multiple variants of initial answers. Each variant then undergoes a self-evolving episode where it first interacts with the environment to obtain a fitness score and feedback. It is then revised based on the feedback. This process repeats until the maximum number of iterations is reached. Finally, multiple revised variants from all episodes are merged to produce the final answer.
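
The generate, judge, revise, and merge steps described above can be sketched as a short loop. This is an illustrative sketch only: the function name `self_evolve`, its parameters, and the toy stand-ins are hypothetical, and in a real system each callable would wrap an LLM call (the judge being the LLM-as-a-judge).

```python
from typing import Callable, List, Tuple


def self_evolve(
    generate: Callable[[int], str],                # produce the i-th initial variant
    judge: Callable[[str], Tuple[float, str]],     # variant -> (fitness score, critique)
    revise: Callable[[str, float, str], str],      # revise a variant using the feedback
    merge: Callable[[List[str]], str],             # combine revised variants into one
    n_variants: int = 3,                           # assumed number of variants
    n_iters: int = 2,                              # assumed maximum iterations
) -> str:
    variants = [generate(i) for i in range(n_variants)]
    evolved: List[str] = []
    for variant in variants:
        # One self-evolving episode per variant: assess, then revise, repeatedly.
        for _ in range(n_iters):
            score, critique = judge(variant)
            variant = revise(variant, score, critique)
        evolved.append(variant)
    return merge(evolved)


# Toy stand-ins: each revision appends "+" so the number of rounds is visible.
final_answer = self_evolve(
    generate=lambda i: f"answer-{i}",
    judge=lambda v: (float(len(v)), "add detail"),
    revise=lambda v, score, critique: v + "+",
    merge=lambda vs: " | ".join(vs),
    n_variants=2,
    n_iters=2,
)
print(final_answer)
```

The same loop could be applied to plans, questions, or reports by swapping the four callables.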

Report-Level Denoising with Retrieval Algorithm

The report-level denoising with retrieval algorithm involves iteratively refining the initial draft report by incorporating information retrieved from external sources. The algorithm works as follows:

An LLM generates an initial draft report based on the user's query.
The current draft report is fed into Stage 2a of the backbone DR workflow to inform the generation of the next search query.
After obtaining a synthesized answer in Stage 2b, the new information is used to revise the report draft.
This process is repeated until the search process concludes, at which point a final agent writes the final report based on all historical search answers and revisions.
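
The loop above can be sketched as follows. This is a sketch under assumptions rather than the paper's code: the function `denoise_with_retrieval`, its argument names, the `max_steps` budget, and the convention that the question generator returns `None` when search concludes are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (search question, synthesized answer) pairs


def denoise_with_retrieval(
    query: str,
    plan: str,
    draft_fn: Callable[[str], str],                             # initial noisy draft
    question_fn: Callable[[str, str, str, History], Optional[str]],  # Stage 2a, sees the draft
    answer_fn: Callable[[str], str],                            # Stage 2b
    revise_fn: Callable[[str, str, str], str],                  # one denoising step
    final_fn: Callable[[str, History, List[str]], str],         # final report writer
    max_steps: int = 3,                                         # assumed search budget
) -> str:
    draft = draft_fn(query)
    drafts = [draft]          # keep every revision for the final writer
    history: History = []
    for _ in range(max_steps):
        # The current draft informs which question to search next.
        question = question_fn(query, plan, draft, history)
        if question is None:
            break
        answer = answer_fn(question)
        history.append((question, answer))
        # The new information is used to revise ("denoise") the draft.
        draft = revise_fn(draft, question, answer)
        drafts.append(draft)
    return final_fn(query, history, drafts)


# Toy stand-ins; here the final writer simply returns the last draft.
final_report = denoise_with_retrieval(
    query="topic",
    plan="plan",
    draft_fn=lambda q: f"[draft:{q}]",
    question_fn=lambda q, p, d, h: f"gap {len(h) + 1}" if len(h) < 2 else None,
    answer_fn=lambda question: f"fact for {question}",
    revise_fn=lambda d, question, a: f"{d} +{a}",
    final_fn=lambda q, h, ds: ds[-1],
)
print(final_report)
```

Feeding the evolving draft into question generation is what distinguishes this loop from a plain plan-then-search pipeline: each search targets what the draft still lacks.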

Experimental Results

The authors evaluated TTD-DR on a variety of benchmarks that require intensive search and multi-hop reasoning, including LongForm Research, DeepConsult, HLE-search, HLE-Full, and GAIA. The results show that TTD-DR consistently outperforms existing deep research agents across all benchmarks. For example, compared to OpenAI Deep Research, TTD-DR achieves a 69.1% win rate on the LongForm Research dataset and a 74.5% win rate on the DeepConsult dataset.
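
As a reading aid, a win rate over side-by-side comparisons can be computed as below. The label format, the function name `win_rate`, and the choice to count ties as non-wins are assumptions made for illustration; the sample judgments are placeholders, not data from the paper.

```python
from typing import Iterable


def win_rate(judgments: Iterable[str]) -> float:
    """Percentage of pairwise comparisons labeled "win".

    Assumption: "tie" and "loss" both count as non-wins.
    """
    results = list(judgments)
    if not results:
        raise ValueError("no judgments provided")
    return 100.0 * sum(r == "win" for r in results) / len(results)


# Placeholder judgments for five hypothetical queries.
sample = ["win", "win", "loss", "win", "tie"]
rate = win_rate(sample)
print(f"{rate:.1f}%")
```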

Figure 3: Pareto frontier between DR agent performances and latency for HLE-search. The dots from left to right represent 1) Gemini-2.5-pro w/ search tool, 2) Backbone DR Agent, 3) +Self-evolution and 4) +Diffusion with Retrieval, which shows our final algorithm is most efficient in terms of test-time scaling (steepest slope).
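
The "steepest slope" reading of the figure can be made concrete by computing the performance gain per unit of added latency between consecutive configurations. The sketch below is illustrative: `marginal_gains` is a hypothetical helper, it assumes distinct latencies, and the points are arbitrary placeholders rather than values from the paper.

```python
from typing import List, Tuple


def marginal_gains(points: List[Tuple[float, float]]) -> List[float]:
    """Slope between consecutive (latency, score) points, ordered by latency.

    Assumption: all latencies are distinct, so no division by zero occurs.
    """
    ordered = sorted(points)
    return [
        (score1 - score0) / (lat1 - lat0)
        for (lat0, score0), (lat1, score1) in zip(ordered, ordered[1:])
    ]


# Arbitrary placeholder points: (latency, score) for four configurations.
points = [(1.0, 10.0), (2.0, 12.0), (4.0, 16.0), (5.0, 22.0)]
gains = marginal_gains(points)
print(gains)
```

A larger value for the last segment would correspond to the caption's claim that the final step buys the most performance per unit of extra latency.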

Ablation Studies

The authors conducted ablation studies to assess the individual contributions of the two core mechanisms of TTD-DR. The results show that both the report-level refinement via denoising with retrieval and the component-wise optimization via self-evolution contribute significantly to the overall performance of the framework.

Implications and Future Directions

The TTD-DR framework has significant implications for the development of deep research agents. The framework's ability to generate high-quality research reports by iteratively refining an initial draft with the help of external information could be valuable in a variety of applications, such as scientific research, business intelligence, and journalism.

Future research could focus on extending the TTD-DR framework to incorporate other tools, such as web browsing and code execution. Additionally, research could explore the use of reinforcement learning to train the agents within the framework.

Conclusion

TTD-DR treats report generation as an iterative refinement process modeled on human research: a preliminary draft is repeatedly revised with retrieved information, while self-evolution improves each component of the workflow. The paper reports state-of-the-art results on benchmarks that require intensive search and multi-hop reasoning, including win rates of 69.1% and 74.5% against OpenAI Deep Research on LongForm Research and DeepConsult.

Authors (18)
