
Universal Deep Research: Bring Your Own Model and Strategy (2509.00244v1)

Published 29 Aug 2025 in cs.AI

Abstract: Deep research tools are among the most impactful and most commonly encountered agentic systems today. We observe, however, that each deep research agent introduced so far is hard-coded to carry out a particular research strategy using a fixed choice of tools. We introduce Universal Deep Research (UDR), a generalist agentic system that wraps around any LLM and enables the user to create, edit, and refine their own entirely custom deep research strategies without any need for additional training or finetuning. To showcase the generality of our system, we equip UDR with example minimal, expansive, and intensive research strategies, and provide a user interface to facilitate experimentation with the system.


Summary

The paper introduces Universal Deep Research (UDR), a system that decouples the choice of model from the choice of research strategy, enabling customizable agentic deep research workflows.
The paper describes a two-phase process, strategy processing followed by strategy execution, designed to make research progress auditable and reports reproducible.
The paper demonstrates that agentic behavior can be programmed in natural language, letting users control the research strategy itself while execution stays efficient and sandboxed.

Universal Deep Research: Architecture, Mechanism, and Implications

Deep Research Tools: Landscape and Limitations

Deep research tools (DRTs) have become central to automating search-intensive tasks in professional and personal contexts. DRTs typically accept a user prompt, autonomously search relevant resources, and produce structured research reports while continuously reporting their progress. Architecturally, DRTs combine a user interface for prompt input and progress visualization with agentic logic, which may be implemented through code orchestration or LLM-driven tool calling. Figure 1 illustrates this typical DRT architecture.

Figure 1: A high-level diagram visualizing the components of a typical deep research tool. Unlike plain conversational LLMs, DRTs tend to continuously update the user on their progress before producing their report.
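To make the contrast with UDR concrete, the loop below is a minimal, hypothetical sketch of a conventional DRT: the research strategy (search, summarize, report) is hard-coded, and the agent yields progress updates before its final report. The `search` and `summarize` stubs stand in for a real search API and LLM; they are assumptions for illustration, not any product's implementation.

```python
from typing import Callable, Iterator


def fixed_strategy_drt(
    prompt: str,
    search: Callable[[str], list[str]],
    summarize: Callable[[str], str],
) -> Iterator[dict]:
    """Hard-coded research loop: the user controls only `prompt`."""
    yield {"type": "progress", "message": f"Searching for: {prompt}"}
    documents = search(prompt)
    yield {"type": "progress", "message": f"Found {len(documents)} documents"}

    summaries = []
    for i, doc in enumerate(documents, start=1):
        summaries.append(summarize(doc))
        yield {"type": "progress", "message": f"Summarized document {i}/{len(documents)}"}

    # The report format, like every step above, is fixed by the tool's authors.
    report = "# Report\n\n" + "\n".join(f"- {s}" for s in summaries)
    yield {"type": "report", "message": report}


# Stub tools so the sketch runs without network access or a model.
def fake_search(query: str) -> list[str]:
    return [f"{query}: source A", f"{query}: source B"]


def fake_summarize(text: str) -> str:
    return text.upper()


events = list(fixed_strategy_drt("solid-state batteries", fake_search, fake_summarize))
for event in events:
    print(event["type"], "|", event["message"].splitlines()[0])
```

Everything except `prompt` is decided in code by the tool's authors; UDR instead makes the strategy itself a user input.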

Current DRTs, such as the deep research features of Gemini, Perplexity, and OpenAI, combine iterative web browsing with chain-of-thought reasoning, while enterprise solutions (e.g., NVIDIA AI-Q, SambaNova, ERP AI Deep Research) use specialized pipelines or graph-based architectures to research internal documents. Despite their utility, these systems share three critical limitations:

Rigid research strategies: Users cannot customize the research methodology beyond the prompt.
Lack of model interchangeability: The underlying LLM is fixed, so users cannot pair the model best suited to their task with their preferred agentic logic.
Limited user agency: Users cannot enforce resource hierarchies, automate cross-validation, or control search expenses.

These constraints make specialized research workflows hard to automate and prevent users from pairing the most competitive models with the most competitive agentic systems.

Universal Deep Research: System Overview

Universal Deep Research (UDR) is introduced as a generalist agentic system that wraps around any LLM, enabling users to define, edit, and refine custom deep research strategies without additional training or finetuning. UDR accepts both a research strategy and a research prompt as inputs and exercises no implicit agency beyond what the strategy explicitly instructs. Figure 2 shows the architecture.

Figure 2: A high-level diagram visualizing the components of the UDR. Unlike specialized DRTs, UDR receives both the research strategy and the research prompt from the user, allowing for a greater level of customization.

Research Mechanism

UDR operates in two phases:

Strategy Processing: The user-specified research strategy, typically a stepwise natural language description, is converted into executable code by an LLM. The system enforces a generator-based structure: each strategy step is marked by a comment in the code, and progress notifications are emitted via yield statements. This approach keeps the code faithful to each step and easy to interpret, mitigating the shortcutting and semantic drift observed in prior prompt-based or fragmented code generation methods.
Strategy Execution: The generated code is executed in an isolated environment. All intermediate data are stored as named variables, decoupling state from the LLM context and enabling operation within a small context window (8k tokens sufficed in the authors' experiments). Tool use is synchronous and deterministic, and LLM reasoning is invoked only for localized tasks (e.g., summarization, ranking) as dictated by the strategy. Progress notifications are structured and user-defined, providing real-time, auditable updates.
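The two phases can be illustrated with a hedged sketch of what compiled code might look like for a simple strategy such as "search the prompt, summarize each result, then write a report". It assumes Python as the target language; the function names `research_strategy` and `execute`, the notification fields, and the `search`/`llm` callables are hypothetical, not UDR's actual interfaces. Each strategy step becomes a commented block, intermediate data live in named variables rather than in the model's context, and the LLM is called only for the localized summarization and report-writing tasks.

```python
from typing import Callable, Iterator

Search = Callable[[str], list[str]]  # hypothetical search-tool signature
LLM = Callable[[str], str]           # any model backend: prompt in, text out


def research_strategy(prompt: str, search: Search, llm: LLM) -> Iterator[dict]:
    # Step 1: Search for the user's prompt; keep results in a named variable.
    search_results = search(prompt)
    yield {"type": "search_done", "description": f"{len(search_results)} results"}

    # Step 2: Summarize each result with a focused, single-document LLM call.
    summaries = []
    for result in search_results:
        summaries.append(llm(f"Summarize in one sentence:\n{result}"))
    yield {"type": "summaries_done", "description": f"{len(summaries)} summaries"}

    # Step 3: Write the report from the accumulated variables only.
    notes = "\n".join(f"- {s}" for s in summaries)
    report = llm(f"Write a Markdown report on '{prompt}' from these notes:\n{notes}")
    yield {"type": "final_report", "description": "report ready", "report": report}


def execute(strategy_gen: Iterator[dict]) -> tuple[list[dict], str]:
    """Drive the generator, collecting notifications and the final report."""
    notifications, report = [], ""
    for note in strategy_gen:
        notifications.append(note)  # a real UI would stream these to the user
        if note["type"] == "final_report":
            report = note["report"]
    return notifications, report


# Deterministic stubs standing in for a search API and an arbitrary LLM.
def stub_search(query: str) -> list[str]:
    return [f"Article about {query}", f"Blog post about {query}"]


def stub_llm(prompt: str) -> str:
    # Echo the prompt's last line so the data flow is visible without a model.
    return prompt.splitlines()[-1]


notifications, report = execute(
    research_strategy("perovskite solar cells", stub_search, stub_llm)
)
print([n["type"] for n in notifications])
print(report)
```

Because each model call receives only its local input and the running state lives in Python variables, no prompt has to carry the full research history, which is consistent with the small context window the authors report.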

Outputs

Notifications: Structured progress updates are emitted throughout execution, with explicit schema and user-controlled granularity.
Research Report: The final output is a comprehensive, Markdown-formatted report constructed from accumulated variable states, ensuring traceability and reproducibility.
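A minimal sketch of the two output types follows, assuming a notification is a small typed record and the report is rendered from variables the strategy has accumulated. The field names (`type`, `description`, `timestamp`) and the `render_report` helper are hypothetical; in UDR the schema and granularity are whatever the strategy author specifies.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Notification:
    type: str         # e.g. "summary_done", chosen by the strategy author
    description: str  # human-readable progress message
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def render_report(topic: str, sources: dict[str, str]) -> str:
    """Build a Markdown report in which each finding cites its stored source URL."""
    lines = [f"# Research Report: {topic}", "", "## Findings", ""]
    for i, (url, summary) in enumerate(sources.items(), start=1):
        lines.append(f"{i}. {summary} [source]({url})")
    lines += ["", "## Sources", ""]
    lines += [f"- {url}" for url in sources]
    return "\n".join(lines)


note = Notification(type="summary_done", description="Summarized 2 sources")
print(note.to_json())

report = render_report("Topic X", {
    "https://example.com/a": "Finding A.",
    "https://example.com/b": "Finding B.",
})
print(report)
```

Rendering the report only from stored variables is what makes each statement traceable back to the step that produced it.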

User Interface and Interaction

UDR is compatible with any DRT-style UI; to showcase its flexibility, the authors built a demonstration interface that supports prompt input, strategy selection and editing, real-time progress notifications, and report visualization.

Figure 3: A screenshot of the user interface developed for the purpose of demonstrating UDR showing the search bar (top), strategy selection list (middle), and the strategy editing text area (bottom).

Figure 4: A screenshot of the UDR demonstration UI showing a completed research workflow, featuring the search bar (top), strategy selection list (top-middle), research progress notification visualizer (bottom-middle), and the report viewer (bottom).

Key features include the ability to interrupt research, generate preliminary reports, and directly edit research strategies in natural language, which are then compiled into executable agentic workflows.
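One plausible way to support interruption with a generator-based strategy is sketched below: the driver checks a stop flag after each notification, closes the generator, and renders a preliminary report from whatever state has been written so far. The shared `state` dictionary, the `threading.Event` flag, and all function names are assumptions for illustration; the paper does not specify how its UI implements this.

```python
import threading
from typing import Callable, Generator

StrategyGen = Generator[str, None, None]


def strategy(topics: list[str], state: dict) -> StrategyGen:
    # Hypothetical compiled strategy: each step writes its result to shared
    # state before notifying, so an interruption never loses finished work.
    for topic in topics:
        state.setdefault("findings", []).append(f"Notes on {topic}")
        yield f"Researched {topic}"


def run_until_stopped(
    gen: StrategyGen, stop: threading.Event, on_note: Callable[[str], None]
) -> bool:
    """Consume notifications until the strategy ends or `stop` is set.

    Returns True if the run completed and False if it was interrupted.
    """
    for message in gen:
        on_note(message)
        if stop.is_set():
            gen.close()  # raises GeneratorExit at the paused yield, ending the strategy
            return False
    return True


def preliminary_report(state: dict) -> str:
    findings = state.get("findings", [])
    return "# Preliminary Report\n\n" + "\n".join(f"- {f}" for f in findings)


# Usage: simulate a user pressing "stop" after the second progress update.
state: dict = {}
stop = threading.Event()
seen: list[str] = []


def record_and_maybe_stop(message: str) -> None:
    seen.append(message)
    if len(seen) == 2:
        stop.set()


completed = run_until_stopped(strategy(["A", "B", "C"], state), stop, record_and_maybe_stop)
print(completed)
print(preliminary_report(state))
```

In a real interface the event would be set from a separate thread or request handler; a thread-safe `Event` is used here so the same driver works in that setting.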

Implementation Considerations

Reliability: Generating code for the entire strategy end to end yields higher reliability and coherence than stepwise or prompt-based approaches. The resulting code is interpretable and auditable, and failure modes such as skipped steps or misapplied logic are rare.
Efficiency: Orchestration is CPU-bound, with LLM inference invoked only for focused reasoning tasks. This design minimizes GPU usage, latency, and cost.
Security: The code generated from user-defined strategies runs in a sandboxed environment (e.g., Piston), which prevents access to the host system and mitigates prompt injection and code-based exploits.
Scalability: The architecture supports arbitrary model backends and research strategies, facilitating deployment across diverse domains and workloads.
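The paper delegates sandboxing to a dedicated execution engine such as Piston. The sketch below shows only the general shape of isolated execution, using Python's standard `subprocess` module to run generated code in a separate interpreter with a timeout and an empty environment. It is a stand-in under stated assumptions, not Piston's API, and a bare subprocess is not a security sandbox: the child can still reach the filesystem and network, so real deployments need container- or jail-level isolation.

```python
import subprocess
import sys


def run_isolated(code: str, timeout_s: float = 5.0) -> tuple[int, str, str]:
    """Run generated code in a separate Python process with a time limit.

    Returns (exit_code, stdout, stderr); an exit code of -1 signals a timeout.
    NOTE: process separation only; this is NOT a security sandbox.
    """
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            env={},  # do not leak host environment variables such as API keys
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return -1, "", f"timed out after {timeout_s}s"
    return proc.returncode, proc.stdout, proc.stderr


print(run_isolated("print(sum(range(10)))"))
print(run_isolated("while True: pass", timeout_s=1.0))
```

The timeout bounds runaway strategies, and the empty environment keeps credentials held by the orchestrator out of the generated code's reach.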

Limitations

LM Code Generation Quality: Fidelity to the strategy depends on the underlying LM's code generation capabilities. Ambiguous or underspecified strategies may result in semantic drift or hallucinated logic.
Strategy Soundness: UDR does not validate the logical coherence of user-authored strategies beyond syntactic and execution checks.
Limited Interactivity: Mid-execution user intervention is not supported; all decision logic must be encoded upfront.
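The strategy-soundness limitation is easier to see with an example of what a purely syntactic check can and cannot catch. The sketch below, assuming the compiled strategy is Python, uses the standard `ast` module to confirm that generated code parses and defines at least one generator function (a function containing `yield`). The name `check_strategy_code` is hypothetical and is not claimed to be UDR's validator. Code that passes this check can still be logically unsound, for example by summarizing before anything has been searched.

```python
import ast


def check_strategy_code(source: str) -> list[str]:
    """Return a list of problems; an empty list means the syntactic check passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    # Find plain (non-async) functions whose bodies contain yield or yield from.
    generators = [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
        and any(isinstance(n, (ast.Yield, ast.YieldFrom)) for n in ast.walk(node))
    ]
    if not generators:
        return ["no generator function found; notifications cannot be yielded"]
    return []


# Syntactically valid generator, but the step order is logically unsound.
unsound_but_valid = """
def strategy(prompt):
    # Step 1: summarize (wrong: nothing has been searched yet)
    summary = prompt[:10]
    yield {"type": "summary_done", "description": summary}
"""

print(check_strategy_code(unsound_but_valid))
print(check_strategy_code("def f(:\n    pass"))
print(check_strategy_code("def f():\n    return 1"))
```

The first call passes despite the unsound logic, which is exactly the gap the limitation describes.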

Implications and Future Directions

UDR addresses the principal limitations of existing DRTs by decoupling model and strategy selection, enabling user-defined agentic workflows, and supporting arbitrary LLM backends. This architecture is particularly suited for high-value, specialized research tasks in domains such as finance, law, healthcare, and government, where bespoke strategies and model selection are critical.

The approach demonstrates that agentic behavior can be "programmed" in natural language and compiled into deterministic, auditable workflows. However, the burden of strategy authoring may be prohibitive for end users; thus, future systems should provide libraries of modifiable strategies and explore automated strategy synthesis from user prompts.

Further research should investigate mechanisms for user control over LM reasoning, dynamic agentic workflows, and scalable deployment of UDR-like systems in enterprise and consumer contexts.

Conclusion

Universal Deep Research establishes a flexible, model-agnostic framework for deep research automation, granting users granular control over both research strategy and model selection. The system is designed for reliable, efficient, and secure execution of custom agentic workflows, with explicit progress tracking and reproducible outputs. While strategy authoring remains a challenge, the architecture paves the way for future developments in user-programmable agentic AI, modular research automation, and scalable deployment across diverse domains.