Tongyi DeepResearch: Agentic Research System

Updated 29 October 2025
  • Tongyi DeepResearch is an advanced agentic large language model engineered for long‑horizon deep research, integrating iterative reasoning with dynamic tool usage.
  • It employs a novel multi-agent workflow that separates tactical reasoning from strategic context management, enhancing error correction and information synthesis.
  • The system is trained via synthetic data generation and reinforcement learning, achieving state‑of‑the‑art performance on benchmarks with reduced computational overhead.

Tongyi DeepResearch is an agentic LLM explicitly engineered for long‑horizon, deep information‑seeking research tasks. It is designed to emulate and amplify the research process of human experts by interleaving iterative reasoning, dynamic tool usage, and context‐managed synthesis of information across multi‑step tasks. The system is open‑source and structured to enable scalable autonomous research by combining foundation model pre‑training, synthetic data generation via fully automated pipelines, and advanced reinforcement learning for agentic post‑training.

1. Architecture and Agentic Design

Tongyi DeepResearch is built on the Qwen3‑30B‑A3B‑Base architecture and is parameter‑efficient: although its total parameter count is 30.5 billion, only 3.3 billion parameters are activated per token, facilitating reduced computational overhead during inference. Its design follows the ReAct paradigm, interleaving internal “thought” and “action” signals with environment observations. The model implements a long‑horizon context management strategy by periodically synthesizing working memory into a compressed report, thereby mitigating context window limitations. This architectural separation of tactical execution from strategic oversight is achieved through the integration of specialized agents and context managers that dynamically curate salient information over extended research trajectories.
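The ReAct-style loop with periodic report synthesis can be sketched as follows. This is an illustrative outline only: the function names (`call_model`, `run_tool`, `synthesize_report`), the compression interval, and the action schema are hypothetical placeholders, not Tongyi DeepResearch APIs.

```python
# Illustrative sketch of a ReAct-style loop with periodic context
# compression. All callables and the compression interval are assumed.

COMPRESS_EVERY = 8  # fold working memory into the report every N steps (assumed)

def research_loop(question, call_model, run_tool, synthesize_report,
                  max_steps=64):
    report = ""          # compressed report S_t carried across rounds
    working_memory = []  # recent (thought, action, observation) triples
    for step in range(max_steps):
        # The model sees only the compressed report plus recent steps,
        # never the full raw trajectory.
        thought, action = call_model(question, report, working_memory)
        if action["type"] == "answer":
            return action["content"]
        observation = run_tool(action)
        working_memory.append((thought, action, observation))
        # Periodically synthesize working memory into the report so the
        # prompt stays within the context window.
        if (step + 1) % COMPRESS_EVERY == 0:
            report = synthesize_report(report, working_memory)
            working_memory = []
    return report  # fall back to the latest synthesized report
```

The key design point mirrored here is that compression is lossy by intent: stale tool outputs are discarded once their salient content has been folded into the report.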

2. Training Methodology and Data Synthesis

The training pipeline of Tongyi DeepResearch is structured into two principal stages. In the mid‑training phase (Agentic Continual Pre‑training), the model is exposed to large‑scale synthetic data featuring multi‑step question–reasoning–action–decision trajectories, some spanning up to 128K tokens. A fully automated data synthesis pipeline generates diverse research questions and plans via agentic simulation without costly human annotation. Following mid‑training, the post‑training stage comprises supervised fine‑tuning on demonstration trajectories—with mixed ReAct and context‑management modes—and subsequent on‑policy reinforcement learning. The RL training employs policy gradient methods with clipping and advantage estimation using a formulation similar to GRPO. Finally, model variants are merged via weighted parameter interpolation to enhance robustness and generalization.
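The final merging step can be illustrated as a parameter-wise weighted average of checkpoints. This is a minimal sketch under the assumption of simple linear interpolation; real checkpoints hold tensors, and the actual weighting scheme used for Tongyi DeepResearch is not specified here, so plain floats and uniform dictionaries stand in.

```python
# Minimal sketch of merging model variants by weighted parameter
# interpolation. Plain floats stand in for weight tensors; the
# weighting scheme is an assumption for illustration.

def merge_checkpoints(state_dicts, weights):
    """Return a parameter-wise weighted average of several checkpoints."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name]
                           for w, sd in zip(weights, state_dicts))
    return merged
```

For example, `merge_checkpoints([variant_a, variant_b], [0.6, 0.4])` would bias the merged model toward `variant_a` while retaining some of `variant_b`'s behavior.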

3. Multi-Agent Workflow and Context Management

A core innovation of Tongyi DeepResearch is its agentic design that recasts the research process into a multi‑agent workflow. Tactical agents, operating in a ReAct‑style loop, handle step‑by‑step reasoning and tool calls based on a concise, dynamically refreshed context provided by a dedicated Context Manager. Simultaneously, asynchronous Meta‑Thinker agents monitor the evolving research trajectory, detect anomalies such as repetitive failures or reasoning drift, and issue strategic interventions (e.g., revise, pivot, verify, or conclude). This explicit role separation avoids context overload and enables robust error correction over long reasoning chains. Such a dual‑loop process—where each iteration integrates fresh, curated context with strategic oversight—ensures that the model’s research outputs remain coherent and comprehensive even on tasks that require deep, multi‑step synthesis.
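The dual-loop role separation described above can be sketched schematically. Everything here is a hypothetical simplification: the failure threshold, the step dictionaries, and the `curate_context`/`tactical_step` callables are assumptions, and the real Meta-Thinker runs asynchronously rather than once per step.

```python
# Hypothetical sketch of the dual-loop design: a tactical agent acts on
# curated context while a meta-level monitor watches the trajectory for
# repeated failures and issues strategic interventions. All names,
# thresholds, and data shapes are assumed for illustration.

FAILURE_LIMIT = 3  # consecutive failed steps before intervening (assumed)

def meta_thinker(trajectory):
    """Return a strategic intervention, or None to continue."""
    recent = trajectory[-FAILURE_LIMIT:]
    if len(recent) == FAILURE_LIMIT and all(s.get("failed") for s in recent):
        return "pivot"  # e.g. revise, pivot, verify, or conclude
    return None

def dual_loop(tactical_step, curate_context, max_steps=32):
    trajectory = []
    for _ in range(max_steps):
        context = curate_context(trajectory)  # Context Manager role
        step = tactical_step(context)         # ReAct-style tactical loop
        trajectory.append(step)
        if step.get("done"):
            return step["answer"], trajectory
        intervention = meta_thinker(trajectory)
        if intervention:
            # Record the intervention so the next curated context
            # reflects the strategic course correction.
            trajectory.append({"intervention": intervention,
                               "failed": False})
    return None, trajectory
```

The point of the separation is visible even in this toy form: the tactical agent never inspects the full trajectory itself, so anomaly detection and context curation cannot crowd out its step-level reasoning budget.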

4. Benchmark Evaluation and Performance

Tongyi DeepResearch has been evaluated on a range of agentic deep‑research benchmarks, including Humanity’s Last Exam, BrowseComp, BrowseComp‑ZH, WebWalkerQA, GAIA, xbench‑DeepSearch, FRAMES, and xbench‑DeepSearch‑2510. Empirical evaluations report state‑of‑the‑art performance, with improvements over both traditional retrieval‑augmented generation systems and proprietary deep research agents; on BrowseComp and GAIA, for instance, the model maintains higher accuracy than competing systems despite the difficulty of deep information retrieval and reasoning. Results are reported with metrics such as mean accuracy and macro‑F1, with significance assessed via non‑parametric tests (e.g., the Wilcoxon signed‑rank test). These evaluations also indicate that while the system excels at surface‑level extraction for research tasks, it still struggles to generate innovative, cross‑disciplinary research questions that require subtle, non‑trivial reasoning.

5. Technical Innovations and Mathematical Foundations

Key technical innovations in Tongyi DeepResearch include its dynamic context compression mechanism and its versatile multi‑agent architecture. The context management paradigm is expressed as a Markovian state reconstruction:

$$S_t,\ \tau_{t+1},\ a_{t+1} \sim \pi(\cdot \mid S_{t-1}, a_t, o_t)$$

where $S_t$ represents the synthesized report that encapsulates historical reasoning while discarding irrelevant detail, $a_t$ the action taken at step $t$, and $o_t$ the resulting observation. In reinforcement learning, the training objective is formalized as a token‑level policy gradient loss with clipping:

$$\mathcal{J}(\theta) = \mathbb{E}\left[\frac{1}{\sum_{i=1}^{G}|\mathcal{H}^i|}\sum_{i=1}^{G}\sum_{j=1}^{|\mathcal{H}^i|} \min\!\left( r_{i,j}(\theta)\,\hat{A}_{i,j},\ \operatorname{clip}\!\left(r_{i,j}(\theta),\, 1-\varepsilon_{\mathrm{low}},\, 1+\varepsilon_{\mathrm{high}}\right)\hat{A}_{i,j} \right) \right]$$

where $r_{i,j}(\theta)$ is the probability ratio between the current and previous policies for token $j$ of trajectory $i$, $\hat{A}_{i,j}$ is the corresponding advantage estimate, and $G$ is the number of trajectories in a group. These formalizations underpin the agent’s ability to learn from multi‑step interactions and gradually refine its research trajectory in an end‑to‑end automated pipeline.
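The clipped token-level objective can be computed numerically as follows. This is a sketch of the loss value only (not the gradient machinery), with ratios and advantages for all trajectories flattened into one array; the epsilon values are illustrative defaults, not the paper's settings.

```python
# Numerical sketch of the token-level clipped objective: the mean over
# all tokens of min(r*A, clip(r, 1-eps_low, 1+eps_high)*A). Epsilon
# values are illustrative assumptions.

import numpy as np

def clipped_objective(ratios, advantages, eps_low=0.2, eps_high=0.28):
    """Evaluate the clipped surrogate objective over flattened tokens."""
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps_low, 1.0 + eps_high) * advantages
    # The min keeps the update pessimistic: large favorable ratio moves
    # are capped, so the policy cannot drift far in a single step.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

With a positive advantage, a ratio of 1.5 contributes only the clipped value `1.28`, while a ratio of 0.5 contributes its unclipped value `0.5`, since the minimum selects whichever term is smaller.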

6. Impact and Future Directions

Tongyi DeepResearch sets a new benchmark for automated, agent‑based research by demonstrating that comprehensive, multi‑step reasoning can be achieved through the integration of advanced context management and multi‑agent reinforcement learning. Its open‑source nature and scalable architecture provide a reproducible framework for future research in deep agentic systems. Future work may focus on enhancing model creativity to propose more innovative and interdisciplinary research tasks, improving knowledge integration for rare or long‑tail scenarios, and extending the agentic paradigm to additional domains beyond scientific research. The explicit documentation of every decision step enhances transparency and lays the groundwork for more trustworthy and interpretable autonomous research systems.

7. Conclusion

By combining efficient parameterization, dynamic context management, and a rigorous multi‑agent training paradigm, Tongyi DeepResearch embodies a full‑stack solution for deep, long‑horizon research automation. Its performance across diverse benchmarks confirms its potential as a research assistant capable of complex planning, evidence synthesis, and strategic tool usage. As the system continues to evolve, its open‑source framework promises to drive further advances in agentic reasoning, enabling more reliable, interpretable, and innovative applications in both science and other research-intensive fields.
