DeepAnalyze-8B: Autonomous Data Science LLM

Updated 21 October 2025

DeepAnalyze-8B is an autonomous data science language model integrating explicit action tokens for planning, data ingestion, code execution, and report synthesis.
It employs a two-stage, curriculum-based training paradigm that first fine-tunes individual abilities then reinforces multi-step tasks using group relative policy optimization.
The model autonomously orchestrates end-to-end data workflows, outperforming traditional workflow agents on diverse benchmarks with superior accuracy and completeness.

DeepAnalyze-8B is an agentic LLM developed for autonomous data science, capable of completing the full end-to-end pipeline from raw data ingestion to the generation of analyst-grade research reports. Built on an 8-billion-parameter architecture, it utilizes a curriculum-based training regime and a data-grounded trajectory synthesis framework to achieve expert-level data science proficiency, surpassing prior workflow-based agents, including those leveraging advanced proprietary LLMs (Zhang et al., 19 Oct 2025).

1. Model Architecture and Data-Oriented Interaction Mechanism

DeepAnalyze-8B extends a base LLM by integrating explicit support for stepwise interaction with structured data sources. The model’s vocabulary is augmented with five special action tokens:

Token	Function
⟨Analyze⟩	Planning, reasoning, and self-reflection
⟨Understand⟩	Structured data ingestion and comprehension (tables, sheets)
⟨Code⟩	Data manipulation code (typically Python) generation
⟨Execute⟩	Output of code execution and environmental feedback
⟨Answer⟩	Final result or report synthesis

At inference, DeepAnalyze-8B autonomously generates action tokens to interact with external data (e.g., database files, CSVs), executing code and incorporating feedback into its reasoning chain. The model orchestrates the sequence plan → analyze data → generate code → execute → reflect → synthesize answer, emulating the workflow of a skilled data scientist. The action tokens let the model choose the contextually relevant next step: the contents of each ⟨Code⟩ block are executed in a live environment, and the results are returned to the model in an ⟨Execute⟩ block.
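The generate, execute, and feed-back cycle described above can be illustrated with a minimal sketch. It is not the released inference code: the closing-tag form (⟨/Code⟩, ⟨/Execute⟩, ⟨/Answer⟩), the helper names run_code and agent_loop, and the scripted stand-in for the model are all assumptions made for illustration.

```python
import contextlib
import io
import re

# Assumed serialization: the source names the five actions but not their exact
# delimiters, so the opening/closing tag pairs below are hypothetical.
CODE_RE = re.compile(r"⟨Code⟩(.*?)⟨/Code⟩", re.DOTALL)
ANSWER_RE = re.compile(r"⟨Answer⟩(.*?)⟨/Answer⟩", re.DOTALL)
EXECUTE_RE = re.compile(r"⟨Execute⟩(.*?)⟨/Execute⟩", re.DOTALL)


def run_code(source, namespace):
    """Execute generated code and return captured stdout or the error text."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)
    except Exception as exc:  # errors become feedback for the model
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def agent_loop(generate, question, max_turns=8):
    """Alternate generation and execution until an answer block appears."""
    context = question
    namespace = {}  # variables persist across code blocks
    for _ in range(max_turns):
        segment = generate(context)
        context += segment
        answer = ANSWER_RE.search(segment)
        if answer:
            return answer.group(1).strip(), context
        code = CODE_RE.search(segment)
        if code:
            feedback = run_code(code.group(1), namespace)
            context += f"⟨Execute⟩{feedback}⟨/Execute⟩"
    return None, context


def scripted_model(context):
    """Stand-in for the LLM: emits code first, then an answer."""
    if "⟨Execute⟩" not in context:
        return "⟨Analyze⟩Sum the column.⟨/Analyze⟩⟨Code⟩print(sum([1, 2, 3]))⟨/Code⟩"
    result = EXECUTE_RE.findall(context)[-1]
    return f"⟨Answer⟩The total is {result}.⟨/Answer⟩"


answer, transcript = agent_loop(scripted_model, "What is the column total?")
print(answer)
```

A real deployment would replace scripted_model with calls to the model and run_code with an isolated sandbox, since exec on untrusted generated code is unsafe outside a demonstration.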

2. Curriculum-Based Agentic Training Paradigm

Training DeepAnalyze-8B follows a two-stage curriculum, mirroring the staged learning of human data scientists:

Stage 1: Single-Ability Fine-Tuning

Each skill—reasoning (⟨Analyze⟩), structured data comprehension (⟨Understand⟩), code writing (⟨Code⟩)—is fine-tuned individually on curated datasets.

Stage 2: Multi-Ability Agentic Training

Multi-step end-to-end data science tasks are then learned under reinforcement learning (RL) using a group relative policy optimization (GRPO) objective:

$\mathcal{J}_{GRPO}(\theta) = \mathbb{E}_{q\sim D,\ \{o_i\}_{i=1}^{G}\sim\pi_{\theta_{\text{old}}}(\cdot\mid q)} \Big[ \frac{1}{G} \sum_{i=1}^{G} \min\Big( \frac{\pi_{\theta}(o_i\mid q)}{\pi_{\theta_{\text{old}}}(o_i\mid q)} A_i,\ \text{clip}\Big(\frac{\pi_{\theta}(o_i\mid q)}{\pi_{\theta_{\text{old}}}(o_i\mid q)}, 1-\epsilon, 1+\epsilon\Big) A_i \Big) - \beta D_{KL}(\pi_{\theta} \,\|\, \pi_{\text{ref}}) \Big]$

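Here, $G$ is the number of trajectories sampled for each query $q$, and $A_i$ is the group-relative advantage of trajectory $o_i$, obtained by comparing its multi-step reward with the rewards of the other trajectories in the same group. Clipping the probability ratio to $[1-\epsilon, 1+\epsilon]$ prevents destabilizing updates, $D_{KL}$ regularizes the policy toward the reference policy $\pi_{\text{ref}}$, and $\epsilon$ and $\beta$ are hyperparameters that set the clipping range and the weight of the KL penalty. This staged progression facilitates gradual acquisition and integration of diverse skills, mitigating reward sparsity and trajectory variance in complex, multi-turn data science problems.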
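To make the objective concrete, the sketch below computes group-normalized advantages and the clipped surrogate term for one group of trajectories. It is an illustration under simplifying assumptions, not the training code: it works on sequence-level log-probabilities, uses the common mean and standard deviation normalization for $A_i$, omits the KL penalty, and the function names and the default eps=0.2 are hypothetical.

```python
import math


def group_relative_advantages(rewards):
    """Normalize each reward by the mean and std of its own group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    if std == 0.0:  # identical rewards carry no learning signal
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Group average of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_theta / pi_theta_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Four sampled trajectories for one query, two of which succeeded.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# The new policy raises the probability of the first trajectory by 1.5x;
# clipping caps its contribution at (1 + eps) * A.
logp_old = [-2.0, -2.0, -2.0, -2.0]
logp_new = [-2.0 + math.log(1.5), -2.0, -2.0, -2.0]
objective = clipped_surrogate(logp_new, logp_old, advantages)
print(advantages, objective)
```

Because advantages are computed within the group, no separate value network is needed; a trajectory is rewarded only for doing better than its siblings sampled from the same query.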

3. Data-Grounded Trajectory Synthesis Framework

High-fidelity, multi-turn reasoning traces are fundamental to agentic training. DeepAnalyze-8B’s data-grounded trajectory synthesis encompasses:

a) Reasoning Trajectory Synthesis:

Starting from base QA pairs (e.g., TableQA), advanced teacher models distill comprehensive stepwise reasoning chains.
Keyword-guided refinement augments traces with explicit references to data inspection (e.g., “let’s take a closer look at the table”), aligning model reflection more closely to structured data understanding.

b) Interaction Trajectory Synthesis:

Employs a multi-agent protocol: A questioner formulates tasks/checklists, a solver uses the five actions with real data environments, and an inspector audits interactions against evaluation criteria.
This configuration yields rich, multi-action datasets (both supervised and RL) encompassing planning, code execution, environment feedback, and report synthesis.
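The questioner, solver, and inspector roles can be pictured as a simple filter pipeline. The sketch below is a toy rendering of that protocol: every class and function name is hypothetical, the solver returns canned steps where a real one would call a model and a sandbox, and the checklist is reduced to "which actions must appear", which is far simpler than the evaluation criteria the source describes.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    question: str
    checklist: list  # criteria the inspector audits against


@dataclass
class Trajectory:
    task: Task
    steps: list = field(default_factory=list)  # (action, content) pairs


def questioner(table_name):
    """Hypothetical questioner: proposes a task plus an audit checklist."""
    return Task(
        question=f"Summarize the numeric columns of {table_name}.",
        checklist=["Code", "Execute", "Answer"],
    )


def solver(task):
    """Hypothetical solver: canned steps stand in for model and sandbox calls."""
    traj = Trajectory(task)
    traj.steps.append(("Analyze", "Plan: load the table, then describe it."))
    traj.steps.append(("Code", "print(df.describe())"))
    traj.steps.append(("Execute", "(sandbox output placeholder)"))
    traj.steps.append(("Answer", "(report placeholder)"))
    return traj


def inspector(traj):
    """Accept a trajectory only if every checklist action was actually used."""
    used = {action for action, _ in traj.steps}
    return all(item in used for item in traj.task.checklist)


def synthesize(table_names):
    """Run questioner -> solver -> inspector and keep accepted trajectories."""
    kept = []
    for name in table_names:
        traj = solver(questioner(name))
        if inspector(traj):
            kept.append(traj)
    return kept


dataset = synthesize(["sales.csv", "inventory.csv"])
print(len(dataset), [action for action, _ in dataset[0].steps])
```

Separating the inspector from the solver means low-quality interactions are filtered out before they reach the supervised or RL training sets.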

4. Autonomous Data Science Capabilities

DeepAnalyze-8B autonomously handles three classes of data science task:

Data Question Answering: Processes tabular and unstructured data for direct analytical responses.
Specialized Analytical Tasks: Executes data preparation, statistical analysis, visualization, and modeling based on dynamically selected tools/code.
Open-Ended Data Research: Synthesizes analyst-grade deep research reports, incorporating iterative exploration, self-reflection, and adaptive method selection throughout the workflow.

On benchmarks such as DataSciBench, DABStep, DS-1000, and TableQA, DeepAnalyze-8B outperforms workflow-based agents (even those powered by LLMs such as GPT-4/4-Turbo), scoring higher on both task success rate and completion metrics (Zhang et al., 19 Oct 2025). Completion is defined as full, stepwise, error-free execution culminating in an accurate and well-articulated report.

5. Open-Source Contributions and Community Impact

All components of DeepAnalyze-8B (model weights, codebase, and training sets) are open-sourced, notably including DataScience-Instruct-500K, a dataset of agentic trajectories and reasoning data. These resources provide:

Transparent evaluation and reproducibility for the community.
Extensible frameworks for researchers to adapt the agentic approach to new subdomains or bespoke data workflows.
Foundations for future work in fully autonomous research assistants in both academic and industrial data contexts.

6. Experimental Results and Quantitative Performance

Experiments across 12 benchmarks report state-of-the-art results for a model of the 8B size:

Benchmark	Core Task	Performance Relative to Prior Workflow Agents
DataSciBench	TableQA, complex analytics	Higher success/completion rate
DSBench, DS-1000	Analysis, modeling, coding	Robust improvements in accuracy and coverage
DABStep	Research report synthesis	Superior quality and completeness

Hybrid RL reward modeling—combining trajectory interaction efficacy and output report sophistication—proved essential for optimizing holistic agentic performance. Ablations confirm both the curriculum-based paradigm and the ⟨Understand⟩ data-action token are critical in achieving final results.
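One way to picture a hybrid reward of this kind is a weighted blend of an interaction score and a report score. The sketch below is purely illustrative: the source does not give the reward formula, so the execution-success proxy for interaction efficacy, the [0, 1] report score (which would come from some external grader), the function names, and the default weight of 0.5 are all assumptions.

```python
def interaction_score(execution_ok):
    """Fraction of code executions that finished without an error."""
    if not execution_ok:
        return 0.0
    return sum(1 for ok in execution_ok if ok) / len(execution_ok)


def hybrid_reward(execution_ok, report_score, weight=0.5):
    """Blend interaction quality with a report score in [0, 1]."""
    if not 0.0 <= report_score <= 1.0:
        raise ValueError("report_score must lie in [0, 1]")
    blended = weight * interaction_score(execution_ok)
    return blended + (1.0 - weight) * report_score


# Three of four executions succeeded and the report was graded 0.8.
reward = hybrid_reward([True, True, False, True], report_score=0.8)
print(round(reward, 3))
```

A scalar of this form could serve as the per-trajectory reward from which group-relative advantages are computed during the RL stage.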

7. Significance and Future Directions

DeepAnalyze-8B establishes a paradigm for agentic, autonomous data science LLMs. Its curriculum-based, multi-action, and self-reflective training enables robust, end-to-end orchestration without human-in-the-loop workflow constraints. The open-source release accelerates the field’s progress toward general, adaptive, and fully autonomous systems capable of deep scientific analysis and reporting.

A plausible implication is that similar curriculum and action-token paradigms can be extended to other domains requiring multi-step reasoning and autonomous tool use, such as experimental science, computational engineering, or medical research. Further advances may refine reward modeling, environmental interactions, and orchestration protocols for even higher levels of agentic intelligence.

References

Zhang et al. (2025). DeepAnalyze: Agentic Large Language Models for Autonomous Data Science.