DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation

Published 10 Mar 2025 in cs.CL and cs.AI | (2503.07044v2)

Abstract: Existing LLM agents for automating data science show promise, but they remain constrained by narrow task scopes, limited generalization across tasks and models, and over-reliance on state-of-the-art (SOTA) LLMs. We introduce DatawiseAgent, a notebook-centric LLM agent framework for adaptive and robust data science automation. Inspired by how human data scientists work in computational notebooks, DatawiseAgent introduces a unified interaction representation and a multi-stage architecture based on finite-state transducers (FSTs). This design enables flexible long-horizon planning, progressive solution development, and robust recovery from execution failures. Extensive experiments across diverse data science scenarios and models show that DatawiseAgent consistently achieves SOTA performance by surpassing strong baselines such as AutoGen and TaskWeaver, demonstrating superior effectiveness and adaptability. Further evaluations reveal graceful performance degradation under weaker or smaller models, underscoring the robustness and scalability.


Authors (8)

Summary

The paper presents a unified notebook interface and a multi-stage finite-state transducer for adaptive data science automation.
It details flexible DFS-like planning, incremental execution, and self-debugging strategies to overcome code execution failures.
Experimental results demonstrate robust performance and cost efficiency on benchmarks like DSBench, InfiAgent-DABench, and MatplotBench.

Overview of "DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation"

The paper "DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation" presents a framework for automating data science tasks with LLMs. Its main contributions are a unified interaction representation built on computational notebooks and a multi-stage architecture based on finite-state transducers (FSTs), which together support adaptive, robust task execution across diverse data science scenarios.

Unified Interaction Representation

DatawiseAgent leverages computational notebooks as the central interface for performing data science tasks. This unified interaction representation expresses all agent interactions as sequences of markdown and executable code cells. This design choice aims to mimic human data scientists' workflows, facilitating long-horizon planning and progressive solution development.

Figure 1: DatawiseAgent performs diverse data science tasks across various models by operating entirely within a computational notebook.

The notebook-centric approach integrates environment details, tool descriptions, and user instructions into a coherent format, enabling the seamless execution of tasks with rich feedback and interaction.
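To make the representation concrete, the following is a minimal sketch of a notebook held as an ordered sequence of markdown and code cells and flattened into a single text context. The class names (`Cell`, `Notebook`), the method names, and the bracketed section markers are assumptions made for illustration; the paper's actual data structures and prompt format may differ.

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class Cell:
    """One notebook cell (hypothetical structure, not the paper's)."""
    kind: Literal["markdown", "code"]
    source: str
    output: str = ""  # execution feedback; only meaningful for code cells


@dataclass
class Notebook:
    """An ordered sequence of markdown and code cells."""
    cells: List[Cell] = field(default_factory=list)

    def add_markdown(self, text: str) -> None:
        self.cells.append(Cell("markdown", text))

    def add_code(self, source: str, output: str = "") -> None:
        self.cells.append(Cell("code", source, output))

    def render(self) -> str:
        """Flatten the notebook into one text context for a model."""
        parts = []
        for cell in self.cells:
            parts.append(f"[{cell.kind}]\n{cell.source}")
            if cell.kind == "code" and cell.output:
                parts.append(f"[output]\n{cell.output}")
        return "\n\n".join(parts)


nb = Notebook()
nb.add_markdown("## Task\nCompute a simple sum.")
nb.add_code("print(1 + 1)", output="2")
context = nb.render()
```

Under this sketch, instructions, tool descriptions, and execution feedback all share one format, so the agent's history can be handed to a model as a single document.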

FST-Based Multi-Stage Architecture

The framework introduces a multi-stage architecture modeled as a non-deterministic finite-state transducer (NFST) to govern the agent's behavior. This modular architecture facilitates transitions across four functional stages: DFS-like planning, incremental execution, self-debugging, and post-filtering.

Figure 2: State transition diagram of the FST-based multi-stage architecture.

The NFST enables adaptive exploration and robust recovery from execution failures, guiding the agent through complex tasks in a structured manner. Because each stage is a separate state, the architecture supports modular extension and fine-grained ablation of individual components.
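A non-deterministic transition function of this kind can be sketched as a table from (stage, signal) pairs to sets of allowed next stages, with a chooser (in practice, the LLM) picking among them. The four stage names come from the summary above; the `DONE` state, the `"ok"` and `"error"` signals, and every entry in the transition table are assumptions for illustration and are not taken from the paper's diagram. The output side of the transducer (the cells each stage writes) is omitted here.

```python
from enum import Enum
from typing import Callable, Dict, FrozenSet, List, Tuple


class Stage(Enum):
    PLANNING = "dfs_like_planning"
    EXECUTION = "incremental_execution"
    DEBUGGING = "self_debugging"
    FILTERING = "post_filtering"
    DONE = "done"  # assumed terminal state


# (stage, signal) -> allowed next stages. Illustrative only.
TRANSITIONS: Dict[Tuple[Stage, str], FrozenSet[Stage]] = {
    (Stage.PLANNING, "ok"): frozenset({Stage.EXECUTION, Stage.DONE}),
    (Stage.EXECUTION, "ok"): frozenset({Stage.EXECUTION, Stage.PLANNING}),
    (Stage.EXECUTION, "error"): frozenset({Stage.DEBUGGING}),
    (Stage.DEBUGGING, "ok"): frozenset({Stage.FILTERING}),
    (Stage.DEBUGGING, "error"): frozenset({Stage.DEBUGGING, Stage.FILTERING}),
    (Stage.FILTERING, "ok"): frozenset({Stage.EXECUTION, Stage.PLANNING}),
}

Chooser = Callable[[Stage, List[Stage]], Stage]


def step(stage: Stage, signal: str, choose: Chooser) -> Stage:
    """Advance one transition; `choose` resolves the non-determinism."""
    options = TRANSITIONS[(stage, signal)]
    candidates = sorted(options, key=lambda s: s.value)
    nxt = choose(stage, candidates)
    if nxt not in options:
        raise ValueError(f"{nxt} is not reachable from {stage} on {signal!r}")
    return nxt


def pick_first(stage: Stage, candidates: List[Stage]) -> Stage:
    """Trivial stand-in for the model's decision."""
    return candidates[0]


after_error = step(Stage.EXECUTION, "error", pick_first)
after_fix = step(after_error, "ok", pick_first)
```

Keeping the table separate from the chooser is one way to get the modularity described above: removing a stage for an ablation amounts to deleting its entries.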

Stage Details

DFS-Like Planning and Incremental Execution

These stages allow flexible exploration and progressive task completion by dynamically selecting actions based on task progress and feedback. The agent constructs tree-structured task trajectories through non-linear planning and executes these incrementally using markdown and code cells.

Figure 3: Illustration of DatawiseAgent’s task-completion process through DFS-like planning and incremental execution.
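The tree-structured trajectory can be sketched as a tree of steps with a pointer to the step currently being worked on. Going deeper adds a child, while backtracking returns to the parent so that an alternative sibling can be tried. The names `StepNode`, `PlanTree`, `advance`, and `backtrack` are hypothetical, and the paper's actual action set may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class StepNode:
    description: str
    parent: Optional["StepNode"] = None
    children: List["StepNode"] = field(default_factory=list)


class PlanTree:
    """Tree of subtasks explored depth-first (illustrative sketch)."""

    def __init__(self, task: str) -> None:
        self.root = StepNode(task)
        self.current = self.root

    def advance(self, description: str) -> StepNode:
        """Go deeper: add a child under the current step and move to it."""
        node = StepNode(description, parent=self.current)
        self.current.children.append(node)
        self.current = node
        return node

    def backtrack(self) -> StepNode:
        """Abandon the current step and return to its parent."""
        if self.current.parent is None:
            raise ValueError("already at the root")
        self.current = self.current.parent
        return self.current

    def active_path(self) -> List[str]:
        """Steps from the root to the current node, in order."""
        path = []
        node: Optional[StepNode] = self.current
        while node is not None:
            path.append(node.description)
            node = node.parent
        return path[::-1]


tree = PlanTree("Predict house prices")
tree.advance("Load and clean data")
tree.advance("Fit a linear model")
tree.backtrack()  # the linear model was unsatisfactory; try a sibling
tree.advance("Fit gradient boosting")
path = tree.active_path()
```

The abandoned branch stays in the tree, so the full exploration history is preserved while only the active path drives the next step.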

Code Repair via Self-Debugging and Post-Filtering

This module handles recovery from code execution failures. Self-debugging analyzes and repairs code that fails to execute, and post-filtering condenses the outcome into a concise diagnostic report, which prevents misinformation from accumulating and guides later decisions.
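A bounded repair loop followed by a filtering step can be sketched as follows. The `repair` callable stands in for an LLM call, and `DebugResult`, `run_cell`, `debug_and_filter`, and the report wording are all hypothetical names chosen for illustration. Each attempt runs in a fresh namespace for simplicity, whereas a real notebook kernel would keep state between cells.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DebugResult:
    final_code: Optional[str]  # None if no attempt succeeded
    report: str                # concise summary kept in the history


def run_cell(code: str) -> Optional[str]:
    """Execute code in a fresh namespace; return an error summary or None."""
    try:
        exec(code, {})
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def debug_and_filter(
    code: str,
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> DebugResult:
    """Retry with repairs, then keep only the final code and a short report."""
    errors: List[str] = []
    for attempt in range(max_attempts):
        error = run_cell(code)
        if error is None:
            if not errors:
                return DebugResult(code, "Ran without errors.")
            report = (
                f"Fixed after {len(errors)} failed attempt(s). "
                f"Last error: {errors[-1]}"
            )
            return DebugResult(code, report)
        errors.append(error)
        if attempt < max_attempts - 1:
            code = repair(code, error)
    report = f"Unresolved after {max_attempts} attempt(s). Last error: {errors[-1]}"
    return DebugResult(None, report)


def toy_repair(code: str, error: str) -> str:
    """Stand-in for a model-generated fix."""
    return code.replace("/ 0", "/ 1")


result = debug_and_filter("value = 1 / 0", toy_repair)
```

Only `final_code` and `report` would be carried forward, so intermediate failed attempts do not clutter the history that later stages read.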

Experimental Results

The paper evaluates DatawiseAgent on benchmarks including DSBench, InfiAgent-DABench, and MatplotBench, demonstrating consistent state-of-the-art performance across tasks and models.

Figure 4: Performance comparison showing inference time across various tasks.

Figure 5: Performance across Qwen2.5 models showcasing robust results.

DatawiseAgent remains adaptable and robust across model capacities, achieving high task success rates and strong relative performance gap scores at competitive cost.

Conclusion

DatawiseAgent provides a novel framework for robust data science automation, enabling adaptive planning and recovery strategies through a notebook-centric, multi-stage architecture. This design offers scalability and robustness across diverse LLM configurations and data scenarios, setting a strong baseline for future adaptive data science agent frameworks.

The results suggest that such frameworks can automate complex workflows efficiently, including in resource-constrained settings where only weaker or smaller models are available, which improves real-world applicability. Future work could explore tool integration and human-in-the-loop collaboration in broader domains.
