Data Interpreter: An LLM Agent For Data Science (2402.18679v4)
Abstract: LLM-based agents have shown effectiveness across many applications. However, their use in data science scenarios, which require solving long-horizon interconnected tasks, handling dynamic data adjustments, and applying domain expertise, remains challenging. Previous approaches primarily focus on individual tasks, making it difficult to assess the complete data science workflow. Moreover, they struggle to handle real-time changes in intermediate data and fail to adapt dynamically to the evolving task dependencies inherent to data science problems. In this paper, we present Data Interpreter, an LLM-based agent designed to automatically solve various data science problems end-to-end. Data Interpreter incorporates two key modules: 1) Hierarchical Graph Modeling, which breaks down complex problems into manageable subproblems, enabling dynamic node generation and graph optimization; and 2) Programmable Node Generation, a technique that refines and verifies each subproblem to iteratively improve code generation results and robustness. Extensive experiments consistently demonstrate the superiority of Data Interpreter. On InfiAgent-DABench, it achieves a 25% relative performance boost, raising accuracy from 75.9% to 94.9%. For machine learning and open-ended tasks, it improves performance from 88% to 95%, and from 60% to 97%, respectively. Moreover, on the MATH dataset, Data Interpreter achieves a 26% improvement over state-of-the-art baselines. The code is available at https://github.com/geekan/MetaGPT.
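The two modules described above can be sketched as a dependency graph of subproblems executed in topological order, where each node's action is regenerated and re-verified on failure. This is a minimal illustrative sketch, not the paper's implementation: `TaskNode`, `run_graph`, and the plain retry loop are assumptions standing in for LLM-driven code generation and the refine-and-verify cycle.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class TaskNode:
    """One subproblem in the hierarchical task graph."""
    name: str
    deps: List[str]                           # upstream node names
    action: Callable[[Dict[str, Any]], Any]   # stand-in for LLM-generated code
    result: Any = None
    verified: bool = False

def topological_order(graph: Dict[str, TaskNode]) -> List[str]:
    """Kahn's algorithm: schedule a node only after all its dependencies."""
    indeg = {name: len(node.deps) for name, node in graph.items()}
    ready = [name for name, d in indeg.items() if d == 0]
    order: List[str] = []
    while ready:
        current = ready.pop()
        order.append(current)
        for name, node in graph.items():
            if current in node.deps:
                indeg[name] -= 1
                if indeg[name] == 0:
                    ready.append(name)
    if len(order) != len(graph):
        raise ValueError("dependency cycle in task graph")
    return order

def run_graph(graph: Dict[str, TaskNode], max_retries: int = 2) -> Dict[str, Any]:
    """Execute each node; retrying on failure is a crude stand-in for
    the paper's iterative refine-and-verify loop."""
    for name in topological_order(graph):
        node = graph[name]
        inputs = {d: graph[d].result for d in node.deps}
        for _ in range(max_retries + 1):
            try:
                node.result = node.action(inputs)
                node.verified = True
                break
            except Exception:
                continue  # a real agent would regenerate the node's code here
        if not node.verified:
            raise RuntimeError(f"node {name!r} failed verification")
    return {name: graph[name].result for name in graph}

# A toy three-stage pipeline: load -> clean -> model.
graph = {
    "load":  TaskNode("load",  [],        lambda _: [3, 1, None, 2]),
    "clean": TaskNode("clean", ["load"],  lambda x: [v for v in x["load"] if v is not None]),
    "model": TaskNode("model", ["clean"], lambda x: sum(x["clean"]) / len(x["clean"])),
}
results = run_graph(graph)
```

In this sketch, dynamic node generation would correspond to inserting new `TaskNode`s into `graph` between executions, which the topological scheduling then accommodates automatically.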