
ART: Automatic multi-step reasoning and tool-use for large language models (2303.09014v1)

Published 16 Mar 2023 in cs.CL
Abstract: LLMs can perform complex reasoning in few- and zero-shot settings by generating intermediate chain of thought (CoT) reasoning steps. Further, each reasoning step can rely on external tools to support computation beyond the core LLM capabilities (e.g. search/running code). Prior work on CoT prompting and tool use typically requires hand-crafting task-specific demonstrations and carefully scripted interleaving of model generations with tool use. We introduce Automatic Reasoning and Tool-use (ART), a framework that uses frozen LLMs to automatically generate intermediate reasoning steps as a program. Given a new task to solve, ART selects demonstrations of multi-step reasoning and tool use from a task library. At test time, ART seamlessly pauses generation whenever external tools are called, and integrates their output before resuming generation. ART achieves a substantial improvement over few-shot prompting and automatic CoT on unseen tasks in the BigBench and MMLU benchmarks, and matches performance of hand-crafted CoT prompts on a majority of these tasks. ART is also extensible, and makes it easy for humans to improve performance by correcting errors in task-specific programs or incorporating new tools, which we demonstrate by drastically improving performance on select tasks with minimal human intervention.

Overview of ART: Automatic Multi-Step Reasoning and Tool-Use for LLMs

The paper introduces Automatic Reasoning and Tool-use (ART), a framework for multi-step reasoning and external tool use with LLMs. LLMs can perform complex reasoning in few-shot and zero-shot settings by generating intermediate reasoning steps, known as chain of thought (CoT). However, prior CoT prompting and tool-use methods typically rely on hand-crafted, task-specific demonstrations and carefully scripted interleaving of model generations with tool calls. ART automates this process with a frozen LLM, so no additional finetuning is required.

Core Contributions

ART's primary contribution is to have a frozen LLM generate intermediate reasoning steps as a program, calling tools when external computation is needed. Given a new task, the framework selects demonstrations of multi-step reasoning and tool use for related tasks from a task library and uses them as few-shot prompts. At test time, generation pauses whenever a tool is called, the tool's output is integrated into the program, and generation then resumes.
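The pause-and-resume loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the step format (`Q1: [tool] argument`, `#1: output`, `[EOQ]`) is only loosely modeled on the idea of reasoning steps written as a program, and `fake_llm`, the `add` tool, `TOOLS`, and `run` are hypothetical names. The frozen LLM is replaced by a scripted stand-in so the example runs without any model.

```python
import re
from typing import Callable, Dict, List

# Hypothetical tool registry: the single "add" tool is illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}

# Assumed step format: "Q<n>: [<tool>] <argument>".
TOOL_CALL = re.compile(r"^Q(\d+): \[(\w+)\] (.*)$")

# One hand-written demonstration, standing in for an entry selected from a task library.
DEMO = "Input: What is 2 + 3?\nQ1: [add] 2, 3\n#1: 5\nQ2: [EOQ]\nAns: 5"


def fake_llm(context: str) -> str:
    """Scripted stand-in for a frozen LLM: returns the next step of the program."""
    current = context.rsplit("Input:", 1)[-1]  # only look at the task being solved
    outputs = re.findall(r"^#\d+: (.*)$", current, flags=re.M)
    if not outputs:
        return "Q1: [add] 17, 25"
    return f"Q2: [EOQ]\nAns: {outputs[-1]}"


def run(task: str, demos: List[str], llm: Callable[[str], str], max_steps: int = 10) -> str:
    """Generate step by step, pausing at each tool call to splice in the tool's output."""
    context = "\n\n".join(demos + [f"Input: {task}"])
    for _ in range(max_steps):
        step = llm(context)
        context += "\n" + step
        if "[EOQ]" in step:  # end-of-program marker: stop generating
            break
        match = TOOL_CALL.match(step)
        if match and match.group(2) in TOOLS:
            index, name, arg = match.groups()
            # Pause generation, run the tool, and append its output before resuming.
            context += f"\n#{index}: {TOOLS[name](arg)}"
    return context


transcript = run("What is 17 + 25?", [DEMO], fake_llm)
print(transcript)
```

In this sketch the tool output is appended to the context as plain text, so the next call to the model sees it as part of the program generated so far.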

Numerical Results

Evaluation shows that ART substantially improves over few-shot prompting and automatic CoT on unseen tasks in the BigBench and MMLU benchmarks. In particular, ART outperforms direct few-shot prompting on unseen tasks by 10.8 percentage points on average, with tool use contributing an improvement of over 12.3 percentage points. ART matches the performance of hand-crafted CoT prompts on a majority of these tasks, with notable gains on arithmetic and algorithmic tasks.

Practical and Theoretical Implications

Practically, ART is extensible: users can improve task performance by correcting errors in task-specific programs or adding new tools, with minimal intervention. This suggests that ART can incorporate more sophisticated tools and updated information as they become available. Theoretically, ART advances methods for improving LLMs' task execution without altering the underlying model, which could change how models are enhanced in high-stakes domains such as legal reasoning, scientific exploration, and complex data manipulation.
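The two kinds of human intervention mentioned here, adding a tool and correcting a step in a task-specific program, might look like the following. This is a sketch under assumptions: the dictionary-based tool registry, the list-of-strings program representation, and the names `call_tool`, `correct_step`, `upper`, and `reverse` are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Optional

Tool = Callable[[str], str]


def call_tool(tools: Dict[str, Tool], name: str, arg: str) -> Optional[str]:
    """Return the tool's output, or None when no such tool is registered."""
    tool = tools.get(name)
    return tool(arg) if tool is not None else None


def correct_step(program: List[str], index: int, new_step: str) -> List[str]:
    """Return a copy of a stored program with one step replaced by a human edit."""
    fixed = list(program)
    fixed[index] = new_step
    return fixed


# Start with one tool; "reverse" is not available yet.
tools: Dict[str, Tool] = {"upper": str.upper}
before = call_tool(tools, "reverse", "abc")

# Adding a tool is a single registry entry in this sketch.
tools["reverse"] = lambda text: text[::-1]
after = call_tool(tools, "reverse", "abc")

# Correcting a program: a human swaps a wrong step for the intended one.
program = ["Q1: [upper] abc", "Q2: [upper] ABC", "Q3: [EOQ]"]
fixed_program = correct_step(program, 1, "Q2: [reverse] ABC")
```

Returning a copy from `correct_step` keeps the original program available, which makes it easy to compare behavior before and after the edit.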

Future Developments

The paper points to several directions for future work. LLMs finetuned at larger scale could benefit further from ART. The framework also suggests potential for cross-task transfer, where improvements in reasoning programs and tool use on one task are reused across other domains, opening new avenues for LLM deployment in multidisciplinary fields.

In conclusion, ART is a step forward in automating reasoning and tool use with LLMs, broadening the range of real-world tasks these models can handle. The framework shows how future AI systems can extend their reasoning beyond what a static model provides by combining learned operations with calls to external tools.

Authors (6)
  1. Bhargavi Paranjape (18 papers)
  2. Scott Lundberg (17 papers)
  3. Sameer Singh (96 papers)
  4. Hannaneh Hajishirzi (176 papers)
  5. Luke Zettlemoyer (225 papers)
  6. Marco Tulio Ribeiro (20 papers)
Citations (119)