
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks (2211.12588v4)

Published 22 Nov 2022 in cs.CL and cs.AI

Abstract: Recently, there has been significant progress in teaching LLMs to perform step-by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thoughts prompting (CoT) is by far the state-of-the-art method for these tasks. CoT uses LLMs to perform both reasoning and computation in the multi-step 'thought' process. To disentangle computation from reasoning, we propose 'Program of Thoughts' (PoT), which uses LLMs (mainly Codex) to express the reasoning process as a program. The computation is relegated to an external computer, which executes the generated programs to derive the answer. We evaluate PoT on five math word problem datasets (GSM, AQuA, SVAMP, TabMWP, MultiArith) and three financial-QA datasets (FinQA, ConvFinQA, TATQA) for both few-shot and zero-shot setups. Under both few-shot and zero-shot settings, PoT can show an average performance gain over CoT by around 12% across all the evaluated datasets. By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets. All of our data and code are released on GitHub: https://github.com/wenhuchen/Program-of-Thoughts

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

The paper "Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks" investigates the limitations of conventional LLMs in handling complex numerical reasoning tasks and introduces a novel approach termed "Program of Thoughts" (PoT). Leveraging LLMs such as Codex, PoT succeeds in decoupling computation from reasoning by using LLMs to express reasoning in the form of programming code, thus relegating the actual computation to an external program interpreter like Python.

Research Context and Motivation

The motivation for this paper stems from the inherent limitations observed in LLMs when they attempt numerical calculations, especially with large numbers or complicated mathematical expressions. Traditional methods like Chain-of-Thought (CoT) require LLMs to perform both reasoning and computation, often leading to arithmetic errors, inefficiency in expressing iterative processes, and difficulty in solving complex equations.

Methodology

The authors introduce PoT as a refined prompting technique where the LLMs generate both natural language descriptions and programming language statements to reason through problems, delegating computation to an interpreter. This approach effectively leverages LLMs' language capabilities for reasoning while relying on precise computational execution by interpreters for arithmetic tasks. The paper makes the case that this separation enables more robust and accurate handling of numerical reasoning tasks, particularly in few-shot and zero-shot settings.
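To make the mechanism concrete, the following is a minimal sketch of what a PoT-style prompt and its execution step might look like. It assumes a generic few-shot setup; the exemplar problem, the `ans` variable convention, and the helper functions shown are illustrative assumptions for this summary, not the paper's released prompts or code.

```python
# Illustrative Program-of-Thoughts sketch (assumptions, not the paper's exact prompts).

POT_EXEMPLAR = '''Question: A store sells pens at $1.50 each. If Sam buys 4 pens
and pays with a $10 bill, how much change does he receive?

# Python program (reasoning expressed as code):
pens = 4
price_per_pen = 1.50
total_cost = pens * price_per_pen
change = 10 - total_cost
ans = change
'''

def build_pot_prompt(question: str) -> str:
    """Compose a few-shot PoT prompt: exemplar program(s) followed by the new question."""
    return (f"{POT_EXEMPLAR}\nQuestion: {question}\n\n"
            "# Python program (reasoning expressed as code):\n")

def execute_program(program: str) -> float:
    """Run the generated program and read the conventional `ans` variable.
    Calling exec() on model output is unsafe outside a sandbox; for illustration only."""
    namespace: dict = {}
    exec(program, namespace)  # computation is delegated to the Python interpreter
    return namespace["ans"]
```

In use, `build_pot_prompt(question)` is sent to the LLM, and the returned program text is passed to `execute_program` to obtain the numerical answer; the LLM never performs the arithmetic itself.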

Evaluation

The effectiveness of PoT was evaluated across several datasets, including math word problem benchmarks such as GSM8K and financial-QA datasets such as FinQA. The results show a substantial performance gain, with PoT outperforming CoT by roughly 12% on average across the evaluated datasets under both few-shot and zero-shot settings. When combined with self-consistency decoding, PoT achieves state-of-the-art results on the math word problem datasets and near-state-of-the-art results on the financial-QA datasets.
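The self-consistency combination can be sketched as follows: sample several candidate programs at a nonzero temperature, execute each one, and take the majority answer among those that run successfully. The helper names `sample_program` and `run_program` below are placeholders for whatever decoding and execution routines are used, not APIs from the released code.

```python
from collections import Counter

def pot_self_consistency(question: str, sample_program, run_program, k: int = 20):
    """Sketch of PoT + self-consistency decoding (hypothetical helper names):
    sample k candidate programs, execute each, and return the majority answer."""
    answers = []
    for _ in range(k):
        program = sample_program(question)  # LLM decoding with temperature > 0
        try:
            answers.append(round(run_program(program), 6))  # execute and normalize
        except Exception:
            continue  # discard candidates that fail to execute
    return Counter(answers).most_common(1)[0][0] if answers else None
```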

Key Findings

Several significant findings emerge from the paper:

  1. Improved Accuracy: PoT demonstrates an average performance improvement over CoT of around 12% across diverse datasets, highlighting the enhanced accuracy in dealing with numerical tasks.
  2. Efficiency in Iterative Processes: By delegating iteration and complex calculations to the external interpreter, PoT addresses inefficiencies and limitations traditionally faced by LLMs in these areas (illustrated by the sketch after this list).
  3. Generality Across Datasets: The approach was effective across a range of problem types, including both math and financial reasoning tasks, indicating a robust generalization capability.
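On the second point, a single loop in a generated program covers a computation that a purely verbal chain of thought would have to unroll step by step. The compound-interest problem below is an invented illustration, not one of the paper's benchmark items.

```python
# Invented example: "At 5% annual interest, how many years until $1000 doubles?"
principal = 1000.0
rate = 0.05
balance = principal
years = 0
while balance < 2 * principal:  # the interpreter handles the iteration
    balance *= (1 + rate)
    years += 1
ans = years  # -> 15
```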

Implications and Future Directions

The decoupling technique proposed through PoT opens doors to new applications in AI that require both reasoning and computation. Practically, this approach can be beneficial in real-world applications such as financial analysis, engineering simulations, and mathematical education tools.

Theoretical implications include a reassessment of the roles LLMs play in problem-solving tasks, encouraging a division of labor between symbolic reasoning and computation. Future work could explore enhancing transparency and error diagnosis in PoT-generated programs, or integrating more sophisticated programming capabilities to handle broader reasoning contexts.

Overall, this paper presents a significant step forward in numerical reasoning with AI and suggests a promising direction for integrating symbolic reasoning systems with contemporary machine learning models.

Authors (4)
  1. Wenhu Chen (134 papers)
  2. Xueguang Ma (36 papers)
  3. Xinyi Wang (152 papers)
  4. William W. Cohen (79 papers)
Citations (610)