
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks (2211.12588v4)

Published 22 Nov 2022 in cs.CL and cs.AI

Abstract: Recently, there has been significant progress in teaching LLMs to perform step-by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thoughts prompting (CoT) is by far the state-of-the-art method for these tasks. CoT uses LLMs to perform both reasoning and computation in the multi-step 'thought' process. To disentangle computation from reasoning, we propose 'Program of Thoughts' (PoT), which uses LLMs (mainly Codex) to express the reasoning process as a program. The computation is relegated to an external computer, which executes the generated programs to derive the answer. We evaluate PoT on five math word problem datasets (GSM, AQuA, SVAMP, TabMWP, MultiArith) and three financial-QA datasets (FinQA, ConvFinQA, TATQA) for both few-shot and zero-shot setups. Under both few-shot and zero-shot settings, PoT can show an average performance gain over CoT by around 12% across all the evaluated datasets. By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets. All of our data and code are released on GitHub: https://github.com/wenhuchen/Program-of-Thoughts

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

The paper "Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks" investigates the limitations of conventional LLMs in handling complex numerical reasoning tasks and introduces a novel approach termed "Program of Thoughts" (PoT). Leveraging LLMs such as Codex, PoT succeeds in decoupling computation from reasoning by using LLMs to express reasoning in the form of programming code, thus relegating the actual computation to an external program interpreter like Python.

Research Context and Motivation

The motivation for this paper stems from the inherent limitations observed in LLMs when they attempt numerical calculations, especially with large numbers or complicated mathematical expressions. Traditional methods like Chain-of-Thought (CoT) require LLMs to perform both reasoning and computation, often leading to arithmetic errors, inefficiency in expressing iterative processes, and difficulty in solving complex equations.

Methodology

The authors introduce PoT as a refined prompting technique where the LLMs generate both natural language descriptions and programming language statements to reason through problems, delegating computation to an interpreter. This approach effectively leverages LLMs' language capabilities for reasoning while relying on precise computational execution by interpreters for arithmetic tasks. The paper makes the case that this separation enables more robust and accurate handling of numerical reasoning tasks, particularly in few-shot and zero-shot settings.
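To make the mechanism concrete, the following is a minimal sketch of what a PoT-style prompt and its execution step might look like. It assumes a generic few-shot setup; the exemplar problem, the `ans` variable convention, and the helper functions shown are illustrative assumptions for this summary, not the paper's released prompts or code.

```python
# Illustrative Program-of-Thoughts sketch (assumptions, not the paper's exact prompts).

POT_EXEMPLAR = '''Question: A store sells pens at $1.50 each. If Sam buys 4 pens
and pays with a $10 bill, how much change does he receive?

# Python program (reasoning expressed as code):
pens = 4
price_per_pen = 1.50
total_cost = pens * price_per_pen
change = 10 - total_cost
ans = change
'''

def build_pot_prompt(question: str) -> str:
    """Compose a few-shot PoT prompt: exemplar program(s) followed by the new question."""
    return (f"{POT_EXEMPLAR}\nQuestion: {question}\n\n"
            "# Python program (reasoning expressed as code):\n")

def execute_program(program: str) -> float:
    """Run the generated program and read the conventional `ans` variable.
    Calling exec() on model output is unsafe outside a sandbox; for illustration only."""
    namespace: dict = {}
    exec(program, namespace)  # computation is delegated to the Python interpreter
    return namespace["ans"]
```

In use, `build_pot_prompt(question)` is sent to the LLM, and the returned program text is passed to `execute_program` to obtain the numerical answer; the LLM never performs the arithmetic itself.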

Evaluation

The effectiveness of PoT was evaluated across several datasets, including math word problem benchmarks such as GSM8K and financial-QA datasets such as FinQA. The results show a substantial performance gain, with PoT outperforming CoT by roughly 12% on average across the evaluated datasets under both few-shot and zero-shot settings. When combined with self-consistency decoding, PoT achieves state-of-the-art results on the math word problem datasets and near-state-of-the-art results on the financial-QA datasets.
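The self-consistency combination can be sketched as follows: sample several candidate programs at a nonzero temperature, execute each one, and take the majority answer among those that run successfully. The helper names `sample_program` and `run_program` below are placeholders for whatever decoding and execution routines are used, not APIs from the released code.

```python
from collections import Counter

def pot_self_consistency(question: str, sample_program, run_program, k: int = 20):
    """Sketch of PoT + self-consistency decoding (hypothetical helper names):
    sample k candidate programs, execute each, and return the majority answer."""
    answers = []
    for _ in range(k):
        program = sample_program(question)  # LLM decoding with temperature > 0
        try:
            answers.append(round(run_program(program), 6))  # execute and normalize
        except Exception:
            continue  # discard candidates that fail to execute
    return Counter(answers).most_common(1)[0][0] if answers else None
```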

Key Findings

Several significant findings emerge from the paper:

  1. Improved Accuracy: PoT demonstrates an average performance improvement over CoT of around 12% across diverse datasets, highlighting the enhanced accuracy in dealing with numerical tasks.
  2. Efficiency in Iterative Processes: By delegating iteration and complex calculations to the external interpreter, PoT addresses inefficiencies and limitations traditionally faced by LLMs in these areas (illustrated by the sketch after this list).
  3. Generality Across Datasets: The approach was effective across a range of problem types, including both math and financial reasoning tasks, indicating a robust generalization capability.
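On the second point, a single loop in a generated program covers a computation that a purely verbal chain of thought would have to unroll step by step. The compound-interest problem below is an invented illustration, not one of the paper's benchmark items.

```python
# Invented example: "At 5% annual interest, how many years until $1000 doubles?"
principal = 1000.0
rate = 0.05
balance = principal
years = 0
while balance < 2 * principal:  # the interpreter handles the iteration
    balance *= (1 + rate)
    years += 1
ans = years  # -> 15
```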

Implications and Future Directions

The decoupling technique proposed through PoT opens doors to new applications in AI that require both reasoning and computation. Practically, this approach can be beneficial in real-world applications such as financial analysis, engineering simulations, and mathematical education tools.

Theoretical implications include a reassessment of the roles LLMs play in problem-solving tasks, encouraging a division of labor between symbolic reasoning and computation. Future work could explore enhancing transparency and error diagnosis in PoT-generated programs, or integrating more sophisticated programming capabilities to handle broader reasoning contexts.

Overall, this paper presents a significant step forward in numerical reasoning with AI and suggests a promising direction for integrating symbolic reasoning systems with contemporary machine learning models.

Authors (4)
  1. Wenhu Chen (134 papers)
  2. Xueguang Ma (36 papers)
  3. Xinyi Wang (152 papers)
  4. William W. Cohen (79 papers)
Citations (610)