
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning (2209.14610v3)

Published 29 Sep 2022 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained LLMs such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions: free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. This instability is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in selecting in-context examples.

Authors (8)
  1. Pan Lu (42 papers)
  2. Liang Qiu (36 papers)
  3. Kai-Wei Chang (292 papers)
  4. Ying Nian Wu (138 papers)
  5. Song-Chun Zhu (216 papers)
  6. Tanmay Rajpurohit (16 papers)
  7. Peter Clark (108 papers)
  8. Ashwin Kalyan (26 papers)
Citations (211)

Summary

  • The paper introduces PromptPG, a reinforcement learning approach that selects optimal in-context examples to enhance GPT-3's performance on math word problems.
  • It presents the TabMWP dataset with over 38K grade-level problems integrating text and tabular data for multi-step reasoning.
  • Experiments show PromptPG improves accuracy by 5.31% over the best baseline and significantly reduces prediction variance on semi-structured problems.

Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

The paper presents an investigation into the ability of LLMs, specifically GPT-3, to solve math word problems (MWPs) that incorporate semi-structured data. Building on the recent success of LLMs in natural language processing tasks, this research explores a novel approach called PromptPG, which harnesses policy gradient methods to dynamically select the in-context examples used to construct each prompt.

Overview and Dataset

The authors introduce a new dataset, Tabular Math Word Problems (TabMWP), consisting of 38,431 open-domain, grade-level problems that require mathematical reasoning over both textual and tabular data. Each problem comprises a question and a corresponding tabular context presented in three formats: an image, semi-structured text, and a structured table. The dataset adds complexity to standard MWP tasks because it demands the integration of heterogeneous information.

TabMWP consists of free-text and multiple-choice questions, annotated with gold solutions to elucidate the multi-step reasoning required to solve each problem. The ability to reason over diverse data types in this context represents a significant evolution beyond traditional MWP datasets that primarily involve unstructured text.
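To make the format concrete, the sketch below shows roughly what a single TabMWP record contains, reconstructed from the description above; the Python field names and values are illustrative, not the dataset's official schema.

```python
# A rough sketch of one TabMWP record, based on the dataset description.
# Field names and values are illustrative, not the official schema.
example = {
    "question": "How much money does Ava need to buy a notebook and a pen?",
    # The tabular context ships in three formats (image, semi-structured
    # text, structured table); this is the semi-structured text version.
    "table": "Item | Price\nnotebook | $1.20\npen | $0.80",
    "ques_type": "free_text",   # or "multi_choice"
    "choices": None,            # list of options for multi-choice questions
    "answer": "$2.00",
    # Gold solution annotating the multi-step reasoning chain.
    "solution": "The notebook costs $1.20 and the pen costs $0.80. "
                "$1.20 + $0.80 = $2.00. Ava needs $2.00.",
}
```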

Existing Challenges

While few-shot prompting with GPT-3 represents a significant advance, its performance on complex problems such as those in TabMWP can be unstable, often deteriorating to near-chance levels depending on which in-context examples are selected. This instability is most pronounced when models must handle multi-faceted data spanning varied question types and table structures.

Proposed Solution: PromptPG

To address this, the paper proposes PromptPG, a novel method based on reinforcement learning. By applying policy gradient techniques, PromptPG learns to select the most suitable in-context examples to include in the prompt fed to the model. This contrasts with the random selection strategies traditionally employed, which offer no way to control prompt quality or stabilize performance.
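A minimal sketch of such a learned selector follows, assuming fixed sentence embeddings (e.g., from BERT, as the paper describes) and a small trainable scoring head; the class and method names are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class PromptPolicy(nn.Module):
    """Scores candidate in-context examples for a given test problem.

    A sketch under the paper's described setup: candidate and problem
    texts are encoded with a frozen encoder (e.g., BERT); only this
    small linear head is trained by policy gradient.
    """

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.head = nn.Linear(embed_dim, embed_dim, bias=False)

    def forward(self, problem_emb: torch.Tensor, cand_embs: torch.Tensor) -> torch.Tensor:
        # problem_emb: (embed_dim,); cand_embs: (num_candidates, embed_dim)
        scores = cand_embs @ self.head(problem_emb)  # one score per candidate
        return torch.log_softmax(scores, dim=-1)     # log-probs over candidates
```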

The approach trains an agent that interacts with GPT-3: the agent selects candidate examples, and the correctness of GPT-3's resulting prediction serves as the reward signal. A policy like the one sketched above scores candidates using BERT-generated embeddings, and a policy gradient strategy updates the agent's selections, ultimately reducing prediction variance and improving overall accuracy.
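A single training update can then be sketched as a REINFORCE-style step. Here `evaluate_with_llm` is a hypothetical placeholder that would build the few-shot prompt from the selected examples, query the frozen LLM (GPT-3 in the paper), and return +1 for a correct answer and -1 otherwise; the exact reward shaping and batching in the paper may differ.

```python
import torch

def reinforce_step(policy, optimizer, problem_emb, cand_embs, k, evaluate_with_llm):
    """One policy-gradient update (a sketch, not the authors' exact code)."""
    log_probs = policy(problem_emb, cand_embs)        # (num_candidates,)
    dist = torch.distributions.Categorical(logits=log_probs)
    picks = dist.sample((k,))                         # k in-context example indices
    reward = evaluate_with_llm(picks.tolist())        # +1 if the LLM answers correctly, else -1
    # REINFORCE: maximize E[reward] by minimizing -reward * sum of log-probs.
    loss = -reward * dist.log_prob(picks).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

For brevity this sketch samples the k examples independently (so duplicates are possible); a practical implementation would typically sample distinct candidates.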

Experimental Validation

The authors conduct extensive experiments on the TabMWP dataset, benchmarking several existing QA methods, GPT-3 in various settings, and PromptPG. The results indicate that PromptPG surpasses the best baseline by 5.31% in accuracy, a noteworthy gain, and that it stabilizes prediction outcomes significantly better than random selection. The research also explores how factors such as the training set size and the candidate pool affect the learning algorithm, identifying configurations that yield the best results.

Implications and Future Directions

This work has significant implications for the QA and AI fields. First, TabMWP establishes a new benchmark for evaluating models on problems that require reasoning across structured and unstructured data modalities. Furthermore, PromptPG's framework underscores the potential of integrating reinforcement learning with LLMs to enhance their problem-solving capabilities in semi-structured settings.

Prospective research could explore scaling PromptPG to even more complex datasets or adapting it to other domains involving structured data. Additionally, further refinement of extraction methodologies, or the integration of more holistic reasoning strategies, could drive improvements in downstream applications that use AI to reason over semi-structured data.

In summary, this paper contributes an innovative dataset and a reinforcement learning-based approach that together push the boundaries of machine reasoning in handling complex mathematical problems interwoven with multi-modal data.