Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving (2405.12205v1)

Published 20 May 2024 in cs.AI and cs.LG

Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. We explore this primarily in the context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes, we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in the math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8K and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Abstract

The paper "Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving" investigates whether LLMs possess metacognitive knowledge and how this knowledge can be harnessed to improve mathematical problem-solving. Specifically, it explores the LLMs' ability to recognize and label the skills required to solve mathematical questions and use these labels to enhance problem-solving accuracy across different tasks and LLMs.

Introduction

The paper situates itself within the landscape of natural language processing and mathematical reasoning, acknowledging that while LLMs have exhibited significant advancements in general and domain-specific tasks, their capabilities in mathematical problem-solving are still fraught with limitations. The core concept investigated is metacognition—defined as thinking about one's own thinking processes—which, if present in LLMs, could be used to improve their performance in solving math problems.

Methodology

The primary methodology adopted in the paper includes the following steps:

  1. Skill Labelling: A powerful LLM (e.g., GPT-4) is prompted to label each question in a math dataset with the specific skill required to solve it. The prompts encourage the LLM to generate fine-grained, descriptive skill labels.
  2. Skill Clustering: After generating numerous skill labels, the same LLM performs semantic clustering to group these fine-grained skills into broader, more manageable categories. Each cluster of skills is assigned a descriptive label, thereby creating a "Skill Exemplar Repository."
  3. Inference: When solving a test question, the model is given the list of skill labels, identifies the relevant skill, and retrieves corresponding exemplars from the repository. These exemplars are then provided in-context to aid in solving the question (see the sketch after this list).
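The following is a minimal sketch of this three-step pipeline, assuming a generic `query_llm` helper and plain-text prompts; the function names, prompt wording, and retrieval parameter `k` are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a strong LLM such as GPT-4."""
    raise NotImplementedError

def label_skills(questions):
    """Step 1: ask the LLM for a fine-grained skill label for each question."""
    return {
        q: query_llm(f"Name the single math skill needed to solve:\n{q}").strip()
        for q in questions
    }

def cluster_skills(fine_labels):
    """Step 2: semantically cluster fine-grained labels into coarse skill families."""
    unique = sorted(set(fine_labels.values()))
    response = query_llm(
        "Group these skill labels into broader families, one family per line, "
        "formatted as 'family: label1; label2; ...':\n" + "\n".join(unique)
    )
    coarse_of = {}
    for line in response.splitlines():
        family, _, members = line.partition(":")
        for member in members.split(";"):
            coarse_of[member.strip()] = family.strip()
    return coarse_of

def build_repository(train_questions, solutions, fine_labels, coarse_of):
    """Skill Exemplar Repository: coarse skill -> solved training exemplars."""
    repo = defaultdict(list)
    for q in train_questions:
        repo[coarse_of[fine_labels[q]]].append((q, solutions[q]))
    return repo

def solve_with_skill_exemplars(test_question, repo, k=4):
    """Step 3: identify the needed skill, retrieve exemplars, solve in-context."""
    skill = query_llm(
        "Pick the single most relevant skill from this list:\n"
        + "\n".join(repo)
        + f"\nQuestion: {test_question}"
    ).strip()
    pool = repo.get(skill, [])
    exemplars = random.sample(pool, k=min(k, len(pool)))
    context = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)
    return query_llm(f"{context}\n\nQ: {test_question}\nA:")
```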

Results

The experiments were conducted on various datasets, including the GSM8K dataset, which covers grade-school math problems, and the MATH dataset, known for its high difficulty. The findings demonstrate:

  • Accuracy Improvements: On the GSM8K dataset, skill-exemplar-based in-context examples improved performance significantly over standard Chain-of-Thought (CoT) prompting, achieving an overall accuracy of 94.31%, rising to 95.38% with self-consistency (maj@5); a brief majority-vote sketch follows this list.
  • Enhanced Problem Solving on MATH Dataset: By utilizing skill-based in-context examples, the approach outperformed CoT prompting by 11.6% on average, indicating strong benefits across diverse mathematical topics such as Algebra, Geometry, and Probability.
  • Program-Based Enhancements: Integrating skill-based text examples with program-aided solutions (PAL) improved PAL performance by 7.52% on the MATH dataset.
  • Transferability: Skills identified by GPT-4 also improved performance of weaker LLMs like Mixtral 8x7B, and skills labeled on GSM8K were beneficial for other math word problem datasets, confirming the transferability and robustness of the skill knowledge across models and datasets.
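As a concrete illustration of the self-consistency (maj@5) figure reported above, the sketch below samples five independent skill-exemplar solutions and takes a majority vote over their final answers. It reuses `solve_with_skill_exemplars` from the earlier sketch, and `extract_answer` is a hypothetical parser for the final answer; both are assumptions, not the paper's code.

```python
from collections import Counter

def extract_answer(solution: str) -> str:
    """Hypothetical parser: take the last line of the generated solution as the answer."""
    return solution.strip().splitlines()[-1]

def self_consistency(test_question, repo, n_samples=5):
    """maj@5: sample several solutions and return the most common final answer."""
    answers = [
        extract_answer(solve_with_skill_exemplars(test_question, repo))
        for _ in range(n_samples)
    ]
    return Counter(answers).most_common(1)[0][0]
```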

Analysis

The paper identifies specific ways in which skill-based prompts enhance LLM performance:

  • Main Skill Success: The paper demonstrates that the primary advantage of skill-based prompting lies in its ability to reduce main skill errors, thereby allowing the model to focus more effectively on the pertinent mathematical concept.
  • Reduction of Secondary Errors: The approach also shows a reduction in secondary skill errors and calculation errors, highlighting a broader improvement in problem-solving accuracy.

Implications and Future Work

These findings have both practical and theoretical implications:

  • Practical Applications: Educators and developers of educational technologies can use similar methodologies to enhance the learning and teaching capabilities of AI systems, integrating metacognitive skill recognition and application.
  • Theoretical Insights: The research contributes to a deeper understanding of LLM metacognition, suggesting that these models possess a level of self-awareness regarding the skills they employ, which can be harnessed to improve their efficacy.
  • Future Developments: Future work may explore the application of this methodology to other problem-solving domains beyond mathematics, aiming to further generalize the findings. Additionally, the enhancement of skill annotation techniques and finer granularity of skills remains a promising area for exploration.

In conclusion, this paper provides compelling evidence that LLMs possess metacognitive knowledge that can be systematically leveraged to enhance their problem-solving abilities. The use of skill exemplars, validated through meticulous experimentation, underscores a novel approach to augmenting LLM capabilities in mathematical reasoning, with potential applications spanning far beyond this domain.

Authors (10)
  1. Aniket Didolkar
  2. Anirudh Goyal
  3. Nan Rosemary Ke
  4. Michal Valko
  5. Timothy Lillicrap
  6. Danilo Rezende
  7. Yoshua Bengio
  8. Michael Mozer
  9. Sanjeev Arora
  10. SiYuan Guo