- The paper introduces a novel DUP prompting method that drives LLMs to fully comprehend math problems before proposing solutions.
- It reports state-of-the-art results, with GSM8K accuracy rising from 94.6% to 97.1% and comparable gains on other datasets.
- The approach effectively reduces reasoning errors and paves the way for extending deep problem understanding to more complex AI tasks.
Enhancing LLM Reasoning with DUP Prompting: Techniques and Evaluations across Diverse Datasets
Introduction to DUP Prompting
The paper presents a novel prompting strategy, Deeply Understanding the Problems (DUP), designed to enhance the reasoning capabilities of LLMs. Unlike existing methods that incrementally lead LLMs through a reasoning process, DUP prompting emphasizes comprehending the entire problem before proposing a solution. The approach proceeds in three stages: extracting the core question, extracting the problem-solving information relevant to that question, and generating the answer based on both.
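These stages map naturally onto three successive LLM calls. The sketch below illustrates that flow under stated assumptions: `ask_llm` is a hypothetical placeholder for whatever chat-completion client is available, and the prompt wordings paraphrase the stage descriptions above rather than reproducing the paper's exact templates.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a single LLM call; wire this to your preferred API."""
    raise NotImplementedError


def dup_prompting(problem: str) -> str:
    # Stage 1: extract the core question the problem is actually asking.
    core_question = ask_llm(
        "Please extract the core question, only the most comprehensive and "
        f"detailed one, from the following problem.\n\n{problem}"
    )

    # Stage 2: extract the problem-solving information relevant to that question.
    solving_info = ask_llm(
        f"Problem:\n{problem}\n\n"
        "Please extract the information most useful for answering the core "
        f"question below.\nCore question: {core_question}"
    )

    # Stage 3: answer using the problem, the core question, and the extracted
    # information together.
    return ask_llm(
        f"Problem:\n{problem}\n\n"
        f"Core question: {core_question}\n"
        f"Problem-solving information: {solving_info}\n\n"
        "Solve the problem step by step using the information above, then "
        "state the final answer."
    )
```

Keeping the stages as separate calls makes each intermediate output (core question, extracted information) inspectable, which is also how the errors discussed below can be attributed to a specific stage.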
Benefits of DUP Prompting
The primary advantage of DUP prompting over traditional Chain of Thought (CoT) prompting is its insistence on thoroughly understanding the complete context of a problem before solving it. This reduces the error categories frequently observed in existing approaches: understanding errors, calculation errors, and process errors. Consistent improvements across the evaluated datasets demonstrate the effectiveness of DUP in raising reasoning accuracy.
Experimental Results
Experiments on ten reasoning datasets show that DUP prompting significantly outperforms Zero-Shot CoT and other baselines, including few-shot manual CoT and automatic CoT setups. For instance, DUP prompting attained state-of-the-art results on GSM8K and SVAMP, improving accuracy from 94.6% to 97.1% and from 90.4% to 94.2%, respectively. These results indicate that the deeper problem understanding induced by DUP prompting translates into a more reliable reasoning process.
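On benchmarks of this kind, accuracy is typically computed by exact match on the final numeric answer extracted from the model's response. The snippet below is a minimal scoring sketch under that assumption; the record fields (`question`, `answer`) and the take-the-last-number heuristic are common conventions for GSM8K-style datasets, not details taken from the paper.

```python
import re


def extract_final_number(text: str):
    """Return the last number mentioned in the text, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def accuracy(records, solve) -> float:
    """Score exact-match accuracy.

    `records` holds dicts with a "question" and a gold "answer" string;
    `solve` is any callable mapping a question to a model response,
    e.g. the dup_prompting sketch above or a Zero-Shot CoT baseline.
    """
    correct = 0
    for record in records:
        prediction = extract_final_number(solve(record["question"]))
        gold = extract_final_number(record["answer"])
        correct += int(prediction is not None and prediction == gold)
    return correct / len(records)
```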
Implications and Future Directions
The demonstrated success of DUP prompting in fostering deeper problem understanding suggests applications beyond the tested scope. While the strategy excels at reducing typical reasoning errors and improving problem-solving accuracy in current LLMs, its integration with other AI systems and its adaptation to more complex or novel task formats remain to be explored.
Conclusive Thoughts
The DUP prompting strategy marks a significant methodological improvement in how LLMs perform reasoning tasks. By having models first comprehend the problem in its entirety, it substantially reduces error rates and increases solution accuracy. These gains broaden the applicability of LLMs in domains that demand critical reasoning and decision-making. Future research could extend the approach to more complex reasoning, or to cognitive tasks beyond the current datasets and models.