
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems (2404.14963v4)

Published 23 Apr 2024 in cs.CL and cs.AI

Abstract: Chain-of-Thought (CoT) prompting has enhanced the performance of LLMs across various reasoning tasks. However, CoT still falls short in dealing with complex math word problems, as it usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors. Prior studies involve addressing the calculation errors and step-missing errors, but neglect the semantic misunderstanding errors, which is the major factor limiting the reasoning performance of LLMs. To this end, we propose a simple-yet-effective method, namely Deeply Understanding the Problems (DUP), to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors. The core of our method is to encourage the LLMs to deeply understand the problems and extract the key problem-solving information used for better reasoning. Extensive experiments on 10 diverse reasoning benchmarks show that our DUP method consistently outperforms the other counterparts by a large margin. More encouragingly, DUP achieves a new SOTA result on the GSM8K benchmark, with an accuracy of 97.1% under the zero-shot setting.

Authors (6)
  1. Qihuang Zhong (22 papers)
  2. Kang Wang (72 papers)
  3. Ziyang Xu (28 papers)
  4. Juhua Liu (37 papers)
  5. Liang Ding (159 papers)
  6. Bo Du (264 papers)
Citations (1)

Summary

Enhancing LLM Reasoning with DUP Prompting: Techniques and Evaluations across Diverse Datasets

Introduction to DUP Prompting

The paper presents a novel prompting strategy, Deeply Understanding the Problems (DUP), designed to enhance the reasoning capabilities of LLMs. Unlike existing methods that incrementally lead LLMs through a reasoning process, DUP prompting emphasizes a comprehensive understanding of the entire problem before proposing a solution. The approach involves three key stages: extracting the core question, extracting the problem-solving information relevant to that question, and generating the final answer conditioned on both.
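The three stages above can be sketched as a simple pipeline of chained LLM calls. This is a minimal illustration, not the paper's exact prompts: the `ask` callable is a placeholder for any LLM completion API, and the prompt wording is paraphrased from the stage descriptions.

```python
def dup_solve(problem: str, ask) -> str:
    """Sketch of DUP prompting: three sequential LLM calls,
    each conditioned on the previous stage's output.
    `ask` is a hypothetical function (prompt: str) -> str."""
    # Stage 1: extract the core question from the full problem text.
    core_question = ask(
        f"{problem}\n"
        "Please extract the core question, only the most comprehensive "
        "and detailed one."
    )
    # Stage 2: extract the problem-solving information relevant to it.
    solving_info = ask(
        f"{problem}\n"
        "Please extract the problem-solving information related to the "
        f"core question: {core_question}"
    )
    # Stage 3: answer, conditioned on both extracted pieces.
    answer = ask(
        f"{problem}\n"
        f"Hint: {solving_info}\n"
        f"Core question: {core_question}\n"
        "Using the hint and the core question, solve the problem step "
        "by step and state the final answer."
    )
    return answer
```

Note the key design choice: unlike a single zero-shot CoT prompt, each stage's output is fed forward into the next prompt, so the final reasoning step is explicitly grounded in the extracted question and information.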

Benefits of DUP Prompting

The primary advantage of DUP prompting over traditional Chain of Thought (CoT) prompting is its focus on thoroughly understanding the complete context of a problem before reasoning begins. This helps reduce the three error categories the paper identifies in existing approaches: semantic misunderstanding errors, calculation errors, and step-missing errors. Notable improvements are reported across a range of datasets, demonstrating DUP's effectiveness at improving reasoning accuracy.

Experimental Results

Experiments on ten reasoning datasets show that DUP prompting significantly outperforms Zero-Shot CoT and other baselines, including few-shot Manual-CoT and Auto-CoT setups. For instance, DUP attains state-of-the-art performance on GSM8K and SVAMP, improving scores from 94.6% to 97.1% and from 90.4% to 94.2%, respectively. These results support the claim that deeper problem understanding yields a superior reasoning process.

Implications and Future Directions

The demonstrated success of DUP prompts in fostering a better problem understanding in LLMs suggests potential expansions beyond the tested scope. While this prompting strategy excels in reducing typical reasoning errors and enhancing problem-solving accuracy in current LLMs, its integration with other AI systems or its adaptation for more complex or novel task formats could be explored further.

Conclusive Thoughts

The DUP prompting strategy marks a meaningful methodological improvement in how LLMs perform reasoning tasks. By having models first comprehend the problem in its entirety, it reduces error rates and measurably increases solution accuracy. These gains broaden the applicability of LLMs in settings that demand reliable reasoning and decision-making. Future research could extend the approach to more complex reasoning tasks, or to cognitive tasks beyond the current datasets and models.
