Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems (2310.01991v2)
Abstract: While forward reasoning (i.e., finding the answer given the question) has been explored extensively in the recent literature, backward reasoning remains relatively unexplored. We examine the backward reasoning capabilities of LLMs on Math Word Problems (MWPs): given a mathematical question and its answer, with some details omitted from the question, can LLMs effectively retrieve the missing information? Evaluating on modified versions of three benchmark datasets (GSM8k, SVAMP, and MultiArith), we find a significant drop in accuracy on this task compared to forward reasoning across SOTA LLMs (GPT-4, GPT-3.5, PaLM-2, and LLaMA). Motivated by the observation that backward reasoning can be seen as the "inverse" of forward reasoning, we propose variations of three forward reasoning strategies to improve performance. Rephrase reformulates the given problem into a forward reasoning problem; PAL-Tools combines the idea of Program-Aided LLMs with an external solver, producing a set of equations that the solver resolves for the missing quantity; and Check your Work exploits the availability of a high-accuracy natural verifier in the forward direction, interleaving solving and verification steps. Finally, observing that each of our base methods correctly solves a different set of problems, we propose a novel Bayesian formulation for creating an ensemble over the base methods to further boost accuracy. Extensive experiments demonstrate successive improvements in LLM performance on the backward reasoning task from our strategies, with the ensemble-based method yielding significant gains over the SOTA forward reasoning strategies we adapt.
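The PAL-Tools and Check your Work strategies described above lend themselves to a brief illustration. Below is a minimal Python sketch: the masked quantity is modeled as a symbolic unknown, equations (hand-written here, standing in for LLM output) are handed to an external solver (SymPy, which appears in the references), and each candidate is verified by re-running the forward computation. The example problem and the `solve_backward` / `verify_forward` helpers are illustrative assumptions, not the paper's actual prompts or implementation.

```python
# Minimal sketch (not the paper's code) of the PAL-Tools idea: treat the
# masked quantity as a symbolic unknown, constrain it with equations emitted
# for the problem, and hand them to an external solver (SymPy).
from sympy import Eq, Symbol, solve


def solve_backward(equations, unknown):
    """Solve the emitted equations for the masked quantity."""
    solutions = solve(equations, unknown, dict=True)
    return [sol[unknown] for sol in solutions]


# Hypothetical backward-reasoning instance:
# "A shopper buys x pens at $3 each and 2 notebooks at $5 each, paying $25
#  in total."  The forward answer ($25) is given; the pen count x is masked.
x = Symbol("x", positive=True)
equations = [Eq(3 * x + 2 * 5, 25)]

candidates = solve_backward(equations, x)
print(candidates)  # -> [5]


# Check-your-Work-style verification (also a sketch): substitute each
# candidate back into the forward computation and confirm it reproduces
# the stated answer.
def verify_forward(candidate):
    return 3 * candidate + 2 * 5 == 25


assert all(verify_forward(c) for c in candidates)
```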
- PaLM 2 technical report. arXiv preprint arXiv:2305.10403, 2023.
- Abductive commonsense reasoning. In ICLR, 2020.
- Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
- Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
- OPTQ: Accurate quantization for generative pre-trained transformers. In ICLR, 2023.
- PAL: Program-aided language models. In ICML, 2023.
- Solving math word problems by combining language models with symbolic solvers. arXiv preprint arXiv:2304.09102, 2023.
- How can we know when language models know? on the calibration of language models for question answering. Transactions of the Association for Computational Linguistics, 9:962–977, 2021.
- Learning to reason deductively: Math word problem solving as complex relation extraction. In ACL, 2022.
- Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics, 3:585–597, 2015.
- Learning to automatically solve algebra word problems. In ACL, pp. 271–281, 2014.
- Graph-to-tree neural networks for learning structured input-output translation with applications to semantic parsing and math word problem. In Findings of EMNLP, pp. 2841–2852, 2020.
- Program induction by rationale generation: Learning to solve and explain algebraic word problems. In ACL, pp. 158–167, 2017.
- A survey of deep learning for mathematical reasoning. arXiv preprint arXiv:2212.10535, 2022.
- Self-refine: Iterative refinement with self-feedback. arXiv preprint arXiv:2303.17651, 2023.
- SymPy: Symbolic computing in Python. May 2016.
- OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- Are NLP models really able to solve simple math word problems? In NAACL, 2021.
- Back to the future: Unsupervised backprop-based decoding for counterfactual and abductive commonsense reasoning. In EMNLP, 2020.
- COLD decoding: Energy-based constrained text generation with Langevin dynamics. In NeurIPS, 2022.
- Reversibility of thought: An instance in multiplicative tasks. The Journal of Mathematical Behavior, 27(2):138–151, 2008.
- FD Rivera. On the pitfalls of abduction: Complicities and complexities in patterning activity. For the Learning of Mathematics, 28(1):17–25, 2008.
- Solving general arithmetic word problems. In EMNLP, 2015.
- Mapping to declarative knowledge for word problem solving. Transactions of the Association for Computational Linguistics, 6:159–172, 2018.
- Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761, 2023.
- Generate & rank: A multi-task framework for math word problems. In Findings of EMNLP, pp. 2269–2279, 2021.
- Sequence to sequence learning with neural networks. In NeurIPS, 2014.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- Self-consistency improves chain of thought reasoning in language models. In ICLR, 2023.
- Deep neural solver for math word problems. In EMNLP, pp. 845–854, 2017.
- Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS, 2022.
- Generating sequences by learning to self-correct. In ICLR, 2023.
- Large language models are better reasoners with self-verification. arXiv preprint arXiv:2212.09561, 2022.
- Calibrate before use: Improving few-shot performance of language models. In ICML, pp. 12697–12706, 2021.
- Progressive-hint prompting improves reasoning in large language models. arXiv preprint arXiv:2304.09797, 2023.
- Solving challenging math word problems using GPT-4 Code Interpreter with code-based self-verification. arXiv preprint arXiv:2308.07921, 2023.
- Aniruddha Deb
- Neeva Oza
- Sarthak Singla
- Dinesh Khandelwal
- Dinesh Garg
- Parag Singla