
Abstract

While forward reasoning (i.e., finding the answer given the question) has been explored extensively in the recent literature, backward reasoning remains relatively unexplored. We examine the backward reasoning capabilities of LLMs on Math Word Problems (MWPs): given a mathematical question and its answer, with some details omitted from the question, can LLMs effectively retrieve the missing information? In this paper, we formally define the backward reasoning task on math word problems and modify three datasets to evaluate it: GSM8k, SVAMP, and MultiArith. Our findings show a significant drop in accuracy on backward reasoning compared to forward reasoning across four SOTA LLMs (GPT-4, GPT-3.5, PaLM-2, and Llama-2). Exploiting the specific format of this task, we propose three novel techniques that improve performance: Rephrase reformulates the given problem into a forward reasoning problem; PAL-Tools combines the idea of Program-Aided LLMs to produce a set of equations that can be solved by an external solver; and Check your Work exploits the availability of a natural, high-accuracy verifier in the forward direction, interleaving solving and verification steps. Finally, observing that each of our base methods correctly solves a different set of problems, we propose a novel Bayesian formulation for creating a verifier-aided ensemble over these base methods, further boosting accuracy by a significant margin. Extensive experimentation demonstrates that our techniques successively improve the performance of LLMs on the backward reasoning task, with the final ensemble-based method yielding a substantial gain over raw LLMs with standard prompting techniques such as chain-of-thought.
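
As a rough illustration of the PAL-Tools idea (this sketch and its toy problem are our own, not code from the paper): the model is prompted to emit equations that treat the masked quantity as an unknown, the known final answer closes the system, and an external symbolic solver such as SymPy recovers the missing value.

```python
# Illustrative sketch only, not the paper's implementation.
# Toy backward-reasoning problem: "John had x apples and bought 5 more;
# he now has 12. What is x?"  The known answer (12) closes the equation.
from sympy import Eq, solve, symbols

x = symbols("x")               # the masked quantity from the question
equation = Eq(x + 5, 12)       # equation as an LLM might emit it
solution = solve(equation, x)  # external solver recovers the unknown
print(solution)  # [7]
```

Delegating the arithmetic to a solver sidesteps a known weakness of purely textual chain-of-thought: the model only needs to translate the problem into equations, not carry out the computation itself.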
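
The Bayesian ensemble is only sketched in the abstract; one minimal, hypothetical reading (all priors, rates, method names in the tuples, and answers below are made up for illustration) is that each base method's candidate answer is scored by the posterior probability of correctness given the forward verifier's verdict, and the highest-scoring candidate is returned.

```python
# Hypothetical sketch of a verifier-aided Bayesian ensemble over base
# methods; the numbers are invented, not results from the paper.

def posterior_correct(prior, verdict, tpr=0.9, fpr=0.2):
    """P(answer correct | verifier verdict), via Bayes' rule.

    prior: the method's standalone accuracy (e.g., from a dev set);
    tpr/fpr: verifier accept rates on correct/incorrect answers.
    """
    if verdict:  # verifier accepts the candidate
        num = tpr * prior
        den = tpr * prior + fpr * (1 - prior)
    else:        # verifier rejects the candidate
        num = (1 - tpr) * prior
        den = (1 - tpr) * prior + (1 - fpr) * (1 - prior)
    return num / den

# (method name, prior accuracy, candidate answer, verifier verdict)
candidates = [
    ("Rephrase", 0.60, 7, True),
    ("PAL-Tools", 0.70, 9, False),
    ("Check your Work", 0.65, 7, True),
]

best = max(candidates, key=lambda c: posterior_correct(c[1], c[3]))
print(best[0], best[2])  # Check your Work 7
```

Under these made-up numbers, a rejected candidate from a stronger method (PAL-Tools) loses to an accepted candidate from a weaker one, which is the intuition behind letting the verifier arbitrate among base methods that succeed on different problem sets.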


References
  1. PaLM 2 Technical Report
  2. Abductive commonsense reasoning. In ICLR
  3. Language Models are Few-Shot Learners
  4. Training Verifiers to Solve Math Word Problems
  5. OPTQ: Accurate quantization for generative pre-trained transformers. In ICLR
  6. PAL: Program-aided language models. In ICML
  7. Solving Math Word Problems by Combining Language Models With Symbolic Solvers
  8. How can we know when language models know? on the calibration of language models for question answering. Transactions of the Association for Computational Linguistics, 9:962–977
  9. Learning to reason deductively: Math word problem solving as complex relation extraction. In ACL
  10. Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics, 3:585–597
  11. Learning to automatically solve algebra word problems. In ACL, pp.  271–281
  12. Graph-to-tree neural networks for learning structured input-output translation with applications to semantic parsing and math word problem. In Findings of EMNLP, pp. 2841–2852
  13. Program induction by rationale generation: Learning to solve and explain algebraic word problems. In ACL, pp.  158–167
  14. A Survey of Deep Learning for Mathematical Reasoning
  15. Self-Refine: Iterative Refinement with Self-Feedback
  16. SymPy: Symbolic computing in Python. May 2016.
  17. GPT-4 Technical Report
  18. Are NLP models really able to solve simple math word problems? In NAACL
  19. Back to the future: Unsupervised backprop-based decoding for counterfactual and abductive commonsense reasoning. In EMNLP
  20. COLD decoding: Energy-based constrained text generation with Langevin dynamics. In NeurIPS
  21. Reversibility of thought: An instance in multiplicative tasks. The Journal of Mathematical Behavior, 27(2):138–151
  22. FD Rivera. On the pitfalls of abduction: Complicities and complexities in patterning activity. For the Learning of Mathematics, 28(1):17–25
  23. Solving general arithmetic word problems. In EMNLP
  24. Mapping to declarative knowledge for word problem solving. Transactions of the Association for Computational Linguistics, 6:159–172
  25. Toolformer: Language Models Can Teach Themselves to Use Tools
  26. Generate & rank: A multi-task framework for math word problems. In Findings of EMNLP, pp.  2269–2279
  27. Sequence to sequence learning with neural networks. In NeurIPS
  28. Llama 2: Open Foundation and Fine-Tuned Chat Models
  29. Self-consistency improves chain of thought reasoning in language models. In ICLR
  30. Deep neural solver for math word problems. In EMNLP, pp.  845–854
  31. Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS
  32. Generating sequences by learning to self-correct. In ICLR
  33. Large Language Models are Better Reasoners with Self-Verification
  34. Calibrate before use: Improving few-shot performance of language models. In ICML, pp.  12697–12706
  35. Progressive-Hint Prompting Improves Reasoning in Large Language Models
  36. Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
