A Survey of Deep Learning for Mathematical Reasoning

arXiv:2212.10535
Published Dec 20, 2022 in cs.AI, cs.CL, cs.CV, and cs.LG

Abstract

Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. On the one hand, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.

References
  1. Deepmath - deep sequence models for premise selection. Advances in neural information processing systems (NeurIPS), 29.
  2. Armath: a dataset for solving arabic math word problems. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC), pages 351–362.
  3. Synthesis of solutions for shaded area geometry problems. In The Thirtieth International Flairs Conference.
  4. Mathqa: Towards interpretable math word problem solving with operation-based formalisms. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 2357–2367.
  5. Connor Anderson and Ryan Farrell. 2022. Improving fractal pre-training. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1300–1309.
  6. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pages 6077–6086.
  7. Exploring length generalization in large language models. In Advances in Neural Information Processing Systems (NeurIPS).
  8. Program Synthesis with Large Language Models
  9. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR).
  10. Holist: An environment for machine learning of higher order logic theorem proving. In International Conference on Machine Learning (ICML), pages 454–463. PMLR.
  11. The coq proof assistant reference manual. INRIA, version, 6(11).
  12. Taylor Berg-Kirkpatrick and Daniel Spokoyny. 2020. An empirical investigation of contextualized number prediction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4754–4764.
  13. A Survey of Question Answering for Math and Science Problem
  14. Daniel G Bobrow. 1964. Natural language input for a computer problem solving system. AI Technical Reports.
  15. Language models are few-shot learners. Advances in Neural Information Processing Systems (NeurIPS), 33:1877–1901.
  16. Jie Cao and Jing Xiao. 2022. An augmented benchmark dataset for geometric question answering through dual parallel text encoding. In Proceedings of the 29th International Conference on Computational Linguistics (COLING), pages 1511–1520.
  17. A bottom-up dag structure extraction model for math word problems. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 39–46.
  18. François Charton. 2022. Linear algebra with transformers. Transactions on Machine Learning Research.
  19. Unigeo: Unifying geometry logical reasoning via reformulating mathematical expression. In The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  20. Geoqa: A geometric question answering benchmark towards multimodal numerical reasoning. In Findings of the Association for Computational Linguistics (ACL), pages 513–523.
  21. Evaluating Large Language Models Trained on Code
  22. Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
  23. TheoremQA: A Theorem-driven Question Answering dataset
  24. Finqa: A dataset of numerical reasoning over financial data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3697–3711.
  25. ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering
  26. Ting-Rui Chiang and Yun-Nung Chen. 2019. Semantically-aligned equation generation for solving and reasoning math word problems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 2656–2668.
  27. Unifying vision-and-language tasks via text generation. In Proceedings of the 38th International Conference on Machine Learning (ICML), pages 1931–1942.
  28. Learning phrase representations using rnn encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734.
  29. Automated generation of readable proofs with geometric invariants. Journal of Automated Reasoning, 17(3):325–347.
  30. PaLM: Scaling Language Modeling with Pathways
  31. From ‘f’to ‘a’on the ny regents science exams: An overview of the aristo project. AI Magazine, 41(4):39–53.
  32. Training Verifiers to Solve Math Word Problems
  33. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4171–4186.
  34. Drop: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 2368–2378.
  35. Edward A Feigenbaum et al. 1963. Computers and thought. McGraw-Hill.
  36. Injecting Numerical Reasoning Skills into Knowledge Base Question Answering Models
  37. Deborah Ferreira and André Freitas. 2020a. Natural language premise selection: Finding supporting statements for mathematical text. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2175–2182.
  38. Deborah Ferreira and André Freitas. 2020b. Premise selection in natural language mathematical texts. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 7365–7374.
  39. Complexity-based prompting for multi-step reasoning. In International Conference on Learning Representations (ICLR).
  40. The Pile: An 800GB Dataset of Diverse Text for Language Modeling
  41. PAL: Program-aided Language Models
  42. Dynamic fusion with intra-and inter-modality attention flow for visual question answering. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6639–6648.
  43. TacticToe: Learning to Prove with Tactics. Journal of Automated Reasoning.
  44. Convolutional sequence to sequence learning. In International conference on machine learning (ICML), pages 1243–1252. PMLR.
  45. Empirical explorations of the geometry theorem machine. In Papers presented at the May 3-5, 1960, western joint IRE-AIEE-ACM computer conference, pages 143–149.
  46. Injecting numerical reasoning skills into language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 946–958.
  47. Distributed asynchronous online learning for natural language processing. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 213–222.
  48. Improving alignment of dialogue agents via targeted human judgements
  49. Four decades of mizar. Journal of Automated Reasoning, 55(3):191–198.
  50. Retrieval augmented language model pre-training. In International Conference on Machine Learning (ICML), pages 3929–3938. PMLR.
  51. Proof artifact co-training for theorem proving with language models. In International Conference on Learning Representations (ICLR).
  52. Pgdp5k: A diagram parsing dataset for plane geometry problems. In 26th International Conference on Pattern Recognition (ICPR).
  53. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pages 770–778.
  54. Measuring massive multitask language understanding. In International Conference on Learning Representations (ICLR).
  55. Measuring mathematical problem solving with the math dataset. In 35th Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks.
  56. Pretrained transformers improve out-of-distribution robustness. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 2744–2751.
  57. Tapas: Weakly supervised table parsing via pre-training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 4320–4333.
  58. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.
  59. Learning by fixing: Solving math word problems with weak supervision. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 4959–4967.
  60. Smart: A situation model for algebra story problems via attributed grammar. In AAAI, pages 13009–13017.
  61. Learning to solve arithmetic word problems with verb categorization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  62. Gamepad: A learning environment for theorem proving. In International Conference on Learning Representations (ICLR).
  63. Neural math word problem solver with reinforcement learning. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), pages 213–223.
  64. Learning fine-grained expressions to solve math word problems. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), pages 805–814.
  65. How well do computers solve math word problems? large-scale dataset construction and evaluation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pages 887–896.
  66. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. In Submitted to The Eleventh International Conference on Learning Representations.
  67. Lisa: Language models of isabelle proofs. In 6th Conference on Artificial Intelligence and Theorem Proving (AITP).
  68. Thor: Wielding hammers to integrate language models and automated theorem provers. Advances in Neural Information Processing Systems (NeurIPS), 35:8360–8373.
  69. Learning to reason deductively: Math word problem solving as complex relation extraction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), pages 5944–5955.
  70. Maieutic prompting: Logically consistent reasoning with recursive explanations. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1266–1279.
  71. Dvqa: Understanding data visualizations via question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5648–5656.
  72. Figureqa: An annotated figure dataset for visual reasoning. In International Conference on Learning Representations (ICLR).
  73. Holstep: A machine learning dataset for higher-order logic theorem proving. In International Conference on Learning Representations (ICLR).
  74. How much coffee was consumed during emnlp 2019? fermi problems: A new reasoning challenge for ai. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7318–7328.
  75. Large Language Models Struggle to Learn Long-Tail Knowledge
  76. Unifiedqa: Crossing format boundaries with a single qa system. In Findings of the Association for Computational Linguistics (EMNLP), pages 1896–1907.
  77. Decomposed Prompting: A Modular Approach for Solving Complex Tasks
  78. Point to the expression: Solving algebraic word problems using the expression-pointer transformer model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3768–3779.
  79. Bilinear attention networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 1571–1581.
  80. Vilt: Vision-and-language transformer without convolution or region supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), pages 5583–5594.
  81. Large language models are zero-shot reasoners. In 36th Conference on Neural Information Processing Systems (NeurIPS).
  82. Mawps: A math word problem repository. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), pages 1152–1157.
  83. Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics (TACL), 3:585–597.
  84. Does pretraining for summarization require knowledge transfer? In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3178–3189
  85. Learning to automatically solve algebra word problems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 271–281.
  86. Guillaume Lample and François Charton. 2020. Deep learning for symbolic mathematics. In International Conference on Learning Representations (ICLR).
  87. Hypertree proof search for neural theorem proving. Advances in Neural Information Processing Systems (NeurIPS), 35:26337–26349.
  88. Mwptoolkit: an open-source framework for deep learning-based math word problem solvers. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 13188–13190.
  89. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
  90. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
  91. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 7871–7880.
  92. Solving quantitative reasoning problems with language models. In Advances in Neural Information Processing Systems (NeurIPS).
  93. Modeling intra-relation in math word problems with different functional multi-head attentions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pages 6162–6167.
  94. Dialogue learning with human-in-the-loop. In International Conference on Learning Representations (ICLR).
  95. What does bert with vision look at? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 5265–5275
  96. Graph-to-tree neural networks for learning structured input-output translation with applications to semantic parsing and math word problem. In Findings of the Association for Computational Linguistics (EMNLP), pages 2841–2852.
  97. Isarstep: a benchmark for high-level mathematical reasoning. In International Conference on Learning Representations (ICLR).
  98. Making Large Language Models Better Reasoners with Step-Aware Verifier
  99. Seeking patterns, not just memorizing procedures: Contrastive learning for solving math word problems. In Findings of the Association for Computational Linguistics (ACL), pages 2486–2496.
  100. Holistic Evaluation of Language Models
  101. Percy Liang and Dan Klein. 2009. Online em for unsupervised models. In Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics (NAACL), pages 611–619.
  102. Mwp-bert: Numeracy-augmented pre-training for math word problem solving. In Findings of the Association for Computational Linguistics (NAACL), pages 997–1009.
  103. Birds have four legs?! numersense: Probing numerical commonsense knowledge of pre-trained language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6862–6868.
  104. Hms: A hierarchical solver with dependency-enhanced understanding for math word problem. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 4232–4240.
  105. Program induction by rationale generation: Learning to solve and explain algebraic word problems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), pages 158–167.
  106. What makes good in-context examples for gpt-3? In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, pages 100–114
  107. TAPEX: Table pre-training via learning a neural SQL executor. In International Conference on Learning Representations.
  108. Reverse operation based data augmentation for solving math word problems. IEEE Transactions on Audio, Speech and Language Processing.
  109. Tree-structured decoding for solving math word problems. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 2370–2379.
  110. Roberta: A robustly optimized bert pretraining approach. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
  111. Deep Network Guided Proof Search
  112. Inter-gps: Interpretable geometry problem solving with formal language and symbolic reasoning. In The 59th Annual Meeting of the Association for Computational Linguistics (ACL).
  113. Learn to explain: Multimodal reasoning via thought chains for science question answering. In The 36th Conference on Neural Information Processing Systems (NeurIPS).
  114. Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
  115. Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning. In International Conference on Learning Representations (ICLR).
  116. Iconqa: A new benchmark for abstract diagram understanding and visual language reasoning. In The 35th Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks.
  117. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), pages 8086–8098.
  118. The mathlib Community. 2020. The lean mathematical library. In CPP 2020 - Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs, co-located with POPL 2020.
  119. A Survey in Mathematical Language Processing
  120. Norman D. Megill and David A. Wheeler. 2019. Metamath: A Computer Language for Mathematical Proofs. Lulu Press, Morrisville, North Carolina. http://us.metamath.org/downloads/metamath.pdf.

  121. Solving Math Word Problems with Double-Decoder Transformer
  122. A diverse corpus for evaluating and developing english math word problem solvers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 975–984.
  123. Rethinking the role of demonstrations: What makes in-context learning work? Proceedings of Empirical Methods in Natural Language Processing (EMNLP)
  124. Deep learning based text classification: a comprehensive review. ACM Computing Surveys (CSUR), 54(3):1–40.
  125. Lila: A unified benchmark for mathematical reasoning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  126. Numglue: A suite of fundamental yet challenging mathematical reasoning tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), pages 3505–3523.
  127. Enhancing self-consistency and performance of pretrained language models with nli. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.
  128. The lean theorem prover (system description). In International Conference on Automated Deduction, pages 378–388. Springer.
  129. WebGPT: Browser-assisted question-answering with human feedback
  130. Empirical explorations of the logic theory machine: A case study in heuristic. In Proceedings of the Western Joint Computer Conference, IRE-AIEE-ACM 1957.
  131. Learning from self-sampled correct and partially-correct programs. In International Conference on Learning Representations (ICLR).
  132. Show Your Work: Scratchpads for Intermediate Computation with Language Models
  133. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems (NeurIPS).
  134. Are nlp models really able to solve simple math word problems? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HIT), pages 2080–2094
  135. Lawrence C. Paulson. 1994. Isabelle - A Generic Theorem Prover (with a contribution by T. Nipkow), volume 828 of Lecture Notes in Computer Science. Springer.
  136. Film: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
  137. Formal Mathematics Statement Curriculum Learning
  138. Generative Language Modeling for Automated Theorem Proving
  139. Neural-symbolic solver for math word problems with auxiliary tasks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL), pages 5870–5881.
  140. Semantically-aligned universal tree-structured solver for math word problems. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3780–3789.
  141. Valuenet: A new dataset for human value driven dialogue system. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 2468–2484.
  142. Towards socially intelligent agents with mental state transition and human value. In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 146–158.
  143. Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 63(10):1872–1897.
  144. Language models are unsupervised multitask learners. OpenAI Blog.
  145. Scaling Language Models: Methods, Analysis & Insights from Training Gopher
  146. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research (JMLR), 21:1–67.
  147. Equate: A benchmark evaluation framework for quantitative reasoning in natural language inference. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 349–361.
  148. Impact of pretraining term frequencies on few-shot numerical reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 840–854.
  149. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems (NeurIPS), 28.
  150. Ryokan Ri and Yoshimasa Tsuruoka. 2022. Pretraining with artificial language: Studying transferable knowledge in language models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), pages 7302–7315.
  151. Data-Driven Methods for Solving Algebra Word Problems
  152. Subhro Roy and Dan Roth. 2015. Solving general arithmetic word problems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1743–1752.
  153. Subhro Roy and Dan Roth. 2017. Unit dependency graph and its application to arithmetic word problem solving. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
  154. Subhro Roy and Dan Roth. 2018. Mapping to declarative knowledge for word problem solving. Transactions of the Association for Computational Linguistics (TACL), 6:159–172.
  155. Reasoning about quantities in natural language. Transactions of the Association for Computational Linguistics (TACL), 3:1–13.
  156. Learning to retrieve prompts for in-context learning. North American Chapter of the Association for Computational Linguistics (NAACL).
  157. From textbooks to knowledge: A case study in harvesting axiomatic knowledge from textbooks to solve geometry problems. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), pages 773–784.
  158. Mrinmaya Sachan and Eric Xing. 2017. Learning to solve geometry problems from natural language demonstrations in textbooks. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics, pages 251–261.
  159. Analysing mathematical reasoning abilities of neural models. In International Conference on Learning Representations (ICLR).
  160. Programming puzzles. In Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track.
  161. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1715–1725.
  162. Solving geometry problems: Combining text and diagram interpretation. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), pages 1466–1476.
  163. Generate & rank: A multi-task framework for math word problems. In Findings of the Association for Computational Linguistics (EMNLP), pages 2269–2279.
  164. Yibin Shen and Cheqing Jin. 2020. Solving math word problems with multi-encoders and multi-decoders. In Proceedings of the 28th International Conference on Computational Linguistics (COLING), pages 2924–2934.
  165. Automatically solving number word problems by semantic parsing and reasoning. In Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP), pages 1132–1142.
  166. Mass: Masked sequence to sequence pre-training for language generation. In 36th International Conference on Machine Learning (ICML).
  167. Dream: A challenge data set and models for dialogue-based reading comprehension. Transactions of the Association for Computational Linguistics (TACL), 7:217–231.
  168. Sequence to sequence learning with neural networks. Advances in neural information processing systems (NeurIPS), 27.
  169. Quarel: A dataset and models for answering questions about qualitative relationships. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 7063–7071.
  170. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL), pages 1556–1566.
  171. Estimating numbers without regression. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Workshop on MATH-AI.
  172. Representing numbers in nlp: a survey and a vision. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HIT), pages 644–656.
  173. Shounaak Ughade and Satish Kumbhar. 2019. Survey on mathematical word problem solving using natural language processing. In 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), pages 1–5. IEEE.
  174. Shyam Upadhyay and Ming-Wei Chang. 2015. Draw: A challenging and diverse algebra word problem set. Technical report, Citeseer.
  175. Shyam Upadhyay and Ming-Wei Chang. 2017. Annotating derivations: A new evaluation strategy and dataset for algebra word problems. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (ACL), pages 494–504.
  176. Josef Urban. 2006. Mptp 0.2: Design, implementation, and initial experiments. Journal of Automated Reasoning, 37(1):21–43.
  177. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), pages 5998–6008.
  178. Do nlp models know numbers? probing numeracy in embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5307–5315.
  179. Translating a math word problem to a expression tree. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1064–1069.
  180. Mathdqn: Solving arithmetic word problems via deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
  181. Template-based math word problem solvers with recursive neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 7144–7151.
  182. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations (ICLR).
  183. Deep neural solver for math word problems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 845–854.
  184. Chain of thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems (NeurIPS).
  185. Naturalproofs: Mathematical theorem proving in natural language. In Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track.
  186. Naturalprover: Grounded mathematical proof generation with language models. In Advances in Neural Information Processing Systems (NeurIPS).
  187. Generating sequences by learning to self-correct. In International Conference on Learning Representations (ICLR).
  188. Symbolic brittleness in sequence models: on systematic generalization in symbolic mathematics. In AAAI.
  189. Wu Wen-Tsun. 1986. Basic principles of mechanical theorem proving in elementary geometries. Journal of automated Reasoning, 2(3):221–252.
  190. Holophrasm: a neural Automated Theorem Prover for higher-order logic
  191. Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), pages 3–19.
  192. A knowledge-aware sequence-to-tree network for math word problem solving. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7137–7146.
  193. An edge-enhanced hierarchical graph-to-tree network for math word problem solving. In Findings of the Association for Computational Linguistics (EMNLP), pages 1473–1482.
  194. Math word problem solving with explicit numerical values. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL), pages 5859–5869.
  195. A survey of human-in-the-loop for machine learning. Future Generation Computer Systems.
  196. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.
  197. INT: An inequality benchmark for evaluating generalization in theorem proving. In International Conference on Learning Representations (ICLR).
  198. Autoformalization with large language models. In Advances in Neural Information Processing Systems (NeurIPS).
  199. Insights into Pre-training via Simpler Synthetic Tasks.
  200. LIME: Learning inductive bias for primitives of mathematical reasoning. In International Conference on Machine Learning (ICML), pages 11251–11262. PMLR.
  201. Zhipeng Xie and Shichao Sun. 2019. A goal-driven tree-structured neural model for math word problems. In International Joint Conference on Artificial Intelligence (IJCAI), pages 5299–5305.
  202. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (ICML), pages 2048–2057. PMLR.
  203. Kaiyu Yang and Jia Deng. 2019. Learning to prove theorems via interacting with proof assistants. In International Conference on Machine Learning (ICML), pages 6984–6994. PMLR.
  204. An introduction to Java Geometry Expert. In International Workshop on Automated Deduction in Geometry, pages 189–195. Springer.
  205. GeoRE: A relation extraction dataset for Chinese geometry problems. In 35th Conference on Neural Information Processing Systems (NeurIPS) Workshop on Math AI for Education (MATHAI4ED).
  206. Improving math word problems with pre-trained knowledge and hierarchical reasoning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3384–3394.
  207. Generate rather than retrieve: Large language models are strong context generators. In International Conference on Learning Representations (ICLR).
  208. Solving arithmetic word problems by scoring equations with recursive neural networks. Expert Systems with Applications, 174:114704.
  209. The gap of semantic parsing: A survey on automatic math word problem solvers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(9):2287–2305.
  210. Teacher-student networks with multiple decoders for solving math word problem. In International Joint Conference on Artificial Intelligence (IJCAI).
  211. Graph-to-tree learning for solving math word problems. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pages 3928–3937.
  212. Learning to understand plane geometry diagram. In 36th Conference on Neural Information Processing Systems (NeurIPS) Workshop on MATH-AI.
  213. NoahQA: Numerical reasoning with interpretable graph question answering dataset. In Findings of the Association for Computational Linguistics (EMNLP), pages 4147–4161.
  214. Machine number sense: A dataset of visual arithmetic problems for abstract and relational reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 1332–1340.
  215. Do language embeddings capture scales? In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 292–299.
  216. DialoGPT: Large-scale generative pre-training for conversational response generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations.
  217. Automatic chain of thought prompting in large language models. In International Conference on Learning Representations (ICLR).
  218. Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems.
  219. MultiHiertt: Numerical reasoning over multi hierarchical tabular and textual data. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), pages 6588–6600.
  220. Calibrate before use: Improving few-shot performance of language models. In International Conference on Machine Learning (ICML), pages 12697–12706. PMLR.
  221. MiniF2F: a cross-system benchmark for formal olympiad-level mathematics. In International Conference on Learning Representations (ICLR).
  222. "Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
  223. Least-to-most prompting enables complex reasoning in large language models. In International Conference on Learning Representations (ICLR).
  224. TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pages 3277–3287.