Papers
Topics
Authors
Recent
Search
2000 character limit reached

FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning

Published 20 Feb 2024 in cs.CL | (2402.12692v5)

Abstract: The application of formulas (e.g., physics formulas) is a fundamental ability of humans when solving numerical reasoning problems. Existing numerical reasoning datasets seldom explicitly indicate the formulas employed in reasoning, as their questions rely on implicit commonsense mathematical knowledge. In contrast, in this paper, we introduce FormulaReasoning, a new dataset specifically designed for formula-based numerical reasoning. Each of the 4,751 questions in our dataset requires numerical calculation with external physics formulas, making it a more challenging benchmark for evaluating LLMs. We offer normalized fine-grained annotations for the questions, available in English and Chinese, including formula structures, parameter names, symbols, numerical values, and units, derived from extensive manual effort with LLM assistance for guaranteed quality. We also provide a consolidated formula database to serve as an external knowledge base accompanying the dataset. We employ FormulaReasoning to evaluate LLMs with 7B to over 100B parameters, and explore retrieval-augmented generation with the formula database. Our evaluation also covers supervised methods that break down the reasoning process into formula generation, parameter extraction, and numerical calculation, as well as direct preference optimization methods based on derived preference data.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (36)
  1. MathQA: Towards interpretable math word problem solving with operation-based formalisms. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2357–2367, Minneapolis, Minnesota. Association for Computational Linguistics.
  2. Qwen technical report. ArXiv, abs/2309.16609.
  3. A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. ArXiv, abs/2302.04023.
  4. Teaching neural module networks to do arithmetic. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1502–1510, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
  5. Training verifiers to solve math word problems. ArXiv, abs/2110.14168.
  6. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
  7. Pre-training with whole word masking for chinese bert.
  8. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  9. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2368–2378, Minneapolis, Minnesota. Association for Computational Linguistics.
  10. Mathematical capabilities of chatgpt. ArXiv, abs/2301.13867.
  11. SimCSE: Simple contrastive learning of sentence embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6894–6910, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  12. Neural module networks for reasoning over text. In International Conference on Learning Representations.
  13. Measuring mathematical problem solving with the math dataset. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2).
  14. Learning to solve arithmetic word problems with verb categorization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 523–533, Doha, Qatar. Association for Computational Linguistics.
  15. Exploiting numerical-contextual knowledge to improve numerical reasoning in question answering. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1811–1821, Seattle, United States. Association for Computational Linguistics.
  16. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35:22199–22213.
  17. MAWPS: A math word problem repository. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1152–1157, San Diego, California. Association for Computational Linguistics.
  18. Learning to automatically solve algebra word problems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 271–281, Baltimore, Maryland. Association for Computational Linguistics.
  19. Tsqa: tabular scenario based question answering. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 13297–13305.
  20. Dyrren: A dynamic retriever-reranker-generator model for numerical reasoning over tabular and textual data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 13139–13147.
  21. Making language models better reasoners with step-aware verifier. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5315–5333, Toronto, Canada. Association for Computational Linguistics.
  22. Textbooks are all you need ii: phi-1.5 technical report. ArXiv, abs/2309.05463.
  23. Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-Trained Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6862–6868, Online. Association for Computational Linguistics.
  24. Guiding mathematical reasoning via mastering commonsense formula knowledge. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
  25. A survey of deep learning for mathematical reasoning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14605–14631, Toronto, Canada. Association for Computational Linguistics.
  26. OpenAI. 2023. Gpt-4 technical report.
  27. Automatically solving number word problems by semantic parsing and reasoning. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1132–1142, Lisbon, Portugal. Association for Computational Linguistics.
  28. InternLM Team. 2023. Internlm: A multilingual language model with progressively enhanced capabilities. https://github.com/InternLM/InternLM.
  29. Alberto Testolin. 2023. Can neural networks do arithmetic? a survey on the elementary numerical skills of state-of-the-art deep learning models. ArXiv, abs/2303.07735.
  30. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  31. Template-based math word problem solvers with recursive neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7144–7151.
  32. Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations.
  33. Deep neural solver for math word problems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 845–854, Copenhagen, Denmark. Association for Computational Linguistics.
  34. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
  35. mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498, Online. Association for Computational Linguistics.
  36. Least-to-most prompting enables complex reasoning in large language models. In The Eleventh International Conference on Learning Representations.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.