
Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception (2312.17532v1)

Published 29 Dec 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Quantities are distinct and critical components of texts that characterize the magnitude properties of entities, providing a precise perspective for the understanding of natural language, especially for reasoning tasks. In recent years, there has been a flurry of research on reasoning tasks based on LLMs, most of which focuses solely on numerical values and neglects the dimensional concept of quantities with units, despite its importance. We argue that the concept of dimension is essential for precisely understanding quantities and is of great significance for LLMs performing quantitative reasoning. However, the lack of dimension knowledge and quantity-related benchmarks has resulted in low performance of LLMs. Hence, we present a framework to enhance the quantitative reasoning ability of LLMs based on dimension perception. We first construct a dimensional unit knowledge base (DimUnitKB) to address the knowledge gap in this area. We then propose a benchmark, DimEval, consisting of seven tasks in three categories to probe and enhance the dimension perception skills of LLMs. To evaluate the effectiveness of our methods, we propose a quantitative reasoning task and conduct experiments. The experimental results show that our dimension perception method dramatically improves accuracy on quantitative reasoning tasks compared to GPT-4 (43.55% to 50.67%).
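To make the notion of "dimension" in the abstract concrete, here is a minimal Python sketch of how a unit can be tied to a dimension and a conversion factor. It is only an illustration under stated assumptions: the `Dim` class, the `UNITS` table, and the helper functions are hypothetical names invented for this example, the base dimensions are reduced to length, mass, and time, and none of it reflects the actual structure of DimUnitKB or the tasks in DimEval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents over a reduced set of base dimensions (hypothetical)."""
    length: int = 0
    mass: int = 0
    time: int = 0

    def __mul__(self, other):
        # Multiplying quantities adds dimension exponents.
        return Dim(self.length + other.length,
                   self.mass + other.mass,
                   self.time + other.time)

    def __truediv__(self, other):
        # Dividing quantities subtracts dimension exponents.
        return Dim(self.length - other.length,
                   self.mass - other.mass,
                   self.time - other.time)


# Toy unit table: unit symbol -> (dimension, factor to the SI base unit).
# Illustrative only; not taken from the paper's knowledge base.
UNITS = {
    "m": (Dim(length=1), 1.0),
    "km": (Dim(length=1), 1000.0),
    "s": (Dim(time=1), 1.0),
    "h": (Dim(time=1), 3600.0),
    "kg": (Dim(mass=1), 1.0),
}


def same_dimension(unit_a, unit_b):
    """Return True when two units measure the same kind of quantity."""
    return UNITS[unit_a][0] == UNITS[unit_b][0]


def convert(value, src, dst):
    """Convert a value between units, refusing dimension mismatches."""
    if not same_dimension(src, dst):
        raise ValueError(f"cannot convert {src} to {dst}: dimensions differ")
    return value * UNITS[src][1] / UNITS[dst][1]


if __name__ == "__main__":
    print(same_dimension("km", "m"))    # comparable: both are lengths
    print(convert(2, "km", "m"))        # unit conversion within one dimension
    print(UNITS["m"][0] / UNITS["s"][0])  # dimension of a speed: length / time
```

A model that tracks only the numerical value would treat "2 km" and "2 h" alike; the dimension check is what lets the sketch reject comparing or converting them.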

