Emergent Mind

Mathematical Language Models: A Survey

Published Dec 12, 2023 in cs.CL


In recent years, there has been remarkable progress in leveraging Language Models (LMs), encompassing Pre-trained Language Models (PLMs) and Large-scale Language Models (LLMs), within the domain of mathematics. This paper conducts a comprehensive survey of mathematical LMs, systematically categorizing pivotal research endeavors from two distinct perspectives: tasks and methodologies. The landscape reveals a large number of proposed mathematical LLMs, which are further delineated into instruction learning, tool-based methods, fundamental CoT techniques, and advanced CoT methodologies. In addition, our survey entails the compilation of over 60 mathematical datasets, including training datasets, benchmark datasets, and augmented datasets. Addressing the primary challenges and delineating future trajectories within the field of mathematical LMs, this survey is positioned as a valuable resource, poised to facilitate and inspire future innovation among researchers invested in advancing this domain.
Figure: Differences between Autoregression Language Models and Non-Autoregression Language Models.


  • This survey examines mathematical language models, specifically pre-trained and large-scale models, and their utility in computational and reasoning tasks within mathematics.

  • Language models have evolved from simple arithmetic operations to handling complex mathematical reasoning, with strategies like tool-assisted solving and thought process generation.

  • The paper categorizes methodologies within language models and highlights autoregression and non-autoregression approaches as core to understanding mathematical expressions.

  • It discusses over 60 datasets crucial for training and assessing mathematical models, and addresses challenges faced in model fidelity, multi-modal capabilities, and educational applications.

  • The conclusion stresses the significant potential of mathematical language models in transforming mathematical problem-solving and encourages further research in this burgeoning area.


Recent advancements in language models have led to remarkable progress in their application within the field of mathematics. This survey concentrates on mathematical language models (LMs), which include Pre-trained Language Models (PLMs) and Large-scale Language Models (LLMs). These models play a pivotal role in addressing various mathematical tasks such as performing calculations and reasoning, and this ability is transforming mathematical exploration and practical usage.

Mathematical Tasks

Mathematical tasks tackled by LMs fall into two main categories: mathematical calculation and mathematical reasoning. Mathematical calculation primarily involves arithmetic operations and the representation of numerical data. Early LMs showed only basic computational skills and relied on textual number representations; over time, they have become more proficient at arithmetic. Approaches such as GenBERT and NF-NSM incorporate numeric data directly into PLMs to improve their mathematical performance.
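To make the point about textual number representations concrete, the sketch below splits every number into single digits so that a model sees "128" as the sequence 1, 2, 8 rather than as one opaque token. This is an illustration only: the function name `digit_tokenize` is hypothetical, and the code is not claimed to reproduce the preprocessing of GenBERT, NF-NSM, or any other system named in the survey.

```python
import re

def digit_tokenize(text: str) -> list[str]:
    """Split on whitespace, then break each run of digits into single digits.

    Hypothetical helper for illustration; not taken from any cited model.
    """
    tokens = []
    for word in text.split():
        # Each digit becomes its own token; runs of non-digits stay together.
        tokens.extend(re.findall(r"\d|[^\d]+", word))
    return tokens

print(digit_tokenize("What is 128 + 7?"))
# ['What', 'is', '1', '2', '8', '+', '7', '?']
```

A digit-level split exposes place value directly, at the cost of longer input sequences.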

Mathematical reasoning, on the other hand, involves solving complex problems that require multi-step logical thinking. Recent studies show that LLMs can generate elaborate chains of thought when given a few worked reasoning examples in the prompt, and that doing so raises their success rates on various tasks.
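The idea of supplying worked reasoning examples can be sketched as a small prompt builder. The example below assumes a plain "Q:/A:" text format and a hypothetical helper named `build_cot_prompt`; the survey does not prescribe a specific template, and no model is called here.

```python
def build_cot_prompt(examples, question):
    """Assemble a few-shot chain-of-thought prompt.

    `examples` is a list of (question, reasoning, answer) tuples.
    The "Q:/A:" layout is an assumption made for illustration.
    """
    parts = []
    for q, reasoning, answer in examples:
        # Each demonstration shows the intermediate steps before the answer.
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # The final question is left open so the model continues with its own reasoning.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

examples = [
    (
        "Tom has 3 apples and buys 2 more. How many apples does he have?",
        "Tom starts with 3 apples. 3 + 2 = 5.",
        "5",
    )
]
prompt = build_cot_prompt(
    examples, "A shelf holds 4 rows of 6 books. How many books are there?"
)
print(prompt)
```

The resulting string would be sent to a language model, which is expected to imitate the demonstrated step-by-step style before stating its answer.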

Language Model Methodologies

The methodologies for achieving mathematical proficiency can be categorized according to whether they build on PLMs or LLMs. Among PLMs, Autoregression Language Models (ALMs) and Non-Autoregression Language Models (NALMs) are the two main approaches used to comprehend and generate mathematical expressions. LLMs employ strategies such as instruction learning, tool-based methods, and chain-of-thought (CoT) techniques to improve mathematical reasoning. Tool-based methods draw on symbolic solvers and computer programs to assist in problem-solving, while training techniques such as fine-tuning bolster performance on certain arithmetic tasks.
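A tool-based method can be pictured as follows: the model writes an arithmetic expression, and an external program computes the result instead of the model guessing it. The sketch below is a minimal calculator built on Python's standard `ast` module. The function name `evaluate_expression` and the set of allowed operators are assumptions for illustration, not the interface of any system covered by the survey.

```python
import ast
import operator

# Only these binary operators are permitted; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

def evaluate_expression(expr: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    # mode="eval" parses a single expression; .body is its root node.
    return _eval(ast.parse(expr, mode="eval").body)

# Pretend the model produced this expression for a word problem.
model_output = "(4 * 6) + 12 / 3"
print(evaluate_expression(model_output))  # 28.0
```

Walking the syntax tree instead of calling `eval` means that function calls, names, and attribute access in the model's output raise an error rather than execute.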

Datasets and Challenges

Over 60 mathematical datasets have been compiled, which are divided into training datasets, benchmark datasets, and augmented datasets. These datasets are critical in both the training and evaluation of mathematical models and cover a wide range of complexity levels, from basic arithmetic to advanced theorem proving.

Despite the progress, challenges persist, including ensuring faithfulness in model output, enhancing multi-modal capabilities to handle non-textual mathematical information, managing uncertainty in calculations, devising robust evaluation metrics, and finding applications in educational settings as teaching aids.
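As a point of reference for the evaluation-metric challenge, benchmark scoring for numeric word problems is often reduced to checking the final number in a model's response. The sketch below shows one such baseline; the helper names `extract_final_number` and `exact_match_accuracy`, the regular expression, and the sample data are all assumptions for illustration and are not drawn from any dataset in the survey.

```python
import re

def extract_final_number(text: str):
    """Return the last number that appears in the text, or None if there is none."""
    # Drop thousands separators so "1,200" is read as 1200.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def exact_match_accuracy(predictions, references, tol=1e-6):
    """Fraction of predictions whose final number equals the reference value."""
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        value = extract_final_number(pred)
        if value is not None and abs(value - ref) <= tol:
            correct += 1
    return correct / len(references)

predictions = [
    "3 + 2 = 5. The answer is 5.",
    "The total is 1,200 books.",
    "I am not sure.",
]
references = [5, 1200, 7]
print(exact_match_accuracy(predictions, references))  # 2 of 3 correct
```

A metric of this kind ignores whether the intermediate reasoning is faithful, which is one reason more robust evaluation remains an open problem.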


The intersection of artificial intelligence and mathematical problem-solving is expanding rapidly, driven by the capabilities of mathematical language models. By addressing existing hurdles and harnessing the potential of PLMs and LLMs, these models are poised to transform mathematics and its many applications. The survey aims to stimulate further research by giving a detailed account of current successes, identifying areas for growth, and outlining prospective directions for the field.
