Large Language Models for Mathematicians
Abstract: Large language models (LLMs) such as ChatGPT have received immense interest for their general-purpose language understanding and, in particular, their ability to generate high-quality text or computer code. For many professions, LLMs represent an invaluable tool that can speed up and improve the quality of work. In this note, we discuss to what extent they can aid professional mathematicians. We first provide a mathematical description of the transformer model used in all modern LLMs. Based on recent studies, we then outline best practices and potential issues and report on the mathematical abilities of LLMs. Finally, we shed light on the potential of LLMs to change how mathematicians work.
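The paper's mathematical description of the transformer is not reproduced on this page. As a rough, self-contained illustration only (not the authors' formulation), the following NumPy sketch computes single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V, the operation at the heart of the transformer architecture; all function and variable names are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Illustrative single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # attention-weighted sum of values

# Tiny example: a sequence of 3 tokens embedded in 4 dimensions (self-attention).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
print(scaled_dot_product_attention(X, X, X).shape)  # (3, 4)
```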