
Llemma: An Open Language Model For Mathematics (2310.10631v3)

Published 16 Oct 2023 in cs.CL, cs.AI, and cs.LO

Abstract: We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Summary

  • The paper introduces Llemma, a specialized language model that significantly outperforms open base models on mathematical benchmarks.
  • It employs continued pretraining on Proof-Pile-2, a 55-billion-token, math-focused dataset, with optimizations such as FlashAttention to enhance reasoning capabilities.
  • Llemma demonstrates robust performance in formal proof generation and tool-augmented problem solving, marking a milestone in mathematical language modeling.

An Overview of Llemma: Optimizing LLMs for Mathematical Problems

The paper introduces "Llemma," a specialized LLM for mathematical problems, built on Code Llama. Its development is motivated by the need for domain-specific LLMs that can handle mathematical reasoning effectively, an area where generalist models fall short.

Model and Training

Llemma’s architecture is based on the Code Llama variants with 7 billion and 34 billion parameters. It undergoes further pretraining on a meticulously curated dataset named Proof-Pile-2, which comprises 55 billion tokens of scientific papers, web data, and code, all focused on mathematics. This strategy of continued pretraining is grounded in the hypothesis that mathematical problem solving demands intensive domain-specific knowledge and reasoning capabilities.
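
As a rough illustration, the sketch below continues a causal language model from a public Code Llama checkpoint on a mathematical corpus. This is a minimal sketch, not the authors' distributed training setup; the checkpoint name, the dataset identifier, and its `text` field are assumptions about the publicly released artifacts.

```python
# Minimal sketch of continued pretraining on a mathematical corpus.
# Assumptions: the Hugging Face checkpoint "codellama/CodeLlama-7b-hf",
# the dataset id "EleutherAI/proof-pile-2", and a `text` field per document.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# Stream the corpus and tokenize each document. The real dataset may carry
# extra metadata fields that would also need to be dropped here.
corpus = load_dataset("EleutherAI/proof-pile-2", split="train", streaming=True)
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=4096),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llemma-continued-pretrain",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=64,
        learning_rate=1e-5,   # placeholder; not the paper's schedule
        max_steps=10_000,     # required with a streaming (length-less) dataset
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False gives the next-token (causal) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```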

The training methodology is standard autoregressive language modeling, with optimizations such as FlashAttention for computational efficiency. The training data is thoughtfully composed of scientific literature, high-quality web content, and a selection of mathematical code that includes formally verified proofs.
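
For instance, in the Hugging Face stack comparable fused attention kernels can be enabled with the `attn_implementation` flag (this requires the separate flash-attn package and a CUDA GPU). The snippet is a stand-in for the paper's own training code, shown only to make the autoregressive objective concrete:

```python
# Sketch of the causal LM objective with FlashAttention-2 enabled; a stand-in
# for the paper's training code, not a reproduction of it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # needs flash-attn + CUDA
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

batch = tokenizer("Theorem: the sum of two even integers is even.",
                  return_tensors="pt").to("cuda")

# Autoregressive training: each token is predicted from its left context,
# so the labels are the input ids (the model shifts them internally).
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()
```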

Evaluation of Mathematical Proficiency

The paper emphasizes rigorous evaluation over several benchmarks:

  1. Mathematical Problem Solving: Benchmarks such as MATH and GSM8k test Llemma’s ability to solve mathematical problems without access to external tools.
  2. Tool Augmentation: Llemma’s effectiveness is further explored in settings that allow tool use, such as a Python interpreter for intermediate computational steps, which improves its problem-solving performance (see the sketch after this list).
  3. Formal Mathematics: Llemma can generate formal proofs in proof assistants such as Lean and Isabelle, which validate mathematical reasoning in a fully formal, machine-checked setting.
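
To make the tool-augmented setting concrete, below is a hedged sketch of one way such an evaluation loop can work: the model is asked to write a Python program for the intermediate computation, the program runs in a subprocess, and its printed output is taken as the answer. The prompt wording and the `generate_program` callable are hypothetical stand-ins, not the paper's evaluation harness.

```python
# Hedged sketch of tool-augmented problem solving via a Python interpreter.
import subprocess

def solve_with_python(problem: str, generate_program) -> str:
    prompt = (f"Problem: {problem}\n"
              "Write a Python program that prints only the final answer.\n")
    code = generate_program(prompt)  # any text-completion function
    # Run the generated program in an isolated interpreter with a timeout.
    result = subprocess.run(["python", "-c", code],
                            capture_output=True, text=True, timeout=30)
    return result.stdout.strip()

# Example with a trivial stand-in generator:
answer = solve_with_python(
    "What is the sum of the first 100 positive integers?",
    lambda p: "print(sum(range(1, 101)))",
)
print(answer)  # 5050
```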

Performance and Competitiveness

The results indicate that Llemma outperforms all known open base models on the MATH benchmark and is competitive with proprietary models such as Minerva. Most notably, continued pretraining on the domain-specific Proof-Pile-2 dataset directly yields significant improvements over the original Code Llama models.

The open release of the Llemma models and datasets encourages further research and development, providing a robust base for future advances in mathematical language modeling.

Implications and Future Directions

Llemma’s development and public release mark a significant milestone in the application of LLMs to mathematical domains. The implications of this work span both practical applications where precise mathematical reasoning is required and theoretical advancements in model training methodologies. Future directions could involve refining Llemma’s capabilities in diverse mathematical subfields and enhancing its integration with formal verification tools.

In conclusion, Llemma embodies a successful case of domain-specific adaptation in LLMs, offering insights and a framework that can be extrapolated to other specialized domains in AI research.
