mathlib4: Lean 4's Formal Math Library

Updated 28 February 2026

mathlib4 is the central mathematical library for Lean 4, offering rigorously verified definitions and theorems across algebra, analysis, topology, and more.
It supports both interactive theorem proving and automated reasoning, and it works alongside tactic frameworks such as LeanSSR, which has been shown to reduce proof sizes without measurable performance loss.
The library underpins advanced applications such as ML dataset creation, semantic search, and large-scale theorem generation, driving innovations in formal proof engineering.

mathlib4 is the central mathematical library for the Lean 4 theorem prover, providing a comprehensive, rigorously verified foundation for formalizing mathematics from the undergraduate to the research level. It supports not only formal proof development but also a broad ecosystem of toolchains, machine learning datasets, and automated reasoning systems.

1. Scope, Design, and Role of mathlib4

mathlib4 serves as Lean 4’s principal mathematics library, encompassing a large body of formally verified definitions, theorems, and proofs across diverse domains such as algebra, analysis, topology, combinatorics, and number theory. At the time of recent studies, the library comprised more than one hundred thousand declarations, over 110,000 of which had tactic-based proofs (Gao et al., 2024). Its infrastructure underlies much of Lean 4’s formal ecosystem and is essential both for interactive theorem proving and for developing automated reasoning systems (Gao et al., 2024).

mathlib4 is the de facto standard for new formalization efforts in Lean 4. It organizes content according to a consistent naming convention, namespace structure, and modular hierarchy. This community-driven library is the reference point for both human users and downstream AI applications, supporting everything from undergraduate calculus to advanced research topics (Gao et al., 2024).

2. Foundational Contributions to Formalization and Proof Engineering

mathlib4’s structure and content enable a range of proof methodologies and architectural patterns:

Formalization Workflows: mathlib4 provides core abstract algebraic structures (e.g., UniqueFactorizationMonoid and Euclidean domains), together with combinatorics, set theory, and the associated canonical notation. It notably supports importing, extending, and combining results across domains, as in the formalization of the Mason–Stothers theorem and its corollaries in polynomial algebras (Baek et al., 2024).
Proof Strategy Patterns: Alongside traditional tactic-based scripting, mathlib4 developments can use newer paradigms such as LeanSSR, which brings small-scale reflection directly to Lean 4. LeanSSR unifies computational reduction and logical rewriting, yielding concise, maintainable proof scripts in either forward or backward style. Refactorings in the finite-set library, for example, reduce proof size by 30–50% with no measurable performance penalty. Because LeanSSR is implemented entirely at the Lean 4 library level, users can straightforwardly add new patterns or reflective instances (Gladshtein et al., 2024).
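For readers new to Lean, the following minimal sketch contrasts a term-mode proof with the traditional tactic-based style used throughout mathlib4. It uses only standard Lean 4 and mathlib tactics (rw, simp); it is not LeanSSR syntax, which comes from a separate library and is not shown here. It assumes a project that depends on mathlib4, so that import Mathlib resolves.

```lean
import Mathlib

-- Term-mode proof: supply a library lemma directly as the proof term.
example (a b : ℕ) : a + b = b + a := Nat.add_comm a b

-- Tactic-mode proof of the same goal: rewrite with the commutativity lemma;
-- `rw` then closes the resulting goal `b + a = b + a` by reflexivity.
example (a b : ℕ) : a + b = b + a := by
  rw [Nat.add_comm]

-- Tactic-mode proof about finite sets: `simp` closes the goal using
-- mathlib's simp set, which includes `Finset.union_self`.
example (s : Finset ℕ) : s ∪ s = s := by
  simp
```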

3. mathlib4 as an Enabler for Automated Reasoning and Machine Learning

mathlib4 provides the formal corpus for the construction of heterogeneous network datasets and large-scale machine learning benchmarks:

Graph-Based Datasets: The MLFMF dataset represents mathlib4 as a heterogeneous directed multigraph $G = (V, E)$, where nodes comprise the library, modules, and entries (definitions, theorems, datatypes), and edges capture containment and reference relationships in both declarations and proof bodies. Each entry’s abstract syntax tree is exported as a three-part s-expression for machine learning applications (Bauer et al., 2023).
Baseline Link Prediction and Recommendation: Experimental results highlight the value of graph structure: node2vec embeddings with a tree-bagging classifier yield 95% edge prediction accuracy and a mean minimal rank of 195 for relevant entry recommendations, outperforming purely text-based baselines (Bauer et al., 2023).
Parallel NL–FL Datasets: Herald constructs large-scale, high-fidelity parallel natural language (NL) to formal language (FL) datasets from mathlib4. By extracting, informalizing, and augmenting every statement and tactic-based proof, Herald enables the direct training and evaluation of autoformalization models, leading to state-of-the-art translation accuracy (Gao et al., 2024).
Concept-Centric Indexing: CRAMF automatically constructs a structured concept–definition knowledge base from mathlib4, capturing more than 26,000 formal definitions linked to over 1,000 core mathematical concepts. This supports retrieval-augmented generation and enables domain- and context-aware automated formalization (Lu et al., 9 Aug 2025).
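The graph representation described in the MLFMF item above can be made concrete with a small sketch. It is a minimal illustration, not the MLFMF export format: nodes carry a kind (library, module, or entry), every edge carries a relation label, and parallel edges are allowed, which is what makes the structure a multigraph. The node names and the edge labels contains, reference_in_type, and reference_in_body are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MultiGraph:
    """A heterogeneous directed multigraph: typed nodes, labelled parallel edges."""
    # node name -> node kind ("library", "module", or "entry")
    node_kind: dict = field(default_factory=dict)
    # source node -> list of (edge label, target node); duplicates are kept
    out_edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, name: str, kind: str) -> None:
        self.node_kind[name] = kind

    def add_edge(self, source: str, target: str, label: str) -> None:
        # Parallel edges are allowed: the same pair may be linked several times,
        # with the same or different labels.
        self.out_edges[source].append((label, target))

    def targets(self, source: str, label: str) -> list:
        """All targets reached from source by edges with the given label."""
        return [t for (l, t) in self.out_edges[source] if l == label]


# Hypothetical miniature corpus (names are illustrative, not real mathlib4 data).
g = MultiGraph()
g.add_node("Demo", "library")
g.add_node("Demo.Algebra", "module")
for entry in ["lemma_a", "lemma_b", "my_lemma"]:
    g.add_node(entry, "entry")

# Containment edges: library -> module -> entries.
g.add_edge("Demo", "Demo.Algebra", "contains")
for entry in ["lemma_a", "lemma_b", "my_lemma"]:
    g.add_edge("Demo.Algebra", entry, "contains")

# my_lemma mentions lemma_a in its statement and twice in its proof body.
g.add_edge("my_lemma", "lemma_a", "reference_in_type")
g.add_edge("my_lemma", "lemma_a", "reference_in_body")
g.add_edge("my_lemma", "lemma_a", "reference_in_body")
g.add_edge("my_lemma", "lemma_b", "reference_in_body")

print(g.targets("Demo.Algebra", "contains"))
print(g.targets("my_lemma", "reference_in_body"))
```

Keeping separate labels for references in statements and in proof bodies lets a downstream link-prediction model treat the two kinds of dependency differently; collapsing them is a modeling choice.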

4. Search Infrastructure and Semantic Benchmarking

The complexity and vastness of mathlib4 make semantic retrieval essential:

Search Engine Architecture: A dedicated semantic search system represents every mathlib4 theorem both in its formal Lean form and as a natural-language informalization, then embeds both in a shared vector space. At query time, an LLM augments each informal user query by generating formal and informal restatements; these are encoded as dense vectors and compared with the indexed theorems by a similarity measure (e.g., cosine similarity in ℝᵈ). Fast retrieval is implemented with HNSW-based approximate nearest-neighbor search (Gao et al., 2024).
Benchmarking and Model Comparison: On a retrieval benchmark of 50 queries spanning 18 mathematical intent groups, semantic search engines that combine formal and informal text with query augmentation achieve nDCG@20 up to 0.733, Precision@10 up to 0.196, and Recall@10 up to 0.913 with E5_mistral-7b embeddings. These scores markedly surpass both proprietary baselines and single-channel (formal-only or informal-only) embedding methods.

Model (F: Formal, IF: Informal)	nDCG@20	Precision@10	Recall@10
OpenAI 3-large (F+IF)	0.691	0.178	0.837
E5_mistral-7b (F+IF)	0.733	0.196	0.913

Query augmentation and dual text channels thus yield state-of-the-art retrieval for informal-to-formal mathematical search (Gao et al., 2024).
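The retrieval and evaluation steps can be sketched as follows. This is a minimal illustration assuming precomputed embeddings: it ranks theorems by exact cosine similarity (a brute-force stand-in for the HNSW index, not the system's actual search code) and scores the ranking with the standard binary-relevance definitions of Precision@k, Recall@k, and nDCG@k. The theorem names and three-dimensional vectors are made up, and whether the cited benchmark uses binary or graded relevance is an assumption here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query_vec, index, k):
    """Names of the k indexed items most similar to the query (exact search)."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:k]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for name in ranking[:k] if name in relevant) / k


def recall_at_k(ranking, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    return sum(1 for name in ranking[:k] if name in relevant) / len(relevant)


def ndcg_at_k(ranking, relevant, k):
    """Binary-relevance nDCG@k: a hit at 1-based rank r gains 1 / log2(r + 1)."""
    dcg = sum(1 / math.log2(i + 2) for i, name in enumerate(ranking[:k]) if name in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal


# Hypothetical index: theorem name -> embedding of its combined formal + informal text.
index = {
    "thm_add_comm": [0.9, 0.1, 0.0],
    "thm_mul_comm": [0.7, 0.6, 0.1],
    "thm_prime_inf": [0.0, 0.2, 0.95],
}
query = [1.0, 0.2, 0.0]  # hypothetical embedding of an augmented user query
ranking = retrieve(query, index, k=3)
relevant = {"thm_mul_comm"}  # hypothetical relevance judgment for this query

print(ranking)
print(precision_at_k(ranking, relevant, 1),
      recall_at_k(ranking, relevant, 3),
      ndcg_at_k(ranking, relevant, 3))
```

A production system would replace the exact sort with an approximate nearest-neighbor index such as HNSW, trading a small amount of recall for much faster queries over a large corpus.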

5. mathlib4 as a Foundation for Programmatic and Data-Driven Proof Generation

The structural regularity and accessibility of mathlib4 support advanced automated proving and proof engineering pipelines:

Large-Scale Theorem Generation: LeanNavigator explores mathlib4’s entire theorem corpus, constructing state transition graphs whose nodes are proof states and whose edges are tactic applications. By exhaustively traversing these graphs, it generates 4.7 million new, machine-verifiable theorems, each accompanied by a minimal proof of at most eight tactics. The resulting dataset comprises 1 billion Lean tokens, over an order of magnitude larger than previous corpora, and improves automated proof generation models on the MIL and miniF2F benchmarks (Yin et al., 16 Feb 2025).
Proof Engineering Benchmarks: APE-Bench I systematically mines mathlib4’s Git commit history to extract 10,928 file-level proof engineering tasks based on real-world feature additions, refactorings, and bug fixes. Each candidate task is rigorously validated by both Lean 4 compilation and LLM-based semantic judgment. State-of-the-art LLMs degrade significantly on structurally complex or multi-line file edits, and the best model (o3-mini) reaches only 33.58% semantic pass@16. These results highlight outstanding challenges for automated agents in real-world proof engineering (Xin et al., 27 Apr 2025).
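The state-graph exploration behind LeanNavigator can be illustrated with a toy breadth-first search. This is a sketch under stated assumptions, not LeanNavigator's actual algorithm: proof states are plain strings, and a hypothetical transition table maps each state to the states reachable by one tactic, whereas the real system obtains states by running Lean tactics on mathlib4 theorems. Breadth-first order guarantees that the first path found to each state uses the fewest tactics, and a depth cap (eight in the description above) bounds the search.

```python
from collections import deque


def shortest_tactic_paths(transitions, start, max_depth):
    """Breadth-first search over a proof-state graph.

    transitions maps a state to a list of (tactic, next_state) pairs.
    Returns {state: tactic list} giving one shortest tactic sequence from
    start to every state reachable within max_depth tactic applications.
    """
    paths = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if len(paths[state]) == max_depth:
            continue  # do not expand beyond the depth cap
        for tactic, nxt in transitions.get(state, []):
            if nxt not in paths:  # first visit in BFS order uses the fewest tactics
                paths[nxt] = paths[state] + [tactic]
                queue.append(nxt)
    return paths


# Hypothetical toy graph: states are goal strings, edges are tactic applications.
transitions = {
    "|- a + 0 = a and b = b": [("constructor", "|- a + 0 = a ; |- b = b"),
                               ("simp", "no goals")],
    "|- a + 0 = a ; |- b = b": [("simp", "|- b = b")],
    "|- b = b": [("rfl", "no goals")],
}
paths = shortest_tactic_paths(transitions, "|- a + 0 = a and b = b", max_depth=8)
for state, tactics in paths.items():
    print(f"{state!r}: {tactics}")
```

In this toy graph the closed state "no goals" is reachable both by constructor, simp, rfl and by a single simp; the search keeps only the shorter path, mirroring the idea of pairing each generated statement with a minimal proof.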

6. Applications in Formalizing Advanced Mathematics and Toolchain Integration

mathlib4’s infrastructure enables rapid formalization of advanced results and facilitates their integration into the library:

Algebraic and Number-Theoretic Formalizations: The formalization of the Mason–Stothers theorem and its numerous corollaries—including a polynomial form of Fermat’s Last Theorem, Davenport’s theorem, and obstructions to elliptic curve parametrizations—relies on reusable mathlib4 components such as polynomial structures, finite multisets, and bilinear form properties. New definitions (e.g., wronskian, radical, divRadical) are integrated directly into mathlib4 with systematic naming and documentation (Baek et al., 2024).
Proof Tactics and Refactoring: LeanSSR and related tactic frameworks, used alongside mathlib4, provide concise paradigms that unify rewriting and computation, further improving maintainability and easing onboarding for new users (Gladshtein et al., 2024).
Automated and Human-in-the-Loop Formalization: By leveraging mathlib4 through the Herald pipeline’s hierarchical data extraction, retrieval-augmented prompting, and augmentation strategies, LLMs achieve 93.2% statement-level formalization accuracy on miniF2F (pass@128), surpassing previous large models by substantial margins. Section-level translation is also feasible: multi-theorem Stacks Project sections have been translated into runnable Lean 4 files (Gao et al., 2024).
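Sampling-based metrics such as pass@16 and pass@128 above count a problem as solved if at least one of k sampled attempts succeeds. A widely used unbiased estimator, introduced with OpenAI's HumanEval code benchmark, computes pass@k from n ≥ k samples of which c are correct as 1 − C(n − c, k) / C(n, k). Whether Herald and APE-Bench I compute their figures exactly this way is not stated here, so the sketch below is an assumption about the metric, not their evaluation code; the per-problem outcomes are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples, c of which are correct.

    Equals 1 - C(n - c, k) / C(n, k): one minus the probability that a random
    size-k subset of the n samples contains no correct sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes for three problems, 16 samples each.
results = [(16, 0), (16, 1), (16, 4)]
print(pass_at_k(16, 1, 1))          # one correct sample out of 16, single attempt
print(mean_pass_at_k(results, 16))  # with k = n, a problem passes iff c > 0
```

Sampling n > k attempts and applying the estimator gives a lower-variance figure than literally drawing k attempts once per problem.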

7. Evolving Ecosystem and Future Directions

mathlib4 continues to act as a fertile ground for advances in formal mathematics, proof automation, and AI-mathematics integration:

Benchmarking and Open Datasets: The graph, sequential, and parallel NL–FL datasets derived from mathlib4 are foundational for evaluating models for premise selection, autoformalization, and proof synthesis (Bauer et al., 2023, Gao et al., 2024, Lu et al., 9 Aug 2025).
Integration of Automated Agents: Roadmaps for benchmarks such as APE-Bench II/III call for multi-file, project-scale, and autonomous agent evaluation, which maps directly onto mathlib4’s PR and CI workflows (Xin et al., 27 Apr 2025).
Continual Growth and Tool Interoperability: Ongoing merging, refactoring, and doc-string improvements in mathlib4, alongside compatibility with domain-specialized LLMs and external automated theorem provers, suggest sustained expansion and utility (Baek et al., 2024, Gao et al., 2024).

mathlib4 thus remains both an essential foundation for mechanized mathematics in Lean 4 and a primary engine for research at the intersection of formalization, proof engineering, and AI-based mathematical reasoning.