
Mathlib4: Lean 4 Mathematical Library

Updated 2 July 2026
  • Mathlib4 is a community-maintained formal mathematical library for Lean 4, offering over 100,000 declarations and 44,000 tactic proofs across disciplines.
  • It utilizes a DAG-based file structure with rigorous kernel-checked proofs and systematic namespace organization to ensure consistency and extensibility.
  • The library drives research in mechanized mathematics, enabling automated proof verification, formalization methodologies, and machine-augmented theorem discovery.

Mathlib4 is the flagship, community-maintained mathematical library for the Lean 4 proof assistant. It provides a unified, extensible corpus supporting advanced formalization in algebra, analysis, topology, logic, category theory, combinatorics, and beyond. Characterized by rigorous kernel-checked proofs, systematic file and namespace organization, and active contribution workflows, mathlib4 is central to research in mechanized mathematics, formal proof automation, and machine-augmented theorem discovery.

1. Structural Foundations and Coverage

Mathlib4 is architected as a directed acyclic graph (DAG) of Lean 4 files and namespaces, with explicit imports determining dependency structure. By 2026, mathlib4 encompasses over 100,000 declarations and 44,000 tactic-style proofs, with theorems, definitions, typeclasses, and inductive types present across all major theorem-proving domains. Its documented community guidelines ensure uniform proof and naming style, enabling both maintainability and extensibility (Gao et al., 2024).
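To make the import-DAG idea concrete, here is a minimal Python sketch that extracts `import` lines from Lean source text and orders the modules so that every import precedes its importer. The module names under `Demo.` and the regular expression are illustrative assumptions, not mathlib4's actual files or build tooling (which is handled by Lake).

```python
import re
from graphlib import CycleError, TopologicalSorter

# Hypothetical miniature library: module name -> Lean source text.
SOURCES = {
    "Demo.Basic": "-- no imports\n",
    "Demo.Algebra": "import Demo.Basic\n",
    "Demo.Topology": "import Demo.Basic\n",
    "Demo.Analysis": "import Demo.Algebra\nimport Demo.Topology\n",
}

# Simplified pattern: one module name per `import` line at the start of a line.
IMPORT_RE = re.compile(r"^import\s+([\w.]+)", re.MULTILINE)


def import_graph(sources):
    """Map each module to the set of modules it imports."""
    return {name: set(IMPORT_RE.findall(text)) for name, text in sources.items()}


def build_order(sources):
    """Return modules so that every import comes before its importer.

    Raises graphlib.CycleError if the imports do not form a DAG.
    """
    return list(TopologicalSorter(import_graph(sources)).static_order())


def is_acyclic(sources):
    """True when the import relation has no cycles."""
    try:
        build_order(sources)
        return True
    except CycleError:
        return False


order = build_order(SOURCES)
print(order)
```

Because `TopologicalSorter` takes a mapping from each node to its predecessors, the import sets can be passed in directly; a circular import surfaces as a `CycleError` rather than a silent mis-ordering.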

Notable modules include deep hierarchies such as Algebra, Analysis, CategoryTheory, and Topology, supplemented by specialized areas like combinatorics, finite group theory, and functional analysis. Examples of major developments include the classification of groups of order $p^3$ (Xiang, 20 Jun 2026), formal elementary number theory (e.g., Mason–Stothers and Fermat over $k[X]$ (Baek et al., 2024)), formalizations of Stokes' theorem for smooth singular cubes (Hulak et al., 1 May 2026), and bridges to tactic-based automation (Gladshtein et al., 2024).

The library maintains high consistency through structural metadata, including machine-readable theorem and proof information, leveraged by translation pipelines and toolchains for both formal and natural language representations (Gao et al., 2024).

2. Contribution Model, Growth Dynamics, and Engineering

Mathlib4 is under active collaborative development, with 19,867 commits from August 2023 to March 2025 and 9,762 Lean source files by April 2026. Empirical studies show that mathlib4's growth initially followed a pure power-law regime; as the library matured, new file additions slowed, and a saturating power-law model now fits better:

  • Pure Power-Law: $dN/dt = K N^k$
  • Saturating Power-Law: $dN/dt = K N^k e^{-\mu N}$

In forecasting tasks, the saturating law outperformed the pure power-law by approximately 7× (RMSE 3,279 vs 23,896) on held-out data for new file growth, indicating that mathlib4 is now operating in its saturation regime, in contrast to Coq's mathcomp, which still fits a pure power-law (Rovai, 14 May 2026). This transition reflects both architectural completeness and finite domain coverage.
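The two growth laws can be compared numerically with a short sketch. The code below integrates both with forward Euler and defines the RMSE used to score a forecast; the parameter values (`K`, `k`, `mu`, the initial size, and the step count) are illustrative assumptions, not the fitted values from the cited study.

```python
import math


def simulate(rate, n0, steps, dt=1.0):
    """Forward-Euler integration of dN/dt = rate(N), returning steps + 1 values."""
    values = [n0]
    for _ in range(steps):
        values.append(values[-1] + dt * rate(values[-1]))
    return values


def pure_power_law(K, k):
    """Growth rate K * N^k."""
    return lambda n: K * n ** k


def saturating_power_law(K, k, mu):
    """Growth rate K * N^k * exp(-mu * N); reduces to the pure law when mu = 0."""
    return lambda n: K * n ** k * math.exp(-mu * n)


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# Illustrative parameters only (assumed, not taken from the source).
K, k, mu = 0.5, 0.8, 1e-3
pure = simulate(pure_power_law(K, k), n0=100.0, steps=200)
saturating = simulate(saturating_power_law(K, k, mu), n0=100.0, steps=200)

# Setting d/dN [K N^k e^{-mu N}] = 0 gives N = k / mu: the size at which
# the saturating law's growth rate is largest.
peak_size = k / mu

print(round(pure[-1]), round(saturating[-1]), peak_size)
```

Under the saturating law the growth rate rises until $N = k/\mu$ and declines afterwards, which is the qualitative behavior the forecasting comparison rewards.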

APE-Bench I (Xin et al., 27 Apr 2025), a large-scale benchmark derived from mathlib4's commit history, emphasizes proof engineering tasks such as feature addition, proof refactoring, and bug fixing. It highlights a distribution with roughly 45% feature additions, 30% refactorings, and 25% bug fixes, spanning all major submodules. Automated tools like Eleanstic use content-addressable storage and semantic LLM-based proof verification for scalable proof engineering, reflecting the infrastructural sophistication underpinning mathlib4's evolution.
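As a rough illustration of why content-addressable storage helps when many commits must be rebuilt, the sketch below keys every blob by its SHA-256 digest so that files unchanged between commits are stored once. This is a generic sketch of the technique; the class, function names, and file paths are hypothetical and do not describe Eleanstic's actual implementation.

```python
import hashlib


class ContentStore:
    """Blobs keyed by the SHA-256 of their bytes; identical content is stored once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        """Store data (if not already present) and return its digest."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        """Return the bytes stored under a digest."""
        return self.blobs[digest]


def snapshot(store, files):
    """Record a build state as a mapping path -> digest."""
    return {path: store.put(data) for path, data in files.items()}


def restore(store, snap):
    """Rebuild the path -> bytes mapping from a snapshot."""
    return {path: store.get(digest) for path, digest in snap.items()}


store = ContentStore()
# Two hypothetical commits that differ in a single build artifact.
commit_a = snapshot(store, {"Demo/Basic.olean": b"basic-v1",
                            "Demo/Algebra.olean": b"algebra-v1"})
commit_b = snapshot(store, {"Demo/Basic.olean": b"basic-v1",
                            "Demo/Algebra.olean": b"algebra-v2"})

print(len(store.blobs))  # distinct blobs held for both snapshots
```

Each snapshot is only a small path-to-digest table, so storing one per commit costs little beyond the artifacts that actually changed.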

3. Search, Translation, and Data Extraction

The library's large scale, coupled with terse machine-oriented naming conventions, makes semantic search and translation in mathlib4 a challenge. Theorems frequently appear under identifiers that obscure their mathematical intent (e.g., Cauchy's Mean Value Theorem is named exists_ratio_deriv_eq_ratio_slope) (Gao et al., 2024). Many items lack docstrings, impeding both human and automated discovery.
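For reference, the mathematical content behind that identifier can be written out as follows. This is the standard textbook statement of Cauchy's mean value theorem, not a transcription of the Lean declaration, whose exact hypotheses may be phrased differently.

```latex
% Cauchy's mean value theorem, stated informally.
% Hypotheses: a < b, with f and g continuous on [a, b] and differentiable on (a, b).
\[
  \exists\, c \in (a, b) : \quad
  \bigl(g(b) - g(a)\bigr)\, f'(c) = \bigl(f(b) - f(a)\bigr)\, g'(c)
\]
% When g'(c) \neq 0 and g(b) \neq g(a), this becomes the "ratio" form
% that the identifier's wording echoes:
\[
  \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}
\]
```

The name describes the shape of the conclusion (a ratio of derivatives equals a ratio of slopes) rather than the theorem's traditional title, which is why keyword search on "Cauchy" or "mean value" can miss it.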

  • Semantic Theorem Search: Dense retrieval systems, such as the leansearch.net engine, employ LLM-generated informal statements and hybrid Lean 4/natural language corpora to facilitate high-precision semantic lookup. Augmented queries and instruction-tuned embedding models (E5_mistral-7b, OpenAI text-embedding-3-large) yield recall@10 up to 0.91 and nDCG@20 over 0.73, outperforming traditional knowledge-based search tools (see table below).
Model (with informalization) | nDCG@20 | Precision@10 | Recall@10
E5_mistral-7b | 0.733 | 0.196 | 0.913
OpenAI v3 (3-large) | 0.691 | 0.178 | 0.837
Moogle (baseline) | 0.365 | 0.092 | 0.513

LLM-driven informalization and bi-encoder embeddings, paired with vector search, overcome documentation and naming hurdles (Gao et al., 2024).
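The three retrieval metrics reported above can be computed as in the following sketch, which assumes binary relevance judgments and the common logarithmic-discount form of nDCG. Whether the cited evaluation uses exactly these variants is an assumption, and the theorem identifiers are placeholders.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: discounted gain divided by the best achievable gain."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


# Placeholder query result: two relevant theorems, one ranked first, one fourth.
ranked = ["thm_a", "thm_b", "thm_c", "thm_d"]
relevant = {"thm_a", "thm_d"}

print(precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 3))
print(ndcg_at_k(ranked, relevant, 3))
```

Precision@k is bounded above by (number of relevant items)/k, so a query with only one or two relevant theorems cannot score high on Precision@10 even when Recall@10 is near 1. That is one plausible reading of the low precision and high recall values in the table.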

  • Natural Language ↔ Formal Language Translation: The Herald dataset, built on mathlib4, provides a large parallel corpus of theorem and proof pairs in both formal Lean 4 and natural language, leveraging structural metadata (dependencies, docstrings, proof steps). Its translation pipeline includes retrieval-augmented demonstration, dual augmentation (tactic-based, informal-based), and human-expert feedback loops. The Herald Translator achieves 93.2% Pass@128 on the miniF2F-test, substantially outperforming InternLM2-Math-Plus-7B (74.0%) and TheoremLlama (50.1%) (Gao et al., 2024).

Section-level translation pipelines enable the auto-formalization of entire segments of graduate-level texts, with real-world deployment demonstrated on sections from the Stacks Project.
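The Pass@128 figures quoted for the Herald Translator and its baselines measure how often at least one of 128 sampled outputs is accepted. A minimal sketch of the widely used unbiased pass@k estimator follows; whether the cited evaluation uses this estimator or simply checks 128 samples per problem is an assumption (the two coincide when exactly k samples are drawn).

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate P(at least one of k samples is correct) from n samples with c correct.

    Uses 1 - C(n - c, k) / C(n, k): the chance that a random size-k subset
    of the n samples contains no correct one, subtracted from 1.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem outcomes: (samples drawn, samples accepted).
results = [(128, 0), (128, 3), (128, 128), (128, 1)]
print(benchmark_pass_at_k(results, 128))
```

With n = k = 128, a problem counts as solved exactly when at least one sample is accepted, so the benchmark score is the fraction of problems with any success.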

4. Extensible Formalization: Case Studies and Patterns

Mathlib4 supports advanced, extensible formalization methodologies and workflows. Notable examples include:

  • Group Theory: The complete classification of groups of order $p^3$ (Xiang, 20 Jun 2026), partitioned across six Lean files, leverages mathlib4's abelian classification, Sylow theory, and bespoke group constructions (Heisenberg, semidirect product, dihedral and quaternion groups). Techniques include the use of structure for algebraic constructions, the Subgroup.zpowers API, and fine-grained pattern-matching tactics. The resulting theorems comprise mutual non-isomorphism statements, explicit isomorphism constructions, and an exponent dichotomy for non-abelian groups of order $p^3$.
  • Analysis and Topology: Stokes' theorem for smooth singular cubes (Hulak et al., 1 May 2026) is formulated entirely in Lean 4 via mathlib4 structures, Bochner integration, singular cubical chain complexes, and bridges to the abstract extDeriv API. Supporting proofs of chain-level $\partial^2 = 0$ and connections to classical calculus (Green's theorem, FTC) highlight the depth and breadth of analysis infrastructure.
  • Elementary Number Theory: The formalization of Mason–Stothers and its corollaries (polynomial abc, function field FLT, Davenport, non-parametrizability) was achieved by leveraging mathlib4’s existing UFD, normalization, and bilinear form theories, introducing new Wronskian and radical APIs for polynomials and monoids (Baek et al., 2024).
  • Tactic Methodologies: The LeanSSR package brings SSReflect-style reflection and rewriting to Lean 4, with full integration into mathlib4’s finite set theory and cardinality lemmas. Meta-programmed tactic languages, macro-expansion, and state extensions support concise, maintainable formal scripts (Gladshtein et al., 2024).
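To give a flavor of the Subgroup.zpowers API mentioned in the group theory case study, here is a small Lean 4 sketch. It only restates two basic membership lemmas and is not taken from the cited formalization; it assumes a recent Mathlib in which `Subgroup.mem_zpowers` and `Subgroup.mem_zpowers_iff` are available under these names.

```lean
import Mathlib

-- `Subgroup.zpowers g` is the cyclic subgroup generated by `g`.
-- Every element lies in the subgroup it generates.
example {G : Type*} [Group G] (g : G) : g ∈ Subgroup.zpowers g :=
  Subgroup.mem_zpowers g

-- Membership unfolds to "is an integer power of the generator".
example {G : Type*} [Group G] (g h : G) :
    h ∈ Subgroup.zpowers g ↔ ∃ k : ℤ, g ^ k = h :=
  Subgroup.mem_zpowers_iff
```

Lemmas of this shape let a proof switch between the subgroup-level view and explicit integer powers, which is the kind of step that arguments about element orders rely on.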

5. Logical and Topological Organization

Empirical studies on mathlib4’s multilayer dependency topology demonstrate strong alignment between import-based and co-development-based hub profiles (Pearson $r_{\deg} = 0.777$, Spearman $\rho_{\deg} = 0.733$, $p = 0.004$). High-persistence modules (such as Algebra) serve as both foundational imports and active development hubs, while rank-divergent modules (e.g., CategoryTheory, Order) reveal decoupled usage versus development layers (Ivanov, 1 Jun 2026).
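The hub-alignment statistics above compare two per-module scores, one from the import layer and one from the co-development layer. A minimal sketch of that comparison is shown below; the module scores are invented for illustration and the helper functions are not the cited study's code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = shared
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


# Invented hub scores per module (assumptions for illustration only).
import_degree = {"Algebra": 120, "Analysis": 80, "Topology": 75,
                 "CategoryTheory": 60, "Order": 90}
codev_degree = {"Algebra": 110, "Analysis": 85, "Topology": 70,
                "CategoryTheory": 20, "Order": 30}

modules = sorted(import_degree)
x = [import_degree[m] for m in modules]
y = [codev_degree[m] for m in modules]
r_deg, rho_deg = pearson(x, y), spearman(x, y)
print(round(r_deg, 3), round(rho_deg, 3))
```

A module whose rank differs sharply between the two layers, like the invented `Order` and `CategoryTheory` entries here, pulls the Spearman coefficient down; this is the sense in which such modules are called rank-divergent.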

Betweenness-based hub persistence exposes critical load-bearing “bottleneck modules,” directly correlating with “operational logic” as domain experts understand it. This multilayer structural analysis illuminates both historical evolution and optimal points for automation, maintenance, and education.

A plausible implication is that curriculum and onboarding efforts should prioritize foundational hubs (e.g., Algebra, Analysis) and target stable versus fast-evolving modules in proportion to developer expertise and the desired stability of APIs.

6. Automation, Benchmarks, and Future Directions

  • Proof Engineering: Mathlib4's real-world edit history underpins large-scale benchmarks such as APE-Bench I, supporting realistic file-level proof engineering (addition, refactoring, bug fixing). Automation pipelines, e.g., Eleanstic, afford parallel rebuilds, content-addressable file system snapshots, and hybrid syntactic/semantic verification combining the Lean compiler and LLM-based semantic judgments (Xin et al., 27 Apr 2025).
  • LLM Integration: Both proof search (semantics-driven) and formal-language translation pipelines exemplify bidirectional cross-pollination between mathlib4 and LLMs. Experiments demonstrate sharp drop-off in LLM performance on complex or non-localized edits, motivating ongoing research into agentic and multi-file editing workflows, as well as project-scale automated verification.
  • Library Saturation and Scaling: The growth dynamics and topological analysis indicate that mathlib4 is entering a maturity (saturation) phase, with future work organized around deeper automation, rigorous benchmarks (multi-file, agentic), and maintaining efficiency in the face of architectural closure (Rovai, 14 May 2026).
  • Open-Source Commitment: Datasets (e.g., Herald, APE-Bench I), trained models, and verification code are systematically released for reproducibility, extensibility, and downstream research (Gao et al., 2024, Xin et al., 27 Apr 2025).

7. Summary Table: Key Mathlib4 Research Results and Tools

Area | Research/Tool | Functionality | Reference
Group Theory | p³-classification | Full formalization of order-$p^3$ groups (Abelian + Non-Abelian cases) | (Xiang, 20 Jun 2026)
Search & Translation | LeanSearch / Herald | Semantic theorem search, NL↔FL auto-translation benchmarks | (Gao et al., 2024, Gao et al., 2024)
Growth & Dynamics | Scaling laws, topology | Quantified scaling, role geometry, saturation regime | (Rovai, 14 May 2026, Ivanov, 1 Jun 2026)
Tactic Language | LeanSSR | SSR-style tactic language and finite set proofs | (Gladshtein et al., 2024)
Proof Engineering | APE-Bench I, Eleanstic | Real-world proof-editing, agentic benchmarks, parallel verification | (Xin et al., 27 Apr 2025)
Analysis/Topology | Stokes’ theorem | Fully-mechanized chain-level Stokes, box and singular cubes | (Hulak et al., 1 May 2026)
Number Theory | Mason–Stothers | Polynomial abc, FLT, elliptic non-parametrization, Davenport | (Baek et al., 2024)

Mathlib4 thus represents a mature, extensible mathematical corpus supporting advanced formalization, research in proof automation, and data-driven approaches to mechanized reasoning, scaling toward the demands of modern mathematical science.
