LeanHammer: Neural ATP for Lean 4

Updated 2 June 2026

LeanHammer is an end-to-end system that uses a Transformer-based neural retriever for dynamic premise selection in Lean proofs.
It orchestrates both internal proof search with Aesop and external proving via Zipperposition, followed by proof reconstruction using Duper.
Empirical results show a 21% increase in goals solved relative to prior Lean premise selectors.

LeanHammer is an end-to-end, domain-general hammer for the Lean proof assistant. It combines neural premise selection, ATP orchestration, and integrated proof reconstruction tailored to dependent type theory. The tactic uses a specialized Transformer-based retriever to select library facts, relies on translation pipelines to interoperate with higher-order ATPs such as Zipperposition, and completes the workflow by reconstructing external proofs within Lean via the Duper prover. The design bridges the longstanding gap between the large-theory hammers available in proof assistants such as Isabelle/HOL and the Lean 4 ecosystem, while keeping trusted code minimal and empirical performance competitive (Zhu et al., 9 Jun 2025).

1. System Architecture

LeanHammer orchestrates the hammer loop in Lean through three primary phases: neural premise selection, proof search (internal and external), and proof reconstruction.

Premise Selection: A neural retriever, implemented via an encoder-only Transformer, ranks relevant facts from Mathlib and project-local libraries for the goal at hand.
Proof Search: The tactic first attempts proof search with Aesop (using only built-in rules, under a short timeout). If that fails, it invokes premise selection to retrieve two sets of premises: one passed to external provers through Lean-auto, and one supplied directly to Aesop.
External Proving & Proof Reconstruction: Lean-auto translates the goal and selected premises to TH0 and invokes Zipperposition under a 10-second limit. When the ATP finds a proof, Duper reconstructs a Lean proof term from the premises that the ATP proof used. The tactic succeeds if all subgoals are discharged within a cumulative 300-second bound.
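
The interaction between the per-call ATP limit and the cumulative bound can be illustrated with a small sketch. It is not LeanHammer's implementation: the `Budget` class and its method names are hypothetical, and only the two constants (10 seconds and 300 seconds) come from the description above.

```python
import time
from typing import Callable

ATP_LIMIT_S = 10.0     # per-call limit for the external prover (from the text)
TOTAL_LIMIT_S = 300.0  # cumulative bound for one tactic call (from the text)


class Budget:
    """Tracks a cumulative wall-clock budget (hypothetical helper)."""

    def __init__(self, total_s: float = TOTAL_LIMIT_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._deadline = clock() + total_s

    def remaining(self) -> float:
        """Seconds left before the cumulative bound is reached (never negative)."""
        return max(0.0, self._deadline - self._clock())

    def timeout_for(self, per_call_limit_s: float) -> float:
        """Timeout for one sub-call: its own limit, capped by what is left."""
        return min(per_call_limit_s, self.remaining())

    def exhausted(self) -> bool:
        """True once the cumulative bound has been used up."""
        return self.remaining() == 0.0
```

A caller would ask `budget.timeout_for(ATP_LIMIT_S)` before each external prover call, so that a late call never runs past the overall bound.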

The pipeline distinguishes three kinds of phase: green-path (always executed), yellow-path (non-terminal and error-tolerant), and blue-path (terminal, completing the proof), a separation intended to improve robustness (Zhu et al., 9 Jun 2025).

2. Neural Premise Selection

The premise selector is a Transformer of the Sentence-BERT family, offered in three sizes (6 or 12 layers, 384 or 768 embedding dimensions). Premise retrieval is formulated as a dense vector search: both proof goals and premise signatures are linearized as token sequences (avoiding notation shorthands and using fully qualified constant names) and mapped to $\mathbb{R}^d$ by the Transformer encoder $E$. For a given goal $g$ and premise $p$, the score is the cosine similarity $\frac{E(g) \cdot E(p)}{\|E(g)\| \, \|E(p)\|}$, which reduces to the dot product $E(g) \cdot E(p)$ when embeddings are normalized to unit length.
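
The relationship between cosine similarity and the dot product can be checked numerically. The sketch below uses plain Python lists in place of encoder outputs; the function names are illustrative and the vectors are toy values, not real embeddings.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Toy stand-ins for E(g) and E(p).
goal_vec = [3.0, 4.0]
premise_vec = [4.0, 3.0]

# After normalization, the dot product agrees with the cosine similarity.
gap = abs(cosine(goal_vec, premise_vec)
          - dot(normalize(goal_vec), normalize(premise_vec)))
```

Normalizing once at indexing time means each query needs only dot products, which is the usual reason dense retrievers store unit-length vectors.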

Embeddings for Mathlib are precomputed and cached, which speeds up runtime queries; only user-defined constants require on-the-fly encoding. Nearest-neighbor search is implemented with FAISS, yielding top-$k$ premise lists for downstream search. This method enables dynamic adaptation to the user’s context, with empirical results demonstrating a 21% increase in goals solved compared to prior Lean premise selectors (Zhu et al., 9 Jun 2025).
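
The caching and top-$k$ lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the system's code: a brute-force scan stands in for FAISS, `PremiseIndex` and its methods are hypothetical names, and `encode` is a placeholder for the Transformer encoder.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    """Scale v to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


class PremiseIndex:
    """Brute-force stand-in for a FAISS index (hypothetical class)."""

    def __init__(self, precomputed: Dict[str, Vector],
                 encode: Callable[[str], Vector]) -> None:
        # Library embeddings arrive precomputed; they are only normalized here.
        self._vectors: Dict[str, List[float]] = {
            name: _unit(vec) for name, vec in precomputed.items()
        }
        self._encode = encode  # called only for premises missing from the cache

    def add_user_premise(self, name: str, signature: str) -> None:
        """Encode a user-defined premise on the fly, at most once per name."""
        if name not in self._vectors:
            self._vectors[name] = _unit(self._encode(signature))

    def top_k(self, goal_vec: Vector, k: int) -> List[Tuple[str, float]]:
        """Return the k premises with the highest cosine similarity to the goal."""
        g = _unit(goal_vec)
        scored = (
            (sum(a * b for a, b in zip(g, vec)), name)
            for name, vec in self._vectors.items()
        )
        return [(name, score) for score, name in heapq.nlargest(k, scored)]
```

A real deployment would replace the linear scan with an approximate or exact index, but the split between cached library vectors and lazily encoded user premises stays the same.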

The model is trained to prioritize not only local syntactic similarity but also semantic connections relevant in dependent type theory, which addresses the limitations of classical Sledgehammer-style feature engineering.

3. Proof Search and Reconstruction

LeanHammer’s orchestration of proof search proceeds in tiers:

Internal Proof Search: Aesop, restricted to built-in rules, attempts direct proof search with a short timeout (≈1s).
Premise-augmented Search: If unresolved, the top premises are supplied as (a) “unsafe” Aesop rules and (b) initial context for Lean-auto.
External Automated Proving: Lean-auto translates the current subgoal and its premises to TH0 for Zipperposition. Zipperposition’s successful proofs return a proof object referencing the premises used.
Lean-side Reconstruction: Duper is applied to the minimal set of premises used by the ATP to synthesize a Lean proof term.

Termination hinges on all subgoals being resolved within resource bounds. Reconstruction is designed to be fail-fast and conservative, falling back if replay is unsuccessful.
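
The tiers above can be read as a chain of fallbacks. The sketch below is a simplified, sequential illustration rather than the actual control flow: every callable in `Provers` is a hypothetical stub for the corresponding tool, and `hammer_sketch` is not an API of LeanHammer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Provers:
    """Callables standing in for the real tools (all hypothetical stubs)."""
    aesop_builtin: Callable[[str], bool]
    select_premises: Callable[[str], List[str]]
    aesop_with_premises: Callable[[str, Sequence[str]], bool]
    external_atp: Callable[[str, Sequence[str]], Optional[List[str]]]
    reconstruct: Callable[[str, Sequence[str]], bool]


def hammer_sketch(goal: str, p: Provers) -> Optional[str]:
    """Return the name of the tier that closed the goal, or None on failure."""
    # Tier 1: internal search with built-in rules only.
    if p.aesop_builtin(goal):
        return "aesop-builtin"

    # Tier 2: retrieve premises and retry internal search with them.
    premises = p.select_premises(goal)
    if p.aesop_with_premises(goal, premises):
        return "aesop-premises"

    # Tier 3: the external prover reports which premises its proof used.
    used = p.external_atp(goal, premises)

    # Tier 4: replay inside the proof assistant using only those premises;
    # an ATP proof that cannot be reconstructed does not count as success.
    if used is not None and p.reconstruct(goal, used):
        return "atp+reconstruction"
    return None
```

Passing only the premises reported by the ATP to the reconstruction step keeps the replay problem small, which matches the description of Duper working from the minimal premise set.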

The system is exposed as a user-facing Lean tactic, invoked as “by hammer” in Lean proof blocks. Premise selection, proof search, and proof replay are all fully automated from the user’s perspective (Zhu et al., 9 Jun 2025).
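
A minimal usage sketch follows. The module name `Hammer` in the import line is an assumption about how the package is exposed, and the goal is only illustrative: whether the tactic closes it depends on the retrieved premises and the time limits.

```lean
import Mathlib
import Hammer  -- assumption: module that provides the `hammer` tactic

-- Illustrative goal; success is not guaranteed for any particular statement.
example (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by
  hammer
```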

4. Evaluation and Empirical Results

Comprehensive experiments demonstrate LeanHammer’s effectiveness:

The neural premise selector enables a 21% increase in goals solved compared to previous Lean premise selectors, signifying substantial improvement in both recall and ranking robustness.
The system reliably generalizes across a variety of Lean domains (e.g., algebra, topology, combinatorics), as shown by extensive benchmarks with real-world Mathlib proof states.
Integration with Aesop and Duper ensures proof reconstruction fidelity, and the fallback strategy with internal provers addresses challenging edge cases.

A key design principle is minimal reliance on trusted code: proof search, translation, and reconstruction are orchestrated with full traceability, and all reconstructed proofs are checked natively by Lean (Zhu et al., 9 Jun 2025).

5. Comparative Landscape and Impact

LeanHammer fills a critical gap in the Lean ecosystem, one that tools such as Sledgehammer and SMTCoq address in other proof assistants. Key points of distinction include:

System	Premise Selection	ATP Integration	Target Logic	Proof Reconstruction
Sledgehammer	Heuristic/Symbolic	Multiple (E, SPASS, veriT)	HOL (Isabelle)	ATP proof + replay
SMTCoq	None	SMT (veriT, CVC4)	CIC (Coq)	Verified OCaml checker
LeanHammer	Neural, dynamic	Zipperposition	DTT (Lean 4)	ATP trace → Lean + Duper

LeanHammer’s neural approach supports flexible adaptation to user codebases, direct leverage of Mathlib, and deep integration with Lean’s dependent type system. Its introduction lowers the barrier for Lean users to harness external ATPs with minimal configuration and strengthens the broader push for increased automation in formal mathematics (Zhu et al., 9 Jun 2025).

6. Limitations and Future Work

Current limitations of LeanHammer include:

Dependence on precomputed embeddings for the curated Mathlib corpus; user projects with frequently changing libraries may incur latency from re-embedding.
Reliance on Duper and the current generation of Lean proof replay for proof objects; failures in reconstruction or ATP interop are subject to ongoing robustness improvement.
Zipperposition’s support for domain-specific logics (e.g., arrays, bit-vectors) is bounded by available translation and proof export modules.

Planned extensions are focused on expanding premise selection coverage, enhancing context sensitivity, improving reconstruction algorithms, and broadening support for additional ATPs and theories. The integration of Lean-SMT and similar tactics is anticipated to provide further synergies, especially in first-order and arithmetic-heavy subdomains (Mohamed et al., 21 May 2025).

7. Significance in Automated Reasoning

LeanHammer establishes a paradigm for holistic, machine learning–driven automation in the Lean proof assistant. By unifying neural premise selection, symbolic proof search, ATP interoperation, and native proof reconstruction, the system marks a convergence of neural and symbolic automation techniques in formal verification. The approach is broadly generalizable and constitutes a significant step towards parity with the automation infrastructure of more mature theorem provers while retaining the expressivity and rigor of dependent type theory (Zhu et al., 9 Jun 2025).

References

Zhu et al. Premise Selection for a Lean Hammer (9 Jun 2025)

Mohamed et al. Lean-SMT: An SMT tactic for discharging proof goals in Lean (21 May 2025)
