LeanConjecturer: Automated Lean Conjecturing

Updated 21 August 2025
  • LeanConjecturer is a systematic pipeline that automatically generates formal Lean theorems by combining rule-based context extraction with LLM-driven synthesis.
  • The approach employs an iterative loop for generating, validating, and filtering conjectures to ensure syntactic validity and mathematical non-triviality.
  • The generated conjectures serve as valuable training data for reinforcement learning, advancing both automated theorem proving and formal mathematical discovery.

LeanConjecturer is a systematic pipeline for automatic mathematical conjecture generation in Lean 4, targeting university-level mathematics and formal theorem proving. It combines rule-based extraction of formal context from source libraries with LLM-based theorem statement synthesis, yielding a scalable source of new, non-trivial Lean theorems. The pipeline addresses the persistent challenge of data scarcity for training and evaluating automated theorem proving systems, and enables new modes of mathematical discovery via iterative generation, filtering, and reinforcement learning. Beyond producing thousands of novel, syntactically valid, and non-trivial conjectures, LeanConjecturer demonstrates its impact by generating and helping verify previously unformalized results in topics such as topology, and by serving as specialized training data for reinforcement learning-based Lean provers (Onda et al., 27 Jun 2025).

1. System and Pipeline Architecture

LeanConjecturer’s architecture integrates rule-based context mining and LLM-driven theorem generation in an iterative pipeline, structured as follows:

  • Context Extraction: Given a Mathlib seed file, static rules extract Lean 4 imports, open namespace commands, and global declarations to form a minimal self-contained context. This establishes the symbol and type environment for downstream generation.
  • Theorem Statement Generation: An LLM is prompted with a carefully crafted system instruction requiring each output statement to start with theorem and end with := by. The prompt asks the model to output as many conjectures as possible per context, which keeps the pipeline productive whether a seed file contains few or many existing theorems.
  • Context Prepending: The extracted context is prepended to each generated theorem, forming self-contained Lean statements.
  • Iterative Feedback Loop: After generation, conjectures are validated and filtered (see Section 2) and the syntactically valid, nontrivial, and novel conjectures are appended to the context pool. The process iterates, with subsequent rounds using the augmented context to drive further LLM conjecture proposals. Iterations continue until a fixed point or an explicit limit (e.g., 15 rounds) is reached.

This hybrid static-dynamic pipeline ensures both the formal well-formedness of statements (via context provision and syntactic guards) and the creative exploration of novel conjectural space via LLM generation (Onda et al., 27 Jun 2025).
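
The loop can be summarized in a short Python sketch. All helper names (extract_context, llm_generate, validate) are illustrative stubs standing in for the components described above, not the authors' published implementation:

```python
# Hedged sketch of LeanConjecturer's iterative generate-validate loop;
# the helper bodies are placeholder stubs, not the authors' code.

MAX_ROUNDS = 15  # the paper caps iteration at an explicit limit


def extract_context(seed_file: str) -> str:
    """Rule-based step: keep imports, `open` commands, and global
    declarations from a Mathlib seed file (stubbed as a plain read)."""
    with open(seed_file) as f:
        return f.read()


def llm_generate(context: str) -> list[str]:
    """LLM step: prompt for as many statements as possible, each
    required to start with `theorem` and end with `:= by` (stub)."""
    return []  # replace with a real model call


def validate(statement: str) -> bool:
    """Filter step (Section 2): syntactic check with `sorry`, novelty
    via `exact?`, non-triviality via `aesop` (stub)."""
    return True


def run_pipeline(seed_file: str) -> list[str]:
    context = extract_context(seed_file)
    accepted: list[str] = []
    for _ in range(MAX_ROUNDS):
        candidates = llm_generate(context)
        # Prepend the context so every conjecture is self-contained.
        new = [c for c in candidates
               if c not in accepted and validate(context + "\n\n" + c)]
        if not new:  # fixed point: no novel conjecture survived filtering
            break
        accepted.extend(new)
        context += "\n\n" + "\n\n".join(new)  # feed back into the next round
    return accepted
```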

2. Conjecture Validation and Filtering

The output of each generation cycle undergoes rigorous post-processing:

  • Syntactic Validity: Each conjecture is first elaborated by Lean 4 with its proof replaced by sorry. Only statements that produce no errors and exactly one warning (the expected notice that the proof is a sorry placeholder) are retained.
  • Novelty Assessment: Using Lean's exact? tactic, each conjecture is checked against the original Mathlib content and previously generated conjectures, rejecting trivial variants, duplicates, and restatements.
  • Non-triviality Check: The automated proof tactic aesop is applied. Statements unprovable by aesop are flagged as non-trivial; those that are provable are not discarded, but prioritized differently in downstream applications.
  • Iterative Aggregation: Newly validated conjectures are collected in an output file, which in turn becomes the context for further rounds of generation, ensuring that the conjecture space expands incrementally and avoids local redundancy.

This multi-stage process ensures that retained conjectures are both formally well-formed and of mathematical substance, suitable as training input for provers or as candidate objects for mathematical exploration.
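
The first and third checks can be approximated by shelling out to a Lean 4 project. The sketch below assumes a lake-managed checkout with Mathlib available; the helper names and the crude string matching are illustrative assumptions, not the authors' tooling:

```python
import subprocess
import tempfile

# Hedged sketch of the syntactic-validity and aesop-triviality checks,
# assuming a lake-managed Lean 4 project with Mathlib on the path.


def lean_check(source: str, project_dir: str) -> subprocess.CompletedProcess:
    """Type-check a self-contained Lean 4 source string via `lake env lean`."""
    with tempfile.NamedTemporaryFile(
            "w", suffix=".lean", dir=project_dir, delete=False) as f:
        f.write(source)
        path = f.name
    return subprocess.run(
        ["lake", "env", "lean", path],
        cwd=project_dir, capture_output=True, text=True)


def syntactically_valid(stmt_with_sorry: str, project_dir: str) -> bool:
    """Stage 1: keep a statement only if its sole diagnostic is the
    single `declaration uses 'sorry'` warning and there are no errors."""
    result = lean_check(stmt_with_sorry, project_dir)
    out = result.stdout + result.stderr
    return (result.returncode == 0
            and out.count("declaration uses 'sorry'") == 1
            and "error" not in out)


def trivial_for_aesop(statement: str, project_dir: str) -> bool:
    """Stage 3: a conjecture counts as trivial if swapping its `sorry`
    for `aesop` already closes the goal."""
    candidate = statement.replace("sorry", "aesop")
    return lean_check(candidate, project_dir).returncode == 0
```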

3. Corpus Production and Scalability

The LeanConjecturer pipeline has demonstrated high-yield, scalable generation:

| Metric | Value (on 40 seed files) |
| --- | --- |
| Conjectures produced | 12,289 |
| Syntactically valid | 10,950 |
| Novel statements | 4,130 |
| Non-trivial | 3,776 |
| Average novel statements per seed file | 103.25 |

This high per-file yield suggests that scaling to the full Mathlib library (~6,000 files) could produce hundreds of thousands of high-quality conjectures (103.25 novel statements per seed file × 6,000 files ≈ 620,000). The method's scalability rests on automated context extraction, batched LLM-driven generation, and systematic filtering. By rapidly expanding the universe of available formal statements, LeanConjecturer provides a foundational resource for both symbolic and learning-driven ATP research.

4. Integration with Reinforcement Learning

LeanConjecturer directly addresses the need for domain-adapted, challenging training data for Lean-based RL theorem provers. In the GRPO (Group Relative Policy Optimization) paradigm:

  • Generated conjectures form the RL environments: theorems for which learning-based provers are trained to find proofs.
  • For each conjecture, a prover (such as DeepSeek-Prover-V2 at 7B parameters) samples multiple tactic-level proof trajectories; success is scored with a binary reward (proof found or not found).
  • The RL policy is refined by adjusting the sampling probability and learning rate according to performance on these conjectures, with special emphasis on conjectures in challenging or targeted domains (such as topology).
  • This targeted RL regime leads to measurable improvements in the prover’s generalization and problem-solving capability within specific areas of mathematics, as evidenced by experimental gains reported on nontrivial conjecture sets (Onda et al., 27 Jun 2025).

The tight feedback cycle between conjecture generation and prover training underlies LeanConjecturer’s practical impact as a system for ATP model improvement.
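
A minimal sketch of the group-relative credit assignment under this binary reward follows. The group size and normalization are standard GRPO mechanics rather than specifics reported in the paper, and prover_sample / lean_verifies are illustrative stubs:

```python
import statistics

# Hedged sketch: group-relative advantages for a binary proof reward,
# in the style of GRPO; stubs stand in for the prover and Lean checker.


def prover_sample(conjecture: str) -> str:
    """Stub: one sampled proof script from the prover policy."""
    return "sorry"


def lean_verifies(conjecture: str, proof: str) -> bool:
    """Stub: whether Lean accepts `conjecture` closed by `proof`."""
    return False


def group_advantages(conjecture: str,
                     group_size: int = 8) -> list[tuple[str, float]]:
    # Sample several proof attempts for one conjecture.
    attempts = [prover_sample(conjecture) for _ in range(group_size)]
    # Binary reward: 1.0 if Lean accepts the proof, else 0.0.
    rewards = [1.0 if lean_verifies(conjecture, a) else 0.0 for a in attempts]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    # Each attempt's advantage is its reward normalized within the group;
    # the policy update then upweights tokens from successful proofs.
    return [(a, (r - mean) / std) for a, r in zip(attempts, rewards)]
```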

5. Mathematical Discovery in Topology

LeanConjecturer has demonstrated the ability to both rediscover and reveal novel interrelations among sets defined via generalized interior and closure operators in topology. Notable outputs include:

Formally Defined Properties:

  • SemiOpen: $A \subseteq \overline{\operatorname{int} A}$
  • AlphaOpen: $A \subseteq \operatorname{int}(\overline{\operatorname{int} A})$
  • PreOpen: $A \subseteq \operatorname{int}(\overline{A})$
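
These properties admit direct Lean 4 statements over Mathlib's interior and closure operators. The following is a hedged rendering; the definition names mirror the paper's terminology, but the exact source is not reproduced here:

```lean
import Mathlib.Topology.Basic

variable {X : Type*} [TopologicalSpace X]

-- Hedged sketch of the three generalized-openness properties,
-- stated over Mathlib's `interior` and `closure` operators.
def SemiOpen (A : Set X) : Prop := A ⊆ closure (interior A)

def AlphaOpen (A : Set X) : Prop := A ⊆ interior (closure (interior A))

def PreOpen (A : Set X) : Prop := A ⊆ interior (closure A)
```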

Sample Theorems Output and Verified:

| Theorem | Area |
| --- | --- |
| The union of two semi-open sets is semi-open | Topology |
| The union of two alpha-open sets is alpha-open | Topology |
| The closure of a pre-open set is semi-open | Topology |
| The union of two pre-open sets is pre-open | Topology |
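
Continuing the Lean snippet above, the first tabulated statement takes the pipeline's required theorem ... := by form, with the proof left as the sorry placeholder the validator expects (the theorem name is illustrative):

```lean
-- The union of two semi-open sets is semi-open, in conjecture form.
theorem semiOpen_union {A B : Set X} (hA : SemiOpen A) (hB : SemiOpen B) :
    SemiOpen (A ∪ B) := by
  sorry
```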

These results not only confirm LeanConjecturer’s capacity to generate known structural theorems, but also to suggest new relationships among nuanced generalizations of open sets that warrant further mathematical analysis. This capability indicates that the system acts as a generator of research hypotheses within contemporary pure mathematics domains.

6. Comparison with Related Approaches

The LeanConjecturer methodology distinguishes itself in the landscape of automated mathematical conjecturing and formal theorem proving:

  • Versus Database-Constrained Frameworks: Unlike methods that rely strictly on combinatorial generation over finite object–property tables (Davila, 2023), LeanConjecturer operates natively in the rich formal language of Lean 4, allowing generation of universal statements and abstract theorems.
  • Relative to LLM-driven Conjecturing: Existing LLM-based neural conjecturing pipelines, whether specialized for premise selection (Piotrowski et al., 2023), lemma template abstraction (Alhessi et al., 7 Apr 2025), or answer-construction (Sun et al., 24 May 2025), do not combine rule-based formal context extraction, Lean 4–native syntactic validation, high-throughput filtration, and reinforcement learning of such scale and precision.
  • Integration with Automated Provers: Unlike pure synthesis tools, LeanConjecturer’s outputs are immediately useful as RL tasks for state-of-the-art Lean ATP models, demonstrably improving statistical performance on benchmark validation sets (Onda et al., 27 Jun 2025).

7. Broader Implications and Prospects

By offering a scalable and formally sound method for generating new conjectures, LeanConjecturer provides critical infrastructure for:

  • Overcoming the data scarcity bottleneck in training formal theorem provers and related machine learning systems for mathematics.
  • Catalyzing mathematical discovery by bridging LLM creativity with formal verification, enabling workflows where human mathematicians can explore, validate, and build upon machine-generated conjectures.
  • Serving as a testbed for RL techniques in formal mathematics, particularly in curriculum design and targeted problem domain specialization.
  • Advancing the Lean formal mathematics ecosystem by continuously expanding the corpus of conjecturable and provable statements.

The success of LeanConjecturer in systematic conjecture generation and downstream prover training supports the view that automatic, high-quality, domain-adapted conjecture generation is a key driver for progress in formalized mathematical reasoning and AI-driven discovery (Onda et al., 27 Jun 2025).