Aristotle: IMO-Level Theorem Prover

Updated 4 October 2025

Aristotle is an AI-powered system that combines formal Lean proof search, lemma-based informal reasoning, and dedicated geometry solving to address IMO-level problems.
It employs a 200B+ parameter transformer with Monte Carlo Graph Search and test-time training to efficiently explore and validate complex proof strategies.
Its modular architecture enables iterative refinement through autoformalization and parallel lemma discovery, achieving near human-competitive performance in contest mathematics.

Aristotle is an AI-powered automated theorem proving system built to reach gold-medal-equivalent performance on International Mathematical Olympiad (IMO)–level problems. It does this by integrating formal proof search, lemma-based informal reasoning, and dedicated geometric computation. Its modular architecture breaks highly complex competition problems into tractable subproblems, which are then formalized and checked in the Lean proof assistant. The design scales with model size and parallel search, and it handles both the algebraic and the geometric problems that appear at the IMO.

1. System Architecture and Integration

Aristotle consists of three tightly coupled components:

Lean Proof Search System: The core engine searches over Lean “sketches”, which are partially written proofs whose gaps are marked by sorry. The search is organized as a Monte Carlo Graph Search (MCGS), a generalization of Monte Carlo Tree Search. In MCGS, Lean states are vertices of a directed graph, and states that differ only superficially are merged into equivalence classes. Actions correspond to Lean tactics (e.g., intro, cases), and a single action may create multiple subgoals. A large transformer model with more than 200B parameters serves as a unified policy and value function. It selects promising tactics and estimates how likely a proof is to succeed eventually.
Lemma-Based Informal Reasoning: To handle IMO-level complexity, Aristotle uses a natural-language module that decomposes hard problems into lists of informally argued lemmas. The module elicits high-level proof sketches and supporting claims, then autoformalizes them into Lean for formal proving. The pipeline relies on iterative error feedback: Lean verification errors are parsed and fed back to revise both the informal and the formal statements. Over successive rounds this improves the formalization and lets the system introduce the creative auxiliary definitions that IMO solutions often require.
Dedicated Geometry Solver (Yuclid): Geometry problems are handled by Yuclid, a high-performance C++ solver. It combines a deductive database with algebraic reasoning (Gaussian elimination) and numerical rule matching to preprocess diagrams and derive structural relationships (e.g., midpoints, perpendiculars). The configurations it identifies are encoded as auxiliary facts in Lean. Yuclid is optimized for speed (deduplication, fast memory management) and scales efficiently to large geometric datasets.

This modular approach enables Aristotle to initiate solution strategies via informal reasoning, formalize supporting claims, and then rely on high-fidelity Lean proof search and specialized geometry handling.
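The bookkeeping behind the graph search described above can be sketched in Python as follows, assuming goals are plain strings. `normalize` is a hypothetical stand-in for Aristotle's equivalence relation on Lean states; here it only collapses whitespace. `ProofGraph` is an illustrative name, not the system's API. The `proved` method computes a least fixpoint. A goal counts as proved when some applied tactic (an OR choice) has all of its subgoals proved (an AND requirement). Iterating to a fixpoint also copes with the cycles that a graph, unlike a tree, can contain.

```python
def normalize(goal: str) -> str:
    """Hypothetical canonicalization: goals that differ only in whitespace
    share one vertex (a stand-in for real Lean state equivalence)."""
    return " ".join(goal.split())


class ProofGraph:
    def __init__(self) -> None:
        # vertex -> list of (tactic, subgoal vertices produced by that tactic)
        self.edges: dict[str, list[tuple[str, list[str]]]] = {}

    def add_goal(self, goal: str) -> str:
        key = normalize(goal)
        self.edges.setdefault(key, [])  # merge with an existing equivalent vertex
        return key

    def apply(self, goal: str, tactic: str, subgoals: list[str]) -> None:
        """Record that `tactic` turned `goal` into `subgoals` (empty list = goal closed)."""
        parent = self.add_goal(goal)
        children = [self.add_goal(g) for g in subgoals]
        self.edges[parent].append((tactic, children))

    def proved(self) -> set[str]:
        """Least fixpoint: a goal is proved if SOME tactic (OR) has ALL of its
        subgoals proved (AND). Repeating until nothing changes tolerates cycles."""
        done: set[str] = set()
        changed = True
        while changed:
            changed = False
            for goal, actions in self.edges.items():
                if goal in done:
                    continue
                if any(all(c in done for c in kids) for _, kids in actions):
                    done.add(goal)
                    changed = True
        return done


if __name__ == "__main__":
    g = ProofGraph()
    g.apply("p ∧ q", "constructor", ["p", "q"])
    g.apply("p", "exact hp", [])
    print("p ∧ q" in g.proved())  # False: subgoal q is still open
    g.apply("q", "exact hq", [])
    print("p ∧ q" in g.proved())  # True
```

The parent goal `p ∧ q` stays unproved until both subgoals created by `constructor` are closed. This matches the rule that an action succeeds only when every subgoal it generates is resolved.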

2. Lean Proof Search and Reinforcement Learning

Aristotle’s Lean proof search leverages the following mechanisms:

MCGS Framework: Proof states are nodes in a directed graph whose edges are tactic applications. Merging equivalent states keeps the search from exploring the same position more than once. Action selection uses variants of PUCT (Predictor + Upper Confidence bounds applied to Trees). These balance exploiting actions with high estimated value against exploring actions that the policy rates highly but that have rarely been visited.
Transformer Model: The policy and value functions are unified in a single large transformer, trained by reinforcement learning through expert iteration. Successful search paths, whether partial or complete, are replayed to refine tactic selection and value prediction. An action counts as successful only if every subgoal it generates is resolved, consistent with Lean's branching proof semantics.
Parallelization and Test-Time Training: Multiple instances of proof search run in parallel, each exploring different lemma decompositions or proof tactics. The model leverages test-time training—learning from its own inference-time search traces—to adapt tactic selection to the structure of each problem.
Formal Verification: Lean acts as the formal gatekeeper. Proof gaps are filled only when Lean’s kernel verifies their correctness, enforcing rigorous validation.

A key technical point is the system’s ability to combine “blind” search with feedback-driven exploration, scaling efficiently via transformer-based guidance.
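The PUCT selection step can be sketched with the AlphaZero-style score Q(s,a) + c·P(s,a)·√N(s)/(1 + N(s,a)). This is an assumption: the exact variant Aristotle uses is not specified here. The constant `c_puct = 1.5` and the names `Edge`, `puct_select`, and `backup` are also illustrative assumptions. Backed-up values are treated as estimated probabilities of eventually completing the proof, matching the role of the value function described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # P(s, a): policy probability for this tactic
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of backed-up value estimates

    @property
    def q(self) -> float:
        # Mean backed-up value; unvisited edges default to 0 (a pessimistic guess).
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the tactic maximizing Q + c * P * sqrt(N(s)) / (1 + N(s, a))."""
    parent_visits = sum(e.visits for e in edges.values())
    # max(..., 1) lets the priors order the choices before any visit has happened.
    scale = math.sqrt(max(parent_visits, 1))

    def score(e: Edge) -> float:
        return e.q + c_puct * e.prior * scale / (1 + e.visits)

    return max(edges, key=lambda tactic: score(edges[tactic]))


def backup(edge: Edge, value: float) -> None:
    """Record one simulation result (e.g. 1.0 if the subtree was proved, 0.0 if not)."""
    edge.visits += 1
    edge.value_sum += value


if __name__ == "__main__":
    edges = {"intro": Edge(0.6), "cases": Edge(0.3), "simp": Edge(0.1)}
    print(puct_select(edges))  # intro: highest prior, nothing visited yet
    for _ in range(3):
        backup(edges["intro"], 0.0)  # three failed simulations under intro
    print(puct_select(edges))  # cases: the exploration term now dominates
```

Larger `c_puct` values push the search toward tactics the policy favors but has not tried much. Smaller values lean on the current value estimates.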

3. Lemma Discovery and Informal–Formal Bridging

The informal reasoning system decomposes IMO-level problems by:

Proof Narrative Elicitation: Starting from the problem statement, the module generates a narrative of the intended proof, which is then decomposed into supporting lemmas. This mirrors expert human problem solving, where auxiliary claims are formulated and tackled individually.
Auxiliary and Novel Definitions: Aristotle produces creative auxiliary definitions that do not appear in the original problem statement, for example:

$\text{def } S(f : \mathbb{N}_+ \to \mathbb{N}_+) : \text{Set } \mathbb{N}_+ := \{ p \mid \text{Nat.Prime}(p) \land f(p) > 1 \}$

These definitions enable the isolation and formalization of key structural properties.

Autoformalization Pipeline: Natural-language lemmas are iteratively converted into Lean statements. Errors encountered during formalization are fed back to the informal layer for correction, which makes the pipeline more robust on challenging inputs.
Iterative Refinement: Failed attempts trigger revisions at both the informal and the formal level. This suits IMO problems, which often require multiple lines of attack.

This decomposition allows systematic tackling of complex contest problems in a staged, verifiably correct manner.
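The formalize-check-revise loop can be sketched as below. It assumes two hypothetical callbacks. `formalize(informal, feedback)` stands in for the language model that writes a Lean statement. `check(statement)` stands in for Lean elaboration and returns a list of error messages. For brevity the sketch revises only the formal statement; in the pipeline described above, the errors also feed back into the informal lemma.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Attempt:
    statement: str
    errors: list[str] = field(default_factory=list)


def refine_lemma(
    informal: str,
    formalize: Callable[[str, list[str]], str],  # hypothetical LLM autoformalizer
    check: Callable[[str], list[str]],           # hypothetical Lean checker
    max_rounds: int = 4,
) -> tuple[Optional[str], list[Attempt]]:
    """Formalize, check, and feed errors back until Lean accepts or rounds run out."""
    feedback: list[str] = []
    history: list[Attempt] = []
    for _ in range(max_rounds):
        statement = formalize(informal, feedback)
        errors = check(statement)
        history.append(Attempt(statement, errors))
        if not errors:
            return statement, history  # accepted: hand off to proof search
        feedback = errors              # parsed errors drive the next revision
    return None, history               # give up; the caller may revise the informal lemma


if __name__ == "__main__":
    def fake_formalize(text: str, feedback: list[str]) -> str:
        # The first draft uses an undefined name; the revision fixes it.
        return "theorem t : 1 + 1 = 2" if feedback else "theorem t : 1 + 1 = two"

    def fake_check(statement: str) -> list[str]:
        return ["unknown identifier 'two'"] if "two" in statement else []

    stmt, hist = refine_lemma("one plus one is two", fake_formalize, fake_check)
    print(stmt, len(hist))  # theorem t : 1 + 1 = 2 2
```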

4. Geometry Module: Algebraic Deductive Reasoning

Yuclid, the geometric solver in Aristotle, is specialized for competition geometry:

Diagram Preprocessing: It scans diagrams for standard configurations (midpoints, bisectors, similar triangles). These are identified using numeric rule matching.
Algebraic Reasoning: Configurations are encoded as equations and inequalities, which are processed by Gaussian elimination and algebraic rule tables. Generic relationships, such as perpendicularity expressed as a length identity,

$AB \perp CD \iff AC^2 + BD^2 = AD^2 + BC^2$

are formalized and inserted as auxiliary facts in Lean.

Deductive Database: Yuclid is built on a deductive database with extensive rule tables, which enables rapid deduction. Deduplication and optimized update strategies allow it to solve tens of geometry problems within milliseconds on a single core.
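The length criterion for perpendicularity follows from expanding the squared distances: AC² + BD² − AD² − BC² = 2(B − A)·(C − D). The two sides are therefore equal exactly when vectors AB and CD are orthogonal. The Python sketch below checks such relations numerically on concrete coordinates, in the spirit of the numeric rule matching described above. The function names and the tolerance `1e-9` are illustrative assumptions, not Yuclid's interface; Yuclid itself is written in C++.

```python
import math
import random

Point = tuple[float, float]


def dist2(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def perp_by_lengths(a: Point, b: Point, c: Point, d: Point, tol: float = 1e-9) -> bool:
    # AB ⊥ CD  <=>  AC^2 + BD^2 = AD^2 + BC^2
    lhs = dist2(a, c) + dist2(b, d)
    rhs = dist2(a, d) + dist2(b, c)
    return math.isclose(lhs, rhs, abs_tol=tol)


def perp_by_dot(a: Point, b: Point, c: Point, d: Point, tol: float = 1e-9) -> bool:
    dot = (b[0] - a[0]) * (d[0] - c[0]) + (b[1] - a[1]) * (d[1] - c[1])
    return math.isclose(dot, 0.0, abs_tol=tol)


def is_midpoint(m: Point, a: Point, b: Point, tol: float = 1e-9) -> bool:
    return math.isclose(m[0], (a[0] + b[0]) / 2, abs_tol=tol) and math.isclose(
        m[1], (a[1] + b[1]) / 2, abs_tol=tol
    )


if __name__ == "__main__":
    # Altitude from C in triangle ABC: H is the foot on AB, so CH ⊥ AB.
    A, B, C, H = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0), (1.0, 0.0)
    print(perp_by_lengths(C, H, A, B), is_midpoint((2.0, 0.0), A, B))  # True True

    # Both criteria agree on random perpendicular and non-perpendicular pairs.
    rng = random.Random(0)
    for _ in range(1000):
        a, b, c = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(3)]
        t = rng.uniform(0.5, 2.0)
        d_perp = (c[0] - t * (b[1] - a[1]), c[1] + t * (b[0] - a[0]))  # CD = t * rot90(AB)
        d_par = (c[0] + b[0] - a[0], c[1] + b[1] - a[1])             # CD = AB
        assert perp_by_lengths(a, b, c, d_perp) and perp_by_dot(a, b, c, d_perp)
        assert not perp_by_lengths(a, b, c, d_par) and not perp_by_dot(a, b, c, d_par)
```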

This module extends the system’s reach in domains historically resistant to formalization due to diagrammatic and algebraic subtleties.
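The Gaussian-elimination side of this algebraic reasoning can be illustrated with linear angle chasing. The sketch below is a simplified stand-in, not Yuclid's actual rule tables. It stores linear facts such as a + b + c = 180 as rows over exact rationals and keeps them in echelon form. It then decides whether a target equation is implied by reducing it against the stored rows, and it rejects already-implied facts, a simple form of deduplication. The class name `LinearFacts` is hypothetical. Real angle chasing must also handle angles modulo 180°, which this sketch ignores.

```python
from fractions import Fraction


def reduce_row(vec: list[Fraction], basis: list[tuple[int, list[Fraction]]]) -> list[Fraction]:
    """Eliminate every pivot column of the echelon basis from `vec`."""
    vec = list(vec)
    for pivot, row in basis:
        factor = vec[pivot]
        if factor != 0:
            vec = [x - factor * y for x, y in zip(vec, row)]
    return vec


class LinearFacts:
    """Linear equations sum(coeff * name) = const, kept in row echelon form."""

    def __init__(self, names: list[str]) -> None:
        self.index = {name: i for i, name in enumerate(names)}
        self.width = len(names) + 1  # the last column stores the constant
        self.basis: list[tuple[int, list[Fraction]]] = []

    def _vector(self, coeffs: dict[str, int], const: int) -> list[Fraction]:
        vec = [Fraction(0)] * self.width
        for name, c in coeffs.items():
            vec[self.index[name]] += Fraction(c)
        vec[-1] = Fraction(const)
        return vec

    def add(self, coeffs: dict[str, int], const: int = 0) -> bool:
        """Insert a fact; return False if it is already implied (or contradicts the facts)."""
        vec = reduce_row(self._vector(coeffs, const), self.basis)
        pivot = next((i for i, x in enumerate(vec[:-1]) if x != 0), None)
        if pivot is None:
            return False
        lead = vec[pivot]
        self.basis.append((pivot, [x / lead for x in vec]))
        return True

    def implies(self, coeffs: dict[str, int], const: int = 0) -> bool:
        """True iff the target equation is a linear combination of stored facts."""
        return all(x == 0 for x in reduce_row(self._vector(coeffs, const), self.basis))


if __name__ == "__main__":
    facts = LinearFacts(["a", "b", "c"])
    facts.add({"a": 1, "b": 1, "c": 1}, 180)  # angle sum of a triangle
    facts.add({"a": 1}, 50)
    facts.add({"b": 1}, 60)
    print(facts.implies({"c": 1}, 70))        # True
    print(facts.add({"a": 1, "b": 1}, 110))   # False: already implied, so deduplicated
```

Exact `Fraction` arithmetic avoids the floating-point drift that would otherwise make the "is this row zero?" test unreliable.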

5. System Performance and Scaling Properties

Aristotle achieves state-of-the-art results:

IMO 2025 Problems: Of the six IMO 2025 problems in formal form, the system correctly solved five. It failed only on the most challenging one, which was also the hardest for human competitors.
Scaling: Aristotle parallelizes both lemma generation and tactic search. Its transformer backbone, scaled to 200B+ parameters, explores diverse proof strategies simultaneously and supports efficient reasoning even on highly intricate problems.
Test-Time Training: Learning from inference trace feedback at test time continually adapts and refines tactic selection, providing robustness against previously unseen problem structures.

These capabilities position Aristotle among the most effective automated systems for high-level competition mathematics.
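Test-time training can be illustrated with a deliberately tiny stand-in. A softmax policy over three tactic names replaces the 200B+ parameter transformer, and a hypothetical `attempt(tactic)` callback replaces a full proof search. Each round samples a batch of attempts, keeps the successful ones, and takes a gradient step toward them, so the policy adapts to the problem at hand. The tactic names, learning rate, round count, and batch size are illustrative assumptions, not values from Aristotle.

```python
import math
import random
from typing import Callable


def softmax(logits: dict[str, float]) -> dict[str, float]:
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def adapt_at_test_time(
    logits: dict[str, float],
    attempt: Callable[[str], bool],  # hypothetical stand-in for one proof attempt
    rounds: int = 5,
    samples: int = 16,
    lr: float = 0.5,
    seed: int = 0,
) -> dict[str, float]:
    """Toy test-time training: imitate the tactics that succeeded on this problem."""
    rng = random.Random(seed)
    logits = dict(logits)  # do not mutate the caller's policy
    for _ in range(rounds):
        probs = softmax(logits)
        tactics = list(probs)
        batch = rng.choices(tactics, weights=[probs[t] for t in tactics], k=samples)
        wins = [t for t in batch if attempt(t)]  # successful traces from this round
        if not wins:
            continue  # nothing to learn from this round
        # Gradient ascent on the mean log-probability of the successful tactics.
        for k in logits:
            grad = sum((1.0 if k == t else 0.0) - probs[k] for t in wins) / len(wins)
            logits[k] += lr * grad
    return logits


if __name__ == "__main__":
    start = {"simp": 0.0, "omega": 0.0, "nlinarith": 0.0}
    tuned = adapt_at_test_time(start, attempt=lambda t: t == "nlinarith")
    # Probability of the tactic that works on this problem, before vs. after adaptation.
    print(softmax(start)["nlinarith"], softmax(tuned)["nlinarith"])
```

In the system described here, the analogous update is applied to the transformer itself using its own inference-time search traces. This toy only mirrors the shape of that loop.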

6. Significance and Future Directions

Aristotle’s hybrid approach offers several implications:

Automated Mathematical Creativity: The system’s capacity for creative lemma discovery and auxiliary definition generation aligns with advanced mathematical problem-solving.
Robust Formal Verification: By rigorously verifying every step in Lean, Aristotle provides strong guarantees on proof correctness—a critical requirement in contest and research settings.
Scalable Collaboration: The integration of formal, informal, and geometric reasoning modules represents a scalable template for future AI–mathematician collaboration.
Potential Extensions: While focused on IMO-level problems, the architecture invites extension to other major mathematics competitions and theoretical research, including open-conjecture formalization.

A plausible implication is that continuing this line of work, with further scaling of models and expansion of domain-specific modules, may enable AI systems to not only verify but also actively contribute to the discovery of new mathematics at expert levels.

7. Illustrative Mathematical Formalisms

Aristotle’s workflow incorporates several representative mathematical formalizations:

Auxiliary Set Definition:

$S(f) = \{\,p \mid \text{Nat.Prime}(p) \land f(p) > 1\,\}$

Geometric Perpendicularity:

$AB \perp CD \iff AC^2 + BD^2 = AD^2 + BC^2$

These expressions typify the multifaceted mathematical knowledge Aristotle draws upon in solving diverse IMO problems.
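For concreteness, the auxiliary set above might be written in Lean 4 with Mathlib roughly as follows. This rendering assumes Mathlib's `ℕ+` (positive naturals) and its coercion to `ℕ`, which `Nat.Prime` requires. It is not taken from Aristotle's output, and the namespace `AristotleSketch` is hypothetical.

```lean
import Mathlib

namespace AristotleSketch

/-- Primes `p` at which `f` takes a value greater than 1. -/
def S (f : ℕ+ → ℕ+) : Set ℕ+ :=
  {p | (p : ℕ).Prime ∧ 1 < f p}

/-- Membership in `S f` unfolds definitionally to the defining predicate. -/
theorem mem_S {f : ℕ+ → ℕ+} {p : ℕ+} : p ∈ S f ↔ (p : ℕ).Prime ∧ 1 < f p :=
  Iff.rfl

end AristotleSketch
```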

Aristotle exemplifies the current frontier of automated theorem proving at IMO-level complexity by leveraging modular decomposition, large-scale deep learning, and efficient formal verification, thereby enabling nearly human-competitive performance in formal mathematics contests (Achim et al., 1 Oct 2025).


References

Achim et al. (2025). Aristotle: IMO-level Automated Theorem Proving.
