Autoformalizing Euclidean Geometry
- Autoformalizing Euclidean geometry is the process of translating informal geometric statements and proofs into machine-verifiable theorems and proof scripts in proof assistants such as Lean and Coq.
- It leverages neuro-symbolic architectures that combine SMT solvers and large language models to automate diagrammatic inferences and formal construction validation.
- Experimental results show modest success with current methodologies, highlighting challenges in dataset size and inference complexity while paving the way for broader applications.
Autoformalizing Euclidean geometry refers to the algorithmic translation of informal geometric statements and proofs into formal, machine-verifiable theorems and proof scripts. This area leverages the axiomatization of classical geometry, neuro-symbolic systems, and automated reasoning tools to bridge the gap between human geometric reasoning—often reliant on diagrams and implicit inference—and the precise languages and provers required for formal verification (Murphy et al., 27 May 2024, Błaszczyk et al., 20 Mar 2025, Ivashkevich, 2019, Avigad et al., 2008).
1. Foundational Formal Systems for Euclidean Geometry
Multiple formal systems model Euclid’s geometry, capturing constructions, diagrammatic properties, and metric relations. System E ("A formal system for Euclid's Elements" (Avigad et al., 2008)) provides a two-level sequent calculus mirroring Books I–IV of the Elements, with object types for points, lines, and circles, and terms for segments, angles, and areas. Formal statements have the shape $\Gamma \Rightarrow \exists \vec{x}.\ \Delta$, where $\Gamma$ collects the assumptions about the universally quantified givens and $\Delta$ describes the newly introduced, existentially quantified objects $\vec{x}$.
Axioms are grouped into:
- Diagrammatic: Incidence, betweenness, same side, Pasch, and intersection properties; all formulated as universally quantified logical rules (e.g., collinearity and ordering constraints).
- Metric: Lengths, angles, and areas, with transfer axioms bridging diagrammatic relations to arithmetic equations (e.g., if $b$ lies between $a$ and $c$, then $\overline{ab} + \overline{bc} = \overline{ac}$).
- Construction Rules: Existential specifications for ruler, compass, and intersection operations.
- Superposition: Deductive rules for triangle congruence (SAS, SSS).
System E has been implemented in Lean (Murphy et al., 27 May 2024), and similar frameworks exist in Coq, notably Ivashkevich’s two-tiered constructive-deductive method (Ivashkevich, 2019). This method distinguishes constructive algorithms (dependent pairs realizing geometric constructions) from deductive classical reasoning for geometric properties, encoding postulates and axioms as Coq definitions and propositions.
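The axiom groups above can be illustrated with a small Lean 4 sketch. It is not the LeanEuclid encoding: every name here (`Point`, `Mag`, `magAdd`, `Between`, `seg`, and the axiom names) is a hypothetical placeholder, and magnitudes are left abstract rather than modeled as real numbers. It shows one diagrammatic axiom, one transfer axiom, one construction rule, and a short derived fact.

```lean
-- Hypothetical sketch; names are illustrative and not taken from LeanEuclid.
axiom Point : Type
axiom Line : Type
axiom Mag : Type                      -- abstract magnitudes (lengths)
axiom magAdd : Mag → Mag → Mag        -- addition of magnitudes
axiom OnLine : Point → Line → Prop
axiom Between : Point → Point → Point → Prop
axiom seg : Point → Point → Mag       -- length of the segment between two points

-- Diagrammatic axiom: betweenness can be read from either end.
axiom between_symm : ∀ a b c : Point, Between a b c → Between c b a

-- Transfer axiom: a diagrammatic fact yields a metric equation.
axiom between_add : ∀ a b c : Point,
  Between a b c → magAdd (seg a b) (seg b c) = seg a c

-- Construction rule: two distinct points determine a line through both.
axiom line_through : ∀ a b : Point, a ≠ b → ∃ L : Line, OnLine a L ∧ OnLine b L

-- A derived fact that combines the diagrammatic and transfer axioms.
theorem seg_add_rev (a b c : Point) (h : Between a b c) :
    magAdd (seg c b) (seg b a) = seg c a :=
  between_add c b a (between_symm a b c h)
```

The existential in `line_through` mirrors the shape of construction rules, which introduce a new object together with its specification.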
2. Neuro-Symbolic and Automated Reasoning Frameworks
Contemporary autoformalization pipelines combine machine learning and symbolic reasoning to automate the formalization process. "Autoformalizing Euclidean Geometry" introduces a neuro-symbolic architecture comprising:
- Domain Knowledge: A Lean-encoded variant of System E, with definitions for six geometric object types and nine primitive relations.
- Symbolic Engine (SMT): SMT solvers (e.g., Z3, CVC5) fill diagrammatic gaps by automatically proving unstated facts (collinearity, intersection, same side) that are implicit in human proofs. The system builds a clause set from the axioms, the current context, and the negation of the candidate fact; if the solver reports unsatisfiability, the fact is accepted.
- LLM Prompting: LLMs (GPT-4, GPT-4V) are prompted with several examples, system axioms, and rules, tasked with producing Lean theorem statements or tactic-style proof scripts. Diagrammatic gaps are automatically filled by SMT, allowing LLMs to focus on explicit textual formalization (Murphy et al., 27 May 2024).
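The refutation step performed by the symbolic engine can be sketched in a few lines of Python. This is a minimal illustration, not the published pipeline: a brute-force propositional check stands in for the SMT solver, the axioms are assumed to be already instantiated to ground clauses, and the atom names (`same_side_a_b` and so on) are hypothetical.

```python
from itertools import product

# A literal is a string; a leading "~" marks negation. A clause is a set of literals.

def satisfiable(clauses):
    """Brute-force satisfiability check (a stand-in for an SMT solver call)."""
    atoms = sorted({lit.lstrip("~") for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        # A literal holds when its atom's value differs from its negation flag.
        if all(any(model[l.lstrip("~")] != l.startswith("~") for l in c)
               for c in clauses):
            return True
    return False

def entails(axioms, context, goal):
    """The goal follows iff axioms + context + negated goal is unsatisfiable."""
    negated = goal[1:] if goal.startswith("~") else "~" + goal
    return not satisfiable(list(axioms) + list(context) + [{negated}])

# Hypothetical ground instance of a same-side transitivity rule:
# same_side(a,b) and same_side(b,c) imply same_side(a,c).
AXIOMS = [{"~same_side_a_b", "~same_side_b_c", "same_side_a_c"}]
CONTEXT = [{"same_side_a_b"}, {"same_side_b_c"}]

print(entails(AXIOMS, CONTEXT, "same_side_a_c"))  # implicit fact is derivable
print(entails(AXIOMS, CONTEXT, "same_side_a_d"))  # unrelated fact is not
```

A real solver works over first-order clauses with quantifier instantiation; the exhaustive loop here only scales to toy inputs.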
3. Autoformalization Workflows
The translation from informal proofs to formal scripts proceeds as follows:
- Parsing and Variable Assignment: Explicitly named objects in informal text are assigned formal variables; implicit geometric relationships are made explicit via primitive predicates or properties.
- Construction Steps: Informal "Let..." statements are matched with formal construction rules, validated as direct consequences or via SMT reasoning, and their existential specifications are added to the script.
- Deductive Steps: "Hence..." and similar assertions are tagged as diagrammatic, metric, or transfer in nature and routed to the appropriate inference engine (diagrammatic closure via a first-order prover, metric and transfer steps via SMT).
- Case Analysis and Superposition: Branching on cases or applying triangle congruence is handled through context splits and elimination rules (e.g., SAS/SSS).
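The routing described in the steps above can be sketched as a small classifier. This is a toy illustration under strong assumptions: the cue-word lists, the tag names, and the engine labels are all hypothetical, and a real pipeline would rely on an LLM or a parser rather than keyword matching.

```python
import re

# Hypothetical cue words; chosen for illustration only.
CONSTRUCTION_CUES = ("let", "construct", "draw")
DEDUCTION_CUES = ("hence", "therefore", "thus")
METRIC_WORDS = ("equal", "congruent", "length", "angle", "area")

def classify_step(sentence):
    """Return (kind, engine) for one sentence of an informal proof."""
    words = re.findall(r"[a-z]+", sentence.lower())
    first = words[0] if words else ""
    if first in CONSTRUCTION_CUES:
        return ("construction", "construction-rule matcher")
    if first in DEDUCTION_CUES:
        # Metric and transfer steps go to SMT; the rest to diagrammatic closure.
        if any(w in METRIC_WORDS for w in words):
            return ("metric_or_transfer", "smt")
        return ("diagrammatic", "first-order closure")
    return ("unclassified", "manual review")

proof = [
    "Let ABC be the equilateral triangle on AB.",
    "Hence C lies between A and B.",
    "Therefore AC is equal to BC.",
]
for step in proof:
    print(step, "->", classify_step(step))
```

Keeping the tag and the engine in one return value makes the routing decision explicit for each step, which helps when a step is later rejected and needs repair.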
Formalized benchmarks (e.g., LeanEuclid) provide hundreds of statements and proofs, both sourced from Euclid and synthesized from modern datasets, enabling systematic evaluation of autoformalization accuracy (Murphy et al., 27 May 2024).
4. Diagrammatic Information Extraction and Semantic Evaluation
Human geometric proofs typically omit "obvious" diagrammatic facts. In autoformalization, these gaps are addressed by:
- SMT-Driven Diagrammatic Inference: Automated routines consume all diagrammatic and metric rules, conjoin existing facts, and attempt refutation of the negated goal clause using SMT; a successful proof adds the consequence to the context.
- Semantic Equivalence Checking: To evaluate correctness, semantic equivalence is checked by encoding axioms plus one theorem’s clauses plus the negation of the other and searching for unsatisfiability in both directions. Clause-level scoring supplements full logical equivalence when the match is partial (Murphy et al., 27 May 2024).
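The two-directional check and the clause-level score can be sketched as follows. This is a simplified illustration, assuming each formalized statement is flattened to a set of ground propositional clauses and using a brute-force check in place of the SMT solver; the atom names and the scoring function are hypothetical, not the published metric.

```python
from itertools import product

def satisfiable(clauses):
    """Brute-force satisfiability check (a stand-in for an SMT solver call)."""
    atoms = sorted({l.lstrip("~") for c in clauses for l in c})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[l.lstrip("~")] != l.startswith("~") for l in c)
               for c in clauses):
            return True
    return False

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def clause_entailed(axioms, source, clause):
    """axioms + source entail the clause iff adding its negation is unsatisfiable."""
    refutation = list(axioms) + list(source) + [{negate(l)} for l in clause]
    return not satisfiable(refutation)

def clause_score(axioms, source, target):
    """Fraction of target clauses entailed by axioms + source."""
    return sum(clause_entailed(axioms, source, c) for c in target) / len(target)

def equivalent(axioms, t1, t2):
    """Full equivalence: every clause is entailed in both directions."""
    return clause_score(axioms, t1, t2) == 1.0 and clause_score(axioms, t2, t1) == 1.0

# Hypothetical ground axiom: between(a,b,c) holds iff between(c,b,a) holds.
AXIOMS = [{"~bet_abc", "bet_cba"}, {"~bet_cba", "bet_abc"}]
TRUTH = [{"bet_abc"}, {"eq_ab_bc"}]
GOOD = [{"bet_cba"}, {"eq_ab_bc"}]   # same content, different argument order
PARTIAL = [{"bet_cba"}]              # misses the metric clause

print(equivalent(AXIOMS, TRUTH, GOOD))
print(clause_score(AXIOMS, PARTIAL, TRUTH))
```

Checking the target one clause at a time is equivalent to refuting the negated conjunction as a whole, and it yields the partial score as a by-product.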
5. Automation Architectures: The Area Method and GCLC
Modern automation architectures, such as the Area Method, provide field-theoretic, coordinate-free proof procedures suitable for autoformalization:
- Language: Two-sorted formalism with Points and Field-elements, governing directed segment lengths and signed triangle areas.
- Axioms: 13 core rules (segment/area relations, parallelism, perpendicularity, etc.), all first-order and amenable to mechanization.
- Elimination Lemmas: Theorems are encoded as conjunctions of atoms; proof scripts operate by sequential construction (adding auxiliary points) and elimination (applying a library of ~30 elimination lemmas—ground instances of axioms and field identities).
- Proof Closure: Term rewriting and Gröbner-basis simplification over the field reduce the goal, and the procedure terminates once only trivial field equations remain, at which point the proof is complete.
- Software: GCLC implements this four-step recipe, requiring only high-level construction and the final predicate from the user (Błaszczyk et al., 20 Mar 2025).
This architecture supports extension to non-Archimedean settings, facilitating proofs over the hyperreal field via the Transfer Principle, and enables conservative, mechanization-friendly autoformalization spanning the classical and modern continuum.
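To make the role of signed areas and directed ratios concrete, the sketch below numerically checks one classical identity of the kind used for elimination, the co-side theorem: if M is the intersection of lines AB and PQ, then PM/QM equals S_PAB/S_QAB, where S denotes signed triangle area. The Area Method itself is coordinate-free; the coordinates, helper names, and sample points here are assumptions used only to test the identity with exact rational arithmetic.

```python
from fractions import Fraction

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def signed_area(a, b, c):
    """Signed area of triangle abc (positive for counter-clockwise order)."""
    return Fraction(cross(sub(b, a), sub(c, a)), 2)

def intersection(a, b, p, q):
    """Intersection of lines AB and PQ; assumes the lines are not parallel."""
    r, s = sub(q, p), sub(b, a)
    t = Fraction(cross(sub(a, p), s), cross(r, s))
    return (p[0] + t * r[0], p[1] + t * r[1])

def directed_ratio(p, q, m):
    """Ratio PM/QM of directed segments, for collinear P, Q, M with Q != M."""
    pm, qm = sub(m, p), sub(m, q)
    dot = lambda u, v: u[0] * v[0] + u[1] * v[1]
    return Fraction(dot(pm, qm), dot(qm, qm))

A, B, P, Q = (0, 0), (4, 0), (1, 3), (3, -1)
M = intersection(A, B, P, Q)

lhs = directed_ratio(P, Q, M)
rhs = signed_area(P, A, B) / signed_area(Q, A, B)
print(M, lhs, rhs)
```

In a proof, the identity is applied in the other direction: a ratio involving the constructed point M is replaced by areas of earlier points, so M disappears from the goal.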
6. Experimental Results and Limitations
Current neuro-symbolic autoformalization approaches yield nontrivial but modest success rates:
- Statement Formalization: Five-shot prompting with GPT-4V reaches up to 21% accuracy on the full set of Elements and UniGeo problems; errors stem from variable ordering and from missing or extra preconditions.
- Proof Formalization: GPT-4V achieves up to 31% on UniGeo proofs (five-shot), but only 4% of Elements proofs are correct out-of-the-box. Most proofs are repairable with minimal modifications (median Levenshtein similarity to repaired script ∼62%).
- SMT-Based Evaluation: Reliable semantic equivalence checking; manual review shows ~15% false negatives, negligible false positives (Murphy et al., 27 May 2024).
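The Levenshtein similarity mentioned for repaired proofs can be computed as in the sketch below. The normalization (one minus edit distance divided by the longer length) is an assumption, since the exact definition used in the evaluation is not stated here, and the two script strings are hypothetical placeholders rather than real tactic syntax.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

generated = "construct line a b"   # hypothetical model output
repaired = "construct line a c"    # hypothetical manually repaired script
print(levenshtein(generated, repaired), similarity(generated, repaired))
```

A character-level distance is sensitive to variable renaming; a token-level variant would pass lists of tokens instead of strings to the same function.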
Limitations include small dataset size (173 examples), incompleteness of the SMT engine for construction tasks, and the complexity of extending diagrammatic, SMT-driven frameworks beyond Euclidean geometry. Quotient types and native geometric continuity require further development for comprehensive coverage in proof assistants such as Coq and Lean (Ivashkevich, 2019).
7. Historical and Conceptual Bridging
The progression from Euclid’s geometric synthesis, through Hilbert’s existential axiomatization, and the Area Method’s algebraic formalism, to modern mechanized proof architectures, illustrates the deep interplay between traditional deductive structure and formal rigor. The Area Method plus hyperreal extension provides an autoformalizable, mechanization-friendly reconstruction of Book VI, preserving Euclid’s deductive motifs and supplying modern axioms where gaps existed. These primitives and axioms integrate smoothly into proof assistants, with elimination modules yielding short high-level scripts for the majority of Euclid’s propositions. This alignment between historical structure and modern formalization is crucial for synthetic, coordinate-free automation in both Archimedean and non-Archimedean domains (Błaszczyk et al., 20 Mar 2025).
A plausible implication is that as neuro-symbolic and symbolic architectures mature, the controlled domain of Euclidean geometry will serve as a proving ground for extending autoformalization paradigms to broader areas of mathematics, contingent on richer diagrammatic theories and semantic evaluation tools.