
AlphaGeometry2: Advanced Geometry Solver

Updated 16 August 2025
  • AlphaGeometry2 is an advanced automated theorem prover and geometry solver that formalizes Olympiad-level problems using an extended formal language and enhanced nonconstructive reasoning.
  • It introduces new quantitative predicates, handles locus-based and nonconstructive statements, and achieves an 88% IMO coverage rate with a dramatic reduction in problem-solving time.
  • Coupling an upgraded symbolic inference engine (DDAR) with the Gemini LM, AG2 enables natural language formalization and efficient automated proof search in complex geometric configurations.

AlphaGeometry2 (AG2) is an automated theorem prover and geometry problem solver designed to formalize and solve Olympiad-level geometry problems, in particular those from the International Mathematical Olympiad (IMO) from 2000 to 2024. AG2 extends the original AlphaGeometry system (AG1) in four ways: it generalizes the domain language, adds reasoning support for nonconstructive statements, upgrades the symbolic inference engine, and integrates a language model based on the Gemini architecture. Together these changes set a new state of the art in automated geometric problem solving (Chervonyi et al., 5 Feb 2025).

1. Language Extension and Formalism

A central innovation in AG2 is the expansion of the formal language used for encoding geometric problems and proofs. The original AlphaGeometry language has been systematically extended:

  • New Predicates for Quantitative Reasoning: AG2 introduces predicates such as acompute and rcompute for direct computation of angles and ratios, enabling formalization of problems that ask for explicit values (e.g., “find the angle between AB and CD”).
  • Linear Equations Among Geometric Quantities: To address problems involving relationships between distances, angles, or ratios, predicates like distmeq, distseq, and angeq were added. For example, the predicate

$$\texttt{distmeq } a_1 b_1\, a_2 b_2\, \ldots\, a_n b_n\, t_1 t_2 \ldots t_n\, y$$

expresses the constraint

$$t_1 \log(A_1B_1) + t_2 \log(A_2B_2) + \cdots + t_n \log(A_nB_n) + y = 0,$$

which makes relations that are linear in the logarithms of lengths (that is, products and ratios of powers of distances) expressible in the formal language.

  • Locus-Type and Nonconstructive Statements: AG2 supports formalization of locus-type problems involving the movement of points, lines, or circles by introducing eleven distinct locus cases (with generalization via a special wildcard symbol). This enables reasoning about statements where objects are not defined purely through constructions but by their constraints within a configuration.
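As a rough illustration of what a distmeq constraint asserts, the sketch below evaluates the left-hand side of the log-linear equation on a concrete numeric diagram. The function name `distmeq_residual`, the dictionary-based diagram, and the midpoint example are assumptions made for this sketch; they are not part of AG2's actual language tooling.

```python
import math

def distmeq_residual(points, segments, coeffs, y):
    """Return t_1*log(A_1B_1) + ... + t_n*log(A_nB_n) + y on a numeric diagram."""
    total = y
    for (a, b), t in zip(segments, coeffs):
        total += t * math.log(math.dist(points[a], points[b]))
    return total

# Hypothetical diagram: M is the midpoint of AB, so AB = 2 * AM.
points = {"A": (0.0, 0.0), "B": (4.0, 0.0), "M": (2.0, 0.0)}

# The relation AB / AM = 2 becomes log(AB) - log(AM) - log(2) = 0,
# i.e. segments (AB, AM), coefficients (1, -1), constant y = -log(2).
residual = distmeq_residual(
    points, [("A", "B"), ("A", "M")], [1, -1], -math.log(2)
)
constraint_holds = abs(residual) < 1e-9
print(constraint_holds)
```

A residual near zero means the diagram satisfies the constraint numerically; a symbolic engine would additionally need to derive it from the premises.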

This extension raises the share of IMO 2000–2024 geometry problems that can be expressed in the formal language from 66% in AG1 to 88% in AG2.

2. Nonconstructive Reasoning and Double Points

A major advance lies in AG2's ability to handle problems that cannot be attacked through purely constructive means:

  • Relaxed Constraints on Point Definitions: AG2 removes AG1's requirement that each point be defined by no more than two predicates. Points in AG2 may now have multiple defining constraints, a necessity for handling nonconstructive or locus-based settings.
  • “Double Points” and Auxiliary Constructions: AG2 employs strategies using “double points,” whereby a point may be introduced via auxiliary intersections (for instance, introducing $X'$ as the intersection of a line and circle, then deducing $X = X'$). This technique is critical for problems that demand proving concurrency, collinearity, or incidence properties not manifestly present in the original statement.
  • Locus Case Formalism: Explicit support for locus constraints provides direct representation of movement or parameterized families of geometric objects, extending AG2's reach beyond deterministic geometric constructions.
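The double-point idea can be illustrated numerically. In the sketch below, the point of interest X is taken to be where the internal bisector of angle A meets the perpendicular bisector of BC; instead of working with X directly, a second point X' is constructed as the other intersection of the bisector with the circumcircle, and the check confirms that X' is equidistant from B and C, so X' also lies on the perpendicular bisector and coincides with X. The triangle, the helper names, and the choice of example are all assumptions for illustration, and a numeric check is of course not a proof.

```python
import math

def circumcircle(a, b, c):
    """Centre and radius of the circle through three non-collinear points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)

def unit(p, q):
    """Unit vector pointing from p to q."""
    length = math.dist(p, q)
    return ((q[0] - p[0]) / length, (q[1] - p[1]) / length)

def second_intersection(a, direction, centre):
    """Other point where the line a + s*direction meets a circle through a."""
    dx, dy = direction
    # With a on the circle, |a + s*d - o|^2 = r^2 has roots s = 0 and the one below.
    s = -2 * ((a[0] - centre[0]) * dx + (a[1] - centre[1]) * dy) / (dx * dx + dy * dy)
    return (a[0] + s * dx, a[1] + s * dy)

A, B, C = (1.0, 3.0), (0.0, 0.0), (4.0, 0.0)
centre, radius = circumcircle(A, B, C)

# Internal bisector direction at A: sum of the unit vectors towards B and C.
ub, uc = unit(A, B), unit(A, C)
bisector = (ub[0] + uc[0], ub[1] + uc[1])

# Double point X': second intersection of the bisector with the circumcircle.
X_prime = second_intersection(A, bisector, centre)
on_perp_bisector = abs(math.dist(X_prime, B) - math.dist(X_prime, C)) < 1e-9
print(on_perp_bisector)
```

The appeal of the construction is that X' comes with a circle membership for free, which gives the deduction engine facts (equal inscribed angles, for example) that are not directly available for X.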

The capacity to formalize and automate nonconstructive reasoning is essential for a substantial subset of IMO-style problems previously inaccessible to automated systems.

3. Symbolic Engine Enhancements and Algorithmic Advances

The symbolic reasoning engine in AG2, known as DDAR (Deductive Database Arithmetic Reasoning), has been iteratively refined:

  • Stronger Inferential Power: DDAR now implements robust support for “double points” and locus-based logic, facilitating auxiliary constructions and more streamlined deduction chains.
  • Search Complexity Reduction: Frequently used rules for similar triangles, cyclic quadrilaterals, and fundamental geometric principles have been hard-coded, thereby pruning the search space and accelerating inference.
  • Efficient Numerical Back-End: Implementation of Gaussian elimination in C++ has reduced mean reasoning time per problem from approximately 1179.6 seconds (AG1) to 3.45 seconds (AG2) for benchmark sets.
  • Synthetic Data Improvements: The data generator now produces larger diagrams and balances problems with/without auxiliary points, fostering a richer and more difficult training set for the learning components.
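To make the role of Gaussian elimination concrete, here is a small Python toy. It is not the authors' C++ implementation; the function names `row_reduce` and `follows`, and the encoding of each fact as a row of coefficients ending in a constant term, are assumptions made for this sketch. Known facts are treated as linear equations over geometric quantities, and a goal is accepted if adding it does not increase the rank of the system.

```python
from fractions import Fraction

def row_reduce(rows):
    """Gauss-Jordan elimination over exact rationals; returns the nonzero rows."""
    rows = [[Fraction(x) for x in r] for r in rows]
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        pivot = next(
            (i for i in range(pivot_row, len(rows)) if rows[i][col] != 0), None
        )
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        for i in range(len(rows)):
            if i != pivot_row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
    return rows[:pivot_row]

def follows(known, goal):
    """True if `goal` is a linear combination of the `known` equations."""
    return len(row_reduce(known + [goal])) == len(row_reduce(known))

# Columns: angle x, angle y, angle z, constant term (each row means "sum = 0").
known = [
    [1, -1, 0, 0],   # x - y = 0
    [0, 1, -1, 0],   # y - z = 0
]
print(follows(known, [1, 0, -1, 0]))  # x - z = 0
print(follows(known, [1, 0, 1, 0]))   # x + z = 0
```

Exact rational arithmetic is used here to avoid floating-point rank errors; how AG2 handles numerical precision is not stated in this article.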

These advances yield a dramatic speedup and enable AG2 to address proof tasks with much higher step and variable counts.

4. Integration of Gemini LLM and Knowledge Sharing

AlphaGeometry2 incorporates a Gemini-based language model (LM), resulting in several key improvements:

  • Natural Language Formalization: The Gemini LM is used in a few-shot prompting framework to translate natural language problem descriptions into AG2’s formal language, enhancing autonomy and reducing the need for expert annotation.
  • Feedback-Augmented Inference Pipeline: The model receives structured semantic feedback (“analysis strings”, recording deduction fact subsets $S_1 \subseteq S_2 \subseteq S_3$ generated by DDAR) to guide and constrain reasoning steps, improving deduction reliability.
  • Knowledge Sharing Among Search Trees: The SKEST (Shared Knowledge Ensemble of Search Trees) algorithm runs parallel beam searches that exchange learned facts and deduction paths, thus boosting both solution rate and efficiency for complex tasks.
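The benefit of sharing knowledge between search trees can be shown with a deliberately simplified toy. The real SKEST algorithm runs LM-guided beam searches; the sketch below replaces each search tree with a fixed set of hypothetical rules, and the names `make_searcher`, `run_shared`, and the fact labels are all invented for illustration. It only demonstrates the principle that a shared fact pool can let several searchers reach a goal that none reaches alone.

```python
def make_searcher(rules):
    """Build a toy searcher from (premises, conclusion) pairs."""
    def propose(known):
        # Emit every conclusion whose premises are already in the pool.
        return {concl for prem, concl in rules if set(prem) <= known}
    return propose

def run_shared(searchers, goal, max_rounds=5):
    """Round-robin search in which all searchers read and write one fact pool.

    Returns the round in which the goal appears, or None if it never does.
    """
    shared = set()
    for round_no in range(1, max_rounds + 1):
        for propose in searchers:
            shared |= propose(shared)
        if goal in shared:
            return round_no
    return None

# Hypothetical rule sets: tree_a can propose an auxiliary point and finish the
# proof, but only tree_b knows how to place that point on the circle.
tree_a = make_searcher([((), "aux_point_X"), (("X_on_circle",), "goal")])
tree_b = make_searcher([(("aux_point_X",), "X_on_circle")])

print(run_shared([tree_a, tree_b], "goal"))  # both trees, shared pool
print(run_shared([tree_a], "goal"))          # tree_a alone
print(run_shared([tree_b], "goal"))          # tree_b alone
```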

This combination of symbolic and neural components underpins AG2’s natural language understanding and proof search performance.

5. Empirical Results and Benchmark Performance

AlphaGeometry2 demonstrates significant empirical improvement across a range of geometry problem-solving benchmarks:

Metric                      | AG1       | AG2
IMO 2000–2024 coverage      | 66%       | 88%
Overall solve rate          | 54%       | 84%
Mean time per IMO problem   | ~1179.6 s | 3.45 s

AG2 solves 42 of 50 published IMO geometry problems (IMO-AG-50 benchmark), surpassing the performance of an average gold medalist over the same period. The knowledge-sharing ensemble search further increases success rates in challenging configurations. The enhanced symbolic engine is compatible with problems requiring deep auxiliary constructions and can accommodate larger geometric environments (i.e., diagrams with more objects and proof steps).
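The headline figures are easy to cross-check with a few lines of arithmetic. The snippet below uses only numbers quoted in this article; the variable names are illustrative.

```python
solved, total = 42, 50
solve_rate = solved / total               # fraction of IMO-AG-50 solved by AG2

ag1_seconds, ag2_seconds = 1179.6, 3.45   # mean reasoning time per problem
speedup = ag1_seconds / ag2_seconds       # how many times faster AG2's engine is

coverage_gain = 88 - 66                   # percentage points of added coverage

print(f"solve rate: {solve_rate:.0%}")
print(f"speedup: about {speedup:.0f}x")
print(f"coverage gain: {coverage_gain} percentage points")
```

The 42-of-50 count matches the 84% solve rate reported for AG2, and the timing figures correspond to a speedup of roughly two orders of magnitude.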

6. Toward Fully Automated and Generalized Geometry Solving

Current development is directed at automating all stages of geometry problem solving:

  • Autonomous Formalization and Diagram Generation: The Gemini LM is further leveraged to output AG2-language translations and generate diagram representations from raw problem text. Deduction-level “analysis strings” bridge between natural and formal language.
  • Open Challenges: AG2 currently does not natively address three-dimensional geometry, inequalities, nonlinear geometric constraints, or problems with arbitrarily large configurations. Extending AG2 to cover inversion, projective techniques, radical axes, and advanced locus constructions remains a priority for achieving universal coverage.
  • Improving Formal Translation Accuracy: Increased use of supervised fine-tuning and more formalized prompt exemplars are aimed at reducing translation errors and hallucinated statements during natural language ingestion.

A plausible implication is that overcoming these challenges will lay the groundwork for end-to-end geometry solvers capable of tackling the full breadth of Olympiad and research-level geometric inference.

7. Significance and Future Prospects

AlphaGeometry2 demonstrates that systematic expansion of formal language, algorithmic enhancements to symbolic reasoning engines, and tight integration with powerful LLMs can jointly produce automated systems surpassing expert human performance for a major class of mathematical problems.

  • This methodology defines a framework for further work in formal mathematical reasoning, with clear paths toward generalizing to higher-dimensional, non-Euclidean, and nonconstructive mathematical domains.
  • The progress toward autonomous translation, diagram generation, and proof production indicates that fully automated geometry problem solving, given only raw textual statements, may be achievable with continued refinement.

The results and design of AlphaGeometry2 set a new state of the art for formal geometry solvers, providing a robust benchmark and technical foundation for subsequent advances in computer-assisted mathematical reasoning.
