- The paper introduces AlphaGeometry2, a neuro-symbolic system that surpasses average gold medalist performance on IMO geometry problems.
- It expands the domain-specific language by incorporating new predicates and automated diagram generation using a two-stage optimization approach.
- Innovations include the Shared Knowledge Ensemble of Search Trees (SKEST) search algorithm, an enhanced DDAR symbolic engine, and a sparse mixture-of-experts Transformer language model; together these solve 42 out of 50 IMO geometry problems.
The paper introduces AlphaGeometry2 (AG2), an enhanced neuro-symbolic system for solving Olympiad geometry problems. AG2 builds upon its predecessor, AlphaGeometry (AG1), and achieves a substantial performance boost, surpassing the average gold medalist in solving International Mathematical Olympiad (IMO) geometry problems from 2000-2024.
A key improvement in AG2 is an expanded domain-specific language. The original AG1 language had limitations in addressing problems involving movements of objects, linear equations of angles/ratios/distances, and "find the angle..." type questions. To address these limitations, AG2 introduces new predicates:
{acompute a b c d}
: Find the angle between AB and CD.
- a: Point A.
- b: Point B.
- c: Point C.
- d: Point D.
{rcompute a b c d}
: Find the ratio AB/CD.
- a: Point A.
- b: Point B.
- c: Point C.
- d: Point D.
{distmeq a1 b1 a2 b2 ... an bn t1 t2 ... tn y}
: t1·log(A1B1) + t2·log(A2B2) + ... + tn·log(AnBn) + y = 0.
- ai: Point Ai.
- bi: Point Bi.
- ti: Coefficient for the log-distance between Ai and Bi.
- y: Constant term in the equation.
{distseq a1 b1 a2 b2 ... an bn t1 t2 ... tn}
: t1·A1B1 + t2·A2B2 + ... + tn·AnBn = 0.
- ai: Point Ai.
- bi: Point Bi.
- ti: Coefficient for the distance between Ai and Bi.
{angeq a1 b1 a2 b2 ... an bn t1 t2 ... tn y}
: t1·d(A1B1) + t2·d(A2B2) + ... + tn·d(AnBn) + y = 0, where d(AB) is the angle between the undirected line AB and the horizontal line.
- ai: Point Ai.
- bi: Point Bi.
- ti: Coefficient for the directed angle of line AiBi.
- y: Constant term in the equation.
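As a concrete illustration (the numbers and encoding here are our own, not taken from the paper), a ratio constraint such as AB/CD = 2 fits the distmeq form with coefficients t1 = 1, t2 = -1 and constant y = -log 2:

```python
import math

# Hypothetical example: encode AB/CD = 2 as a distmeq row, i.e.
#   1*log(AB) + (-1)*log(CD) + y = 0   with   y = -log(2).
AB, CD = 6.0, 3.0                       # sample lengths with AB/CD = 2
t1, t2, y = 1.0, -1.0, -math.log(2.0)   # distmeq coefficients
residual = t1 * math.log(AB) + t2 * math.log(CD) + y

# residual is 0 (up to floating point) exactly when AB/CD = 2 holds
```

Working in log-distances is what makes multiplicative ratio facts linear, so they can be combined by the same linear-algebra machinery as the other equation predicates.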
AG2 introduces syntax to handle locus problems, which involve the movement of geometric objects. The syntax utilizes a placeholder token {*}
to denote the fixed-point, allowing AG2 to reason about geometric constraints that hold true as points, lines, or circles move according to certain rules. AG2 also incorporates predicates for diagram checks, ensuring topological consistency and non-degeneracy:
{sameclock a b c d e f}
: Triangles A→B→C and D→E→F have the same orientation (both clockwise or both counterclockwise).
- a: Point A.
- b: Point B.
- c: Point C.
- d: Point D.
- e: Point E.
- f: Point F.
{noverlap a b}
: Points A and B are distinct.
- a: Point A.
- b: Point B.
{lessthan a b c d}
: AB<CD.
- a: Point A.
- b: Point B.
- c: Point C.
- d: Point D.
{overlap a b}
: Points A and B coincide (have the same coordinates).
- a: Point A.
- b: Point B.
{cyclic_with_center a1 a2 ... an x}
: Points a1, a2, ..., ax all coincide at the center of the circle passing through ax+1, ..., an.
- ai: Point Ai.
- x: Index up to which points Ai are the same (center of the circle).
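The diagram checks above are numeric rather than symbolic. For instance, a plausible implementation of the sameclock predicate (this sketch is our own, not from the paper) compares the signs of two cross products:

```python
# Orientation of A -> B -> C: sign of the cross product (B - A) x (C - A).
# Positive means counterclockwise, negative means clockwise.
def orientation(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

# sameclock holds when both point triples have the same orientation.
def sameclock(a, b, c, d, e, f):
    return orientation(a, b, c) * orientation(d, e, f) > 0

# Both triangles below are counterclockwise, so the check passes.
sameclock((0, 0), (1, 0), (0, 1), (2, 2), (3, 2), (2, 3))
```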
Furthermore, AG2 relaxes an AG1 constraint that required each point to be defined by at most two predicates. In AG2, points can be defined by three or more predicates, enabling the system to handle non-constructive problems where diagram construction is not straightforward. Together, these changes improve the domain language's coverage of IMO geometry problems from 2000 to 2024 from 66% to 88%.
The paper also discusses automating problem formalization and diagram generation. AG2 employs Gemini to translate problems from natural language into the AlphaGeometry language using a few-shot prompting approach. For automated diagram generation, the approach builds upon existing methodologies, using a two-stage optimization method involving ADAM gradient descent and the Gauss-Newton-Levenberg method to find numerical solutions for the system of non-linear equations that define the geometric constraints. The system was able to generate diagrams for 41 out of 44 IMO problems formalized in the AG language.
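The two-stage idea can be sketched on a toy constraint system. Everything below is an illustrative stand-in, not AG2's solver: the problem, the plain gradient-descent first stage (in place of ADAM), and the undamped Gauss-Newton second stage are our own simplifications.

```python
# Toy diagram problem: place point P = (x, y) at unit distance from
# both A = (0, 0) and B = (1, 0), i.e. solve a small nonlinear system.
def residuals(x, y):
    r1 = x * x + y * y - 1.0            # |PA| = 1
    r2 = (x - 1.0) ** 2 + y * y - 1.0   # |PB| = 1
    return r1, r2

def jacobian(x, y):
    return [[2 * x, 2 * y], [2 * (x - 1.0), 2 * y]]

# Stage 1: gradient descent on the sum of squared residuals
# (standing in for the ADAM-based first stage).
x, y = 0.3, 0.4
for _ in range(200):
    r1, r2 = residuals(x, y)
    (j11, j12), (j21, j22) = jacobian(x, y)
    x -= 0.05 * 2 * (r1 * j11 + r2 * j21)
    y -= 0.05 * 2 * (r1 * j12 + r2 * j22)

# Stage 2: Gauss-Newton steps (the Levenberg damping is dropped here
# for brevity) polish the solution to high precision.
for _ in range(10):
    r1, r2 = residuals(x, y)
    (a, b), (c, d) = jacobian(x, y)
    det = a * d - b * c                 # solve J * delta = r exactly
    x -= (d * r1 - b * r2) / det
    y -= (a * r2 - c * r1) / det

# converges to P = (1/2, sqrt(3)/2)
```

The division of labor mirrors the description above: a cheap first-order stage finds the basin of a valid configuration, and a second-order stage drives the constraint residuals to (near) zero.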
The Deductive Database Arithmetic Reasoning (DDAR) symbolic engine in AG2 has been enhanced with three main improvements: the ability to handle double points, a faster algorithm, and a faster implementation.
- Handling double points: AG2 can handle scenarios where two points with different names have the same coordinates. This is achieved by constructing a new point, proving its equivalence to an existing point, and then using this equivalence to deduce new facts.
- Faster algorithm: Essential rules are hard-coded to reduce query times for the Arithmetic Reasoning (AR) sub-engine to at most cubic complexity. Explicit rules for angles and distances are discarded, with all deductions handled automatically in the AR engine. An improved algorithm (DDAR2) is designed for identifying similar triangles and cyclic quadrilaterals.
- Faster implementation: The core computation (Gaussian Elimination) is implemented in C++, resulting in a significant speedup compared to AG1.
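The AR sub-engine's core operation can be illustrated with a small sketch (our own, not AG2's actual C++ implementation): angle facts become exact linear equations over per-line direction variables, and Gaussian elimination derives new facts.

```python
from fractions import Fraction

# Columns: d_AB, d_CD, d_EF (line directions as fractions of pi), rhs.
# Known facts (hypothetical): d_AB - d_CD = 1/3 (a 60-degree angle)
# and d_CD - d_EF = 1/6 (a 30-degree angle).
rows = [
    [Fraction(1), Fraction(-1), Fraction(0), Fraction(1, 3)],
    [Fraction(0), Fraction(1), Fraction(-1), Fraction(1, 6)],
]

def rref(m):
    """Reduced row echelon form with exact rational arithmetic."""
    m = [row[:] for row in m]
    pivot = 0
    for col in range(len(m[0]) - 1):
        for r in range(pivot, len(m)):
            if m[r][col] != 0:
                m[pivot], m[r] = m[r], m[pivot]
                break
        else:
            continue          # no pivot in this column
        inv = m[pivot][col]
        m[pivot] = [v / inv for v in m[pivot]]
        for r in range(len(m)):
            if r != pivot and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot])]
        pivot += 1
        if pivot == len(m):
            break
    return m

m = rref(rows)
# The first row now reads d_AB - d_EF = 1/2: the angle between lines
# AB and EF is pi/2, i.e. AB is perpendicular to EF -- a derived fact.
```

Exact rational arithmetic avoids the false equalities that floating-point elimination could introduce, which matters when derived rows are read back as proof facts.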
AG2 employs synthetic data generation. Compared to AG1, AG2 explores random diagrams twice as large, produces theorems up to 2x more complex and proofs up to 10x more complex, and yields a more balanced data distribution both across question types and between problems with and without auxiliary points. The data generation algorithm also produces problems of the "locus" type. To achieve this, the movement dependency for each point is recorded during random diagram generation through a function P(A), defined as the set of points that control the movements of A.
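A schematic sketch of recording P(A) (the bookkeeping details here are assumptions): a free point is controlled only by itself, and a constructed point inherits the controllers of its defining parents.

```python
# deps[name] holds P(name): the set of free points controlling `name`.
deps = {}

def add_point(name, parents):
    if not parents:
        deps[name] = {name}       # free point: controls itself
    else:
        p = set()
        for q in parents:
            p |= deps[q]          # union of parents' controllers
        deps[name] = p

add_point("A", [])                # free point
add_point("B", [])                # free point
add_point("M", ["A", "B"])        # e.g. midpoint of A and B
add_point("X", ["M", "B"])        # constructed from M and B
# P(X) = {A, B}: moving either free point moves X, which is exactly
# the information a locus-type problem statement needs.
```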
AG2 features a novel search algorithm called Shared Knowledge Ensemble of Search Trees (SKEST). SKEST involves running several differently configured beam searches in parallel, allowing them to share knowledge through a shared facts database. The following types of search trees are used to make sure different parts of the search space are explored effectively:
- "Classic" search tree: The same beam search used in AG1, where an LLM produces one auxiliary point at each node.
- Tree predicting multiple auxiliary points at each node: An LLM is allowed to produce as many auxiliary points as it wants at each tree node.
- Tree predicting different types of aux points uniformly.
- Deep but narrow tree (e.g. beam size 64 and depth 10).
- Shallow but wide tree (e.g. beam size 512 and depth 4).
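A much-simplified, purely schematic picture of SKEST (the tree configurations come from the list above; everything else is an invented stand-in): differently configured trees expand in turn, and every proved fact lands in a database visible to all trees.

```python
# One database of proved facts, shared by all search trees.
shared_facts = set()

class SearchTree:
    def __init__(self, name, beam_size, depth):
        self.name, self.beam_size, self.depth = name, beam_size, depth

    def expand(self, step):
        # Stand-in for "try an LLM-proposed construction and run DDAR";
        # a real node would add the facts DDAR proves from it.
        shared_facts.add(f"{self.name}-fact-{step}")
        # Facts proved by the *other* trees are immediately visible here,
        # so each tree searches with the ensemble's combined knowledge.
        return len(shared_facts)

trees = [
    SearchTree("classic", beam_size=128, depth=4),
    SearchTree("deep-narrow", beam_size=64, depth=10),
    SearchTree("shallow-wide", beam_size=512, depth=4),
]
for step in range(2):
    for tree in trees:
        tree.expand(step)
```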
The final key improvement in AG2 is a new LLM built on Gemini: a sparse mixture-of-experts Transformer-based model. The LLMs are trained either with a custom tokenizer directly in the domain-specific language, by fine-tuning pre-trained math-specialized Gemini models in natural language, or by multimodal training with an additional image input. During inference, AG2 enriches the neuro-symbolic interface by letting the LLM know which deductions DDAR has already made before it proposes auxiliary constructions. The LLM receives the original problem statement together with an "analysis string" containing three sets of facts: those deducible by DDAR from the problem premises alone, those deducible when the goal predicate is additionally assumed true, and those that hold numerically.
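A minimal sketch of assembling such a prompt (the field labels, separators, and predicate strings below are invented for illustration; the paper does not specify the serialization format):

```python
# Hypothetical serialization of the neuro-symbolic interface: the LLM
# sees the problem plus three labeled groups of DDAR-derived facts.
def build_analysis_prompt(problem, from_premises, assuming_goal, numeric):
    sections = [
        ("problem", [problem]),
        ("deductions_from_premises", from_premises),
        ("deductions_assuming_goal", assuming_goal),
        ("numerically_true", numeric),
    ]
    return " ! ".join(
        name + ": " + " ; ".join(items) for name, items in sections
    )

prompt = build_analysis_prompt(
    "prove coll(A, B, C)",
    ["eqangle(A, B, A, C, D, E, D, F)"],   # provable from premises
    ["cyclic(A, B, C, D)"],                # provable if goal assumed
    ["cong(A, B, C, D)"],                  # holds in the numeric diagram
)
```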
The paper presents results demonstrating AG2's performance on IMO geometry. AG2 solves 42 of the 50 IMO geometry problems from 2000-2024. Ablation studies on inference settings reveal that the optimal configuration for a single search tree is a beam size of 128, a beam depth of 4, and 32 samples. Additional evaluations on 30 of the hardest IMO shortlist problems formalizable in the AG2 language show that AG2 solves 20 of the 30.
The paper concludes by discussing future work: expanding the domain language to cover a variable number of points, non-linear equations, and problems involving inequalities; breaking problems into subproblems and applying reinforcement learning; and improving auto-formalization.