Equational Theories Project (ETP)
- ETP is a large-scale collaborative initiative that classifies 4,694 equational laws on magmas and constructs an exhaustive implication graph with 22 million directed edges.
- It leverages a hybrid methodology combining human-driven proofs, automated theorem provers, and formal verification in Lean to settle longstanding universal algebra questions.
- The project establishes benchmarks for automated theorem provers and introduces scalable, reproducible workflows for collaborative, formalized mathematical research.
The Equational Theories Project (ETP) is a large-scale, collaborative online initiative to determine the logical relationships between all low-order equational laws on magmas, using both human-driven and machine-assisted proofs, formalized and checked in the Lean theorem prover. It provides a complete classification of logical implication among 4,694 normalized equational laws of order at most 4 (that is, laws using at most four applications of the magma operation), constructing the exhaustive implication graph of 22,028,942 directed edges, each settled by a formally verified proof or an explicit countermodel. The project has not only settled longstanding universal algebra questions but also introduced new workflows for collaborative mathematical research and established benchmarks for automated theorem provers.
1. Scientific Scope and Objectives
The principal objective of the ETP is the exhaustive determination of the logical implication relation among the 4,694 simplest equational laws on magmas (algebras with a single binary operation), namely those of order up to 4. Motivated by classical questions in universal algebra regarding entailment between single-law equational theories, the project sidesteps the undecidability of the general implication problem for magmas by restricting attention to this finite set of low-order laws, where exhaustive case-by-case analysis remains feasible.
Specific goals include:
- Normalizing and numbering all equational laws of order ≤4, resulting in 4,694 representative laws.
- Determining, for every pair of distinct laws E, E′, whether E ⊨ E′ holds in both arbitrary and finite magmas.
- Constructing and analyzing the full directed implication graph with 22,028,942 edges.
- Formalizing all proofs and countermodels in Lean 4, ensuring kernel-verified correctness.
- Discovering new countermodel constructions and classes of magmas that realize gap cases or extremal properties.
- Developing collaborative and reproducible workflows (GitHub, Lean Zulip, Kanban tracking) for large-scale formalization of mathematics and for benchmarking ATPs at scale (Bolan et al., 8 Dec 2025).
2. Mathematical Framework and Dataset
A magma is defined as a set M together with a binary operation ◇ : M × M → M. The equational laws under study are identities between nontrivial formal words in fixed variables, with at most four applications of ◇, normalized up to symmetry and variable renaming. Some canonical examples:
- E₂: x = y (singleton law)
- E₄₃: x ◇ y = y ◇ x (commutativity)
- E₄₅₁₂: x ◇ (y ◇ z) = (x ◇ y) ◇ z (associativity)
This yields a concrete dataset of 4,694 nontrivial, symmetry-reduced laws. The implication graph is constructed as follows: vertices are the laws, and there is a directed edge E → E′ iff every magma satisfying E also satisfies E′ (i.e., E ⊨ E′). Reflexive (trivial) implications are excluded, leaving 4,694 × 4,693 = 22,028,942 ordered pairs of distinct laws, each a candidate edge to be proved or refuted. These implications are checked both in general (arbitrary magmas) and in the class of finite magmas.
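To make the setup concrete, the following minimal sketch checks the three example laws against small finite magmas given as Cayley tables. The term encoding (nested pairs), the helper names `variables`, `evaluate`, and `satisfies`, and the two sample magmas are illustrative assumptions, not the project's own data format.

```python
from itertools import product

# Assumed encoding: a term is a variable name (str) or a pair (left, right)
# standing for left ◇ right. A law is a pair (lhs, rhs) of terms.
E2 = ("x", "y")                                  # x = y
E43 = (("x", "y"), ("y", "x"))                   # x ◇ y = y ◇ x
E4512 = (("x", ("y", "z")), (("x", "y"), "z"))   # x ◇ (y ◇ z) = (x ◇ y) ◇ z


def variables(term):
    """Return the set of variable names occurring in a term."""
    if isinstance(term, str):
        return {term}
    return variables(term[0]) | variables(term[1])


def evaluate(term, table, env):
    """Evaluate a term in the magma with Cayley table `table` under `env`."""
    if isinstance(term, str):
        return env[term]
    return table[evaluate(term[0], table, env)][evaluate(term[1], table, env)]


def satisfies(table, law):
    """Check that `law` holds in the finite magma for every assignment."""
    lhs, rhs = law
    names = sorted(variables(lhs) | variables(rhs))
    for values in product(range(len(table)), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) != evaluate(rhs, table, env):
            return False
    return True


# Addition modulo 3 is commutative and associative but has three elements.
add_mod3 = [[(a + b) % 3 for b in range(3)] for a in range(3)]
# Subtraction modulo 3 is neither commutative nor associative.
sub_mod3 = [[(a - b) % 3 for b in range(3)] for a in range(3)]

num_laws = 4694
num_pairs = num_laws * (num_laws - 1)  # ordered pairs of distinct laws
print(num_pairs)
```

Here `add_mod3` satisfies E₄₃ and E₄₅₁₂ but not E₂, so it already witnesses that neither commutativity nor associativity implies the singleton law.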
After quotienting by mutual implication, the implication graph descends to a partial order on 1,415 equivalence classes, a structure supporting further lattice-theoretic and spectral analysis (Bolan et al., 8 Dec 2025).
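The quotient step can be sketched on a toy relation. The law names `A` to `D` and the relation `implies` below are hypothetical, and the sketch assumes `implies` is already transitively closed, so that comparing against one representative per class is enough.

```python
def equivalence_classes(laws, implies):
    """Group laws that imply each other (assumes `implies` is transitive)."""
    classes = []
    for law in laws:
        for cls in classes:
            rep = cls[0]
            if (law, rep) in implies and (rep, law) in implies:
                cls.append(law)
                break
        else:
            # No existing class is mutually equivalent to this law.
            classes.append([law])
    return classes


def quotient_order(classes, implies):
    """Induced strict order on classes, as pairs of class indices."""
    return {
        (i, j)
        for i, a in enumerate(classes)
        for j, b in enumerate(classes)
        if i != j and (a[0], b[0]) in implies
    }


# Toy data: A and B are equivalent, both imply C, and D is unrelated.
laws = ["A", "B", "C", "D"]
implies = {("A", "B"), ("B", "A"), ("A", "C"), ("B", "C")}

classes = equivalence_classes(laws, implies)
order = quotient_order(classes, implies)
print(classes, order)
```

On the real dataset the same idea would collapse the 4,694 laws to the 1,415 classes mentioned above; a strongly connected components algorithm would be the more scalable choice there.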
3. Proof Techniques and Automation
The ETP developed a hybrid proof methodology integrating informal experimentation, human-written proof sketches, and formalization in Lean. Automated theorem provers (ATPs)—notably Vampire and Prover9—and Lean’s own tactic libraries (e.g., duper, egg) were used for both proof search and countermodel discovery.
Key proof and refutation techniques included:
- Brute-force generation and checking of all magmas of size ≤4 (enumerating 4.3 billion operation tables), which refuted 61.9% of all candidate implications.
- Linear and translation-invariant models (e.g., over rings) for parametric counterexamples.
- Twisted semigroups via automorphisms, canonizer-based syntactic refutation based on free-magma theories, and ad hoc constructions informed by magma cohomology for the most recalcitrant implications.
- Kernel-level verification in Lean: all ATP-sourced proofs had to be reconstructed (not merely trusted) via proof certificate kernels for superposition, resolution, and congruence.
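The first two techniques in the list above can be illustrated with a small sketch: an exhaustive search over all operation tables of a given size, and a linear model of the assumed form x ◇ y = ax + by modulo n. The function names are hypothetical, and the search is kept to sizes ≤3 here because size 4 alone has 4¹⁶ ≈ 4.3 billion tables, which is impractical in unoptimized Python.

```python
from itertools import product


def variables(term):
    """Variables of a term; a term is a str or a pair (left, right)."""
    if isinstance(term, str):
        return {term}
    return variables(term[0]) | variables(term[1])


def evaluate(term, table, env):
    if isinstance(term, str):
        return env[term]
    return table[evaluate(term[0], table, env)][evaluate(term[1], table, env)]


def satisfies(table, law):
    lhs, rhs = law
    names = sorted(variables(lhs) | variables(rhs))
    for values in product(range(len(table)), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) != evaluate(rhs, table, env):
            return False
    return True


def all_magmas(n):
    """Yield every n-by-n Cayley table over {0, ..., n-1}."""
    for flat in product(range(n), repeat=n * n):
        yield [list(flat[i * n:(i + 1) * n]) for i in range(n)]


def find_countermodel(hypothesis, conclusion, max_size):
    """Return a magma satisfying `hypothesis` but not `conclusion`, or None."""
    for n in range(1, max_size + 1):
        for table in all_magmas(n):
            if satisfies(table, hypothesis) and not satisfies(table, conclusion):
                return table
    return None


def linear_magma(a, b, n):
    """Cayley table of x ◇ y = (a*x + b*y) mod n (assumed linear form)."""
    return [[(a * x + b * y) % n for y in range(n)] for x in range(n)]


E2 = ("x", "y")
E43 = (("x", "y"), ("y", "x"))
E4512 = (("x", ("y", "z")), (("x", "y"), "z"))

# Commutativity does not imply associativity: a 2-element table suffices.
witness = find_countermodel(E43, E4512, 2)
# The linear magma x ◇ y = 2x + 2y mod 3 is another such witness.
linear_witness = linear_magma(2, 2, 3)
# The singleton law has only 1-element models, so no refutation is found.
no_witness = find_countermodel(E2, E43, 3)
print(witness, no_witness)
```

A failed search such as `no_witness` proves nothing by itself; positive implications still need a proof, which is where the ATPs and the Lean formalization come in.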
A minimal “generating set” of 10,657 positive and 586,925 negative implications was proved directly and formalized; the remaining pairs followed by graph-theoretic closure under transitivity and duality (Bolan et al., 8 Dec 2025).
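The closure step can be sketched as follows, under the assumption that it uses the two standard rules: positive implications compose transitively, and a refutation A ⊭ B transfers to X ⊭ Y whenever A ⊨ X and Y ⊨ B (a model of A that fails B also satisfies X and must fail Y). The law names and the helper `dual_term` are hypothetical, and the naive closure loop is only suitable for toy inputs.

```python
def transitive_closure(edges):
    """Naive transitive closure of a set of (a, b) implication pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def propagate_refutations(laws, implied, refuted):
    """Derive X ⊭ Y from A ⊭ B whenever A ⊨ X and Y ⊨ B."""
    def implies(a, b):
        return a == b or (a, b) in implied

    derived = set()
    for a, b in refuted:
        for x in laws:
            if not implies(a, x):
                continue
            for y in laws:
                if implies(y, b) and x != y:
                    derived.add((x, y))
    return derived


def dual_term(term):
    """Dual of a term: swap the operands of every ◇ (terms are str or pairs)."""
    if isinstance(term, str):
        return term
    return (dual_term(term[1]), dual_term(term[0]))


# Toy generating set: A ⊨ B, B ⊨ C, and one refutation A ⊭ D.
laws = ["A", "B", "C", "D"]
implied = transitive_closure({("A", "B"), ("B", "C")})
refuted = propagate_refutations(laws, implied, {("A", "D")})
print(implied, refuted)
```

Duality enters by applying `dual_term` to both sides of each law: an implication between two laws holds exactly when the implication between their duals holds, so each proved or refuted pair yields its mirrored pair for free.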
Automated proving efforts, especially by Vampire, established that all valid implications could be proved and the majority of refuted cases caught via finite model-finding, settling 99.995% of the benchmark within modest computational resources (Janota, 20 Aug 2025).
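As an illustration of how a single implication query might be handed to a first-order prover, the sketch below renders a pair of laws as a TPTP problem in FOF syntax. The function symbol `op`, the formula names, and the term encoding are assumptions made for this example; the benchmark's actual encoding may differ.

```python
def variables(term):
    """Variables of a term; a term is a str or a pair (left, right)."""
    if isinstance(term, str):
        return {term}
    return variables(term[0]) | variables(term[1])


def term_to_tptp(term):
    """Render a term with the binary function symbol `op` (assumed name)."""
    if isinstance(term, str):
        return term.upper()  # TPTP variables start with an uppercase letter
    return f"op({term_to_tptp(term[0])},{term_to_tptp(term[1])})"


def law_to_formula(law):
    """Universally quantified equation for a law (lhs, rhs)."""
    lhs, rhs = law
    names = sorted(variables(lhs) | variables(rhs))
    quantified = ",".join(name.upper() for name in names)
    return f"![{quantified}]: ({term_to_tptp(lhs)} = {term_to_tptp(rhs)})"


def implication_problem(hypothesis, conclusion):
    """TPTP text asking whether `hypothesis` entails `conclusion`."""
    return "\n".join([
        f"fof(hypothesis, axiom, {law_to_formula(hypothesis)}).",
        f"fof(goal, conjecture, {law_to_formula(conclusion)}).",
    ])


E43 = (("x", "y"), ("y", "x"))
E4512 = (("x", ("y", "z")), (("x", "y"), "z"))

problem = implication_problem(E43, E4512)
print(problem)
```

A prover either derives the conjecture from the axiom or, in finite model-finding mode, returns a countermodel; for this particular pair a two-element countermodel exists, so the conjecture is not a theorem.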
4. Results, Discoveries, and Benchmarking
The ETP classified the laws into 1,415 mutual-entailment classes and discovered a spectrum of algebraic phenomena. Notable findings:
- 8,178,279 (37.12%) of all pairs are true implications; the rest are refuted.
- Almost all laws are “quasi-primal”: each either is equivalent to a singleton law or has a nontrivial finite model of size ≤5 (with only two needing size 7).
- 3,074 laws (65%) admit full spectra—finite models in every cardinality; the others display “spectral gaps,” indicating deep combinatorial structure.
- New families, such as weak central groupoids (E₁₄₈₅) and central groupoids (E₁₆₈), were identified, with interesting cardinality properties.
- Analysis of higher-order laws surfaced 213 prime candidates for single-law characterization of groups; most implications here were settled, with only 13 open cases remaining (Bolan et al., 8 Dec 2025).
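The notion of a spectrum used in the list above (the set of cardinalities in which a law has a finite model) can be explored by brute force for very small sizes. The sketch below is limited to sizes ≤3 and uses hypothetical helper names; the central groupoid law E₁₆₈ is written here as x = (y ◇ x) ◇ (x ◇ z).

```python
from itertools import product


def variables(term):
    """Variables of a term; a term is a str or a pair (left, right)."""
    if isinstance(term, str):
        return {term}
    return variables(term[0]) | variables(term[1])


def evaluate(term, table, env):
    if isinstance(term, str):
        return env[term]
    return table[evaluate(term[0], table, env)][evaluate(term[1], table, env)]


def satisfies(table, law):
    lhs, rhs = law
    names = sorted(variables(lhs) | variables(rhs))
    for values in product(range(len(table)), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) != evaluate(rhs, table, env):
            return False
    return True


def has_model(law, n):
    """True if some magma with exactly n elements satisfies `law`."""
    for flat in product(range(n), repeat=n * n):
        table = [flat[i * n:(i + 1) * n] for i in range(n)]
        if satisfies(table, law):
            return True
    return False


def spectrum(law, max_size):
    """Sizes n <= max_size in which `law` has a model (brute force)."""
    return [n for n in range(1, max_size + 1) if has_model(law, n)]


E2 = ("x", "y")                                  # singleton law
E43 = (("x", "y"), ("y", "x"))                   # commutativity
E168 = ("x", (("y", "x"), ("x", "z")))           # central groupoid law

print(spectrum(E43, 3), spectrum(E2, 3), spectrum(E168, 3))
```

Commutativity has models of every size, while the central groupoid law shows a gap at sizes 2 and 3, consistent with the classical fact that finite central groupoids have perfect-square order.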
Vampire’s benchmarking demonstrated high-throughput, large-scale first-order equational implication solving with detailed time and difficulty breakdowns. Approximately 99.995% of the 22 million implication queries were resolved automatically by Vampire, with finite-model building refuting 63% and superposition proving 37% (Janota, 20 Aug 2025).
| Aspect | Quantity/Result | Reference |
|---|---|---|
| Laws analyzed | 4,694 | (Bolan et al., 8 Dec 2025) |
| Directed implications | 22,028,942 | (Bolan et al., 8 Dec 2025) |
| Proved implications | 8,178,279 (37.12%) | (Bolan et al., 8 Dec 2025) |
| Refuted by finite models | 61.9% | (Bolan et al., 8 Dec 2025) |
| Equivalence classes | 1,415 | (Bolan et al., 8 Dec 2025) |
| Longest implication chain | Length 15 | (Bolan et al., 8 Dec 2025) |
| Vampire coverage | 99.995% queries solved | (Janota, 20 Aug 2025) |
5. Integration with Automated Reasoning and E-Generalization
The ETP corpus serves as a benchmark and testbed for research in automated theorem proving and AI-driven mathematics. The ETP approach leverages and motivates techniques in grammar-guided E-generalization, which computes symbolic representations of all possible equational anti-unifiers under a background theory (Burghardt, 2014). These grammars are practical for lemma speculation and guiding ATPs, suggesting likely auxiliary laws needed in inductive or completion-style proofs.
Further, the ETP framework can accommodate integrations with lemma suggestion tools and inductive logic programming workflows, incorporating grammar-theoretic and anti-unification methods for broad-scale discovery and exploration of new equational identities.
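For orientation, the sketch below implements plain syntactic anti-unification (the least general generalization of two terms), which is the special case of E-generalization with an empty background theory. It is a deliberately simplified stand-in: the grammar-based method of Burghardt handles nontrivial equational theories and is not reproduced here. The term encoding and the generated variable names `v0`, `v1`, ... are assumptions.

```python
def anti_unify(s, t, table=None):
    """Least general generalization of two terms (syntactic, no theory).

    A term is an atom (str) or a pair (left, right) meaning left ◇ right.
    Each distinct pair of mismatching subterms is replaced by one fresh
    variable, reused wherever the same mismatch occurs again.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if isinstance(s, tuple) and isinstance(t, tuple):
        return (anti_unify(s[0], t[0], table), anti_unify(s[1], t[1], table))
    key = (s, t)
    if key not in table:
        table[key] = f"v{len(table)}"
    return table[key]


# (a ◇ b) ◇ a and (c ◇ b) ◇ c share the shape (v0 ◇ b) ◇ v0.
left = (("a", "b"), "a")
right = (("c", "b"), "c")
pattern = anti_unify(left, right)
print(pattern)
```

The reuse of `v0` for both occurrences of the mismatch (a, c) is what makes the result least general; a lemma-speculation tool could treat such patterns as candidate auxiliary laws.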
6. Collaborative Workflows, Software Infrastructure, and Future Directions
The ETP established reproducible, scalable collaborative practices using public GitHub repositories, Lean Zulip coordination, and Kanban-style blueprint tracking for work assignment and status. Continuous integration and contributor standards were established early to ensure formal soundness without raising the barrier to entry for newcomers.
Custom visualization tools, including GUIs for algebraic structure exploration, supported both casual and specialist participants. All contributions were required to be kernel-verified in Lean, including translated ATP result certificates.
Future research directions, as identified by the project, include:
- Settling the remaining unsolved finite-magma implication (E₆₇₇ ⊨ E₂₅₅).
- Systematically exploring implication with multiple hypotheses and generalizations beyond single-law entailments.
- Analysis of lattice and poset structure in the implication graph, especially “irreducible” or covering relations.
- Creating benchmarks for ATPs using the formalized ETP corpus.
- Integrating learned models (convolutional or graph neural networks) for proof guidance in larger law spaces (Bolan et al., 8 Dec 2025).
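The covering relations mentioned in the list above are the edges of the Hasse diagram: pairs a < b with no element strictly between them. A minimal sketch on a toy strict order follows; the element names are hypothetical and the input is assumed to be transitively closed.

```python
def covering_relations(order):
    """Pairs (a, b) in a strict partial order with nothing strictly between.

    `order` is a set of pairs (a, b) meaning a < b, assumed transitive.
    """
    elements = {x for pair in order for x in pair}
    return {
        (a, b)
        for (a, b) in order
        if not any((a, c) in order and (c, b) in order for c in elements)
    }


# Toy order: A < B < C (so also A < C), and A < D.
order = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")}
covers = covering_relations(order)
print(sorted(covers))
```

Applied to the partial order on equivalence classes, this would discard every implication that follows from others by transitivity and keep only the irreducible ones.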
7. Broader Implications and Applications
The ETP provides a comprehensive resource for universal algebra, computational mathematics, and automated reasoning, demonstrating the feasibility of collaboratively settling large conjecture graphs at scale. Its exhaustive, formally verified implication graph serves as a reference point for the study of magma-based equational theories and as a robust benchmark for theorem provers.
The ETP methodology and findings extend to related domains requiring rigorous understanding of algebraic law entailment, model-finding, and the formal management of mathematical knowledge, and it offers an archetype for crowdsourced, machine-assisted mathematical research (Bolan et al., 8 Dec 2025, Janota, 20 Aug 2025).