Equational Theories Project: Lean-Verified Implications Among Magma Laws

Updated 9 December 2025

The paper presents the Equational Theories Project, a Lean-verified classification of the implications among 4,694 equational laws on magmas.
It integrates human insight with automated theorem proving to certify implications and to refute non-implications through explicit countermodels.
The methodology establishes practical benchmarks and scalable collaborative processes for formal mathematics.

The Equational Theories Project (ETP) is a large-scale, collaborative research initiative designed to classify the logical implication relations among the simplest equational laws governing magmas—algebraic structures equipped with a single binary operation. ETP encompasses the construction and verification of an exhaustive implication graph for all 4,694 normalized equational laws of order up to 4 on magmas, resulting in formal proofs or counterexamples for 22,028,942 nontrivial directed implications. This project integrates human and automated theorem proving, with all results validated in Lean, and yields new algebraic constructions and insights into proof automation, theory exploration, and benchmarks for automated theorem provers (ATPs) (Bolan et al., 8 Dec 2025).

1. Scope, Motivation, and Objectives

The motivation for ETP originates in universal algebra, where the central question is: for given equational laws (identities), which ones entail which others in arbitrary or finite magmas? Birkhoff’s completeness theorem characterizes entailment via rewriting, but the general implication problem is undecidable, so even low-order cases can demand extensive computer-assisted reasoning.

Principal goals:

Enumerate and normalize all equational laws (up to 4 occurrences of the operation) modulo variable relabeling and the symmetry of equality, yielding 4,694 laws.
Completely determine the implication graph $E \models E'$ (22M edges), both for all magmas and for finite magmas.
Classify laws into equivalence classes under mutual entailment.
Formalize all instances in Lean, guaranteeing machine-verified correctness.
Develop new countermodel methods, discover previously unknown classes of magmas, and establish rigorous collaborative workflows for formal mathematics (Bolan et al., 8 Dec 2025).
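
The enumeration goal above can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the project's own tooling: the helper names are hypothetical, and the normalization rules are assumptions (variables renamed in order of first appearance, $w_1 = w_2$ identified with $w_2 = w_1$, and trivial laws $w = w$ dropped except for $x = x$).

```python
def shapes(n):
    """All binary term shapes with n operations; a leaf is None, a node is (left, right)."""
    if n == 0:
        return [None]
    return [(l, r) for k in range(n) for l in shapes(k) for r in shapes(n - 1 - k)]

def growth_strings(m):
    """Canonical variable labelings of m leaves (restricted growth strings)."""
    def rec(prefix, highest):
        if len(prefix) == m:
            yield tuple(prefix)
            return
        for v in range(highest + 2):
            yield from rec(prefix + [v], max(highest, v))
    yield from rec([], -1)

def fill(shape, labels):
    """Replace the leaves of a shape, left to right, with labels from an iterator."""
    if shape is None:
        return next(labels)
    left = fill(shape[0], labels)
    right = fill(shape[1], labels)
    return (left, right)

def relabel(lhs, rhs):
    """Rename variables by order of first appearance, reading lhs and then rhs."""
    names = {}
    def go(t):
        if isinstance(t, tuple):
            return (go(t[0]), go(t[1]))
        return names.setdefault(t, len(names))
    return (go(lhs), go(rhs))

def count_laws(max_order):
    """Number of normalized laws for each order 0..max_order (assumed normalization)."""
    counts = []
    for n in range(max_order + 1):
        seen = set()
        for a in range(n + 1):
            for left_shape in shapes(a):
                for right_shape in shapes(n - a):
                    for labels in growth_strings(n + 2):
                        it = iter(labels)
                        lhs = fill(left_shape, it)
                        rhs = fill(right_shape, it)
                        if lhs == rhs and n > 0:
                            continue  # trivial law w = w (only x = x is kept)
                        # A law and its mirror image share one key.
                        seen.add(frozenset((relabel(lhs, rhs), relabel(rhs, lhs))))
        counts.append(len(seen))
    return counts

counts = count_laws(4)
total = sum(counts)
```

Under these assumptions, `total` agrees with the 4,694 laws quoted above, and `counts` breaks that figure down by number of operations.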

2. Equational Laws and Implication Graph Construction

A magma $(M, \cdot)$ is defined as a set $M$ with a binary operation $\cdot : M \times M \to M$. An equational law is an identity $w_1 = w_2$ between formal terms over variables, such as associativity $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ or commutativity $x \cdot y = y \cdot x$.
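
For finite magmas these definitions are directly checkable. The following minimal sketch assumes a magma on $\{0, \dots, n-1\}$ given as a Cayley table (a list of rows) and a term encoded as either a variable name or a pair `(left, right)`; this encoding and the function names are illustrative choices, not part of the source.

```python
from itertools import product

def evaluate(term, table, env):
    """Evaluate a term: a variable name (str) or a pair (left, right) meaning left . right."""
    if isinstance(term, str):
        return env[term]
    left, right = term
    return table[evaluate(left, table, env)][evaluate(right, table, env)]

def variables(term):
    """Set of variable names occurring in a term."""
    if isinstance(term, str):
        return {term}
    return variables(term[0]) | variables(term[1])

def satisfies(table, law):
    """True if the finite magma satisfies lhs = rhs under every variable assignment."""
    lhs, rhs = law
    names = sorted(variables(lhs) | variables(rhs))
    for values in product(range(len(table)), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) != evaluate(rhs, table, env):
            return False
    return True

# The two example laws from the text.
ASSOC = ((("x", "y"), "z"), ("x", ("y", "z")))
COMM = (("x", "y"), ("y", "x"))

# Two magmas on {0, 1, 2}: addition and subtraction modulo 3.
add_mod3 = [[(i + j) % 3 for j in range(3)] for i in range(3)]
sub_mod3 = [[(i - j) % 3 for j in range(3)] for i in range(3)]
```

Addition modulo 3 satisfies both laws, while subtraction modulo 3 satisfies neither, so `satisfies(sub_mod3, ASSOC)` and `satisfies(sub_mod3, COMM)` are both false.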

Implication graph construction:

Vertices: 4,694 unique normalized laws $E_i$.
Directed edge $E \to E'$: every magma satisfying $E$ also satisfies $E'$, i.e., $E \models E'$.
Edges: $4,694 \times 4,693 = 22,028,942$ pairs (excluding trivial self-implications).
After classifying by mutual implication (equivalence), 1,415 equivalence classes are observed; the partial order’s longest chain has length 15.

Transitivity, duality, and symmetry reduce the necessary number of direct proofs, with proof extraction automated via Lean’s custom tagging infrastructure. A generating set of 10,657 positive and 586,925 negative implications was formalized directly; all remaining implications are derivable by logical closure (Bolan et al., 8 Dec 2025).
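
The closure step can be illustrated on a toy graph. The sketch below is an assumption about how such a closure might be computed, not the project's implementation: positive facts are closed under transitivity, and each refuted pair $A \not\models B$ is propagated to every $A'$ with $A \models A'$ and every $B'$ with $B' \models B$. Duality is not modeled, and the four law names are placeholders.

```python
def close_implications(laws, positive, negative):
    """Derive all implications and non-implications that follow from a generating set.

    positive: pairs (a, b) with a |= b; negative: pairs (a, b) with a not|= b.
    """
    implies = {a: {a} for a in laws}          # reflexive by construction
    for a, b in positive:
        implies[a].add(b)
    # Warshall-style transitive closure over the positive facts.
    for k in laws:
        for a in laws:
            if k in implies[a]:
                implies[a] |= implies[k]
    proved = {(a, b) for a in laws for b in implies[a] if a != b}
    # If a not|= b, a |= a2 and b2 |= b, then a2 not|= b2:
    # a model of a that fails b is a model of a2 that fails b2.
    refuted = set()
    for a, b in negative:
        for a2 in implies[a]:
            for b2 in laws:
                if b in implies[b2]:
                    refuted.add((a2, b2))
    return proved, refuted

# Toy example with four well-known laws (names are labels only).
laws = ["singleton", "assoc", "comm", "trivial"]
positive = {("singleton", "assoc"), ("singleton", "comm"),
            ("assoc", "trivial"), ("comm", "trivial")}
negative = {("comm", "assoc"), ("assoc", "comm")}

proved, refuted = close_implications(laws, positive, negative)
```

In this toy case six generating facts settle all $4 \times 3 = 12$ ordered pairs, which mirrors on a small scale how a generating set far smaller than the full graph can determine it.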

3. Proof Methodologies and Automation Frameworks

The ETP employs a hybrid of human and machine-based theorem proving:

Human input: Mathematical experimentation, LaTeX proof sketches, and Lean formalizations.
Automated theorem proving (ATP): Deployment of Vampire, Prover9, and Lean-based tactics (e.g., duper, egg) for implication search and countermodel detection (Janota, 20 Aug 2025).
Proof certification: All ATP-produced certificates are reconstructed and kernel-checked within Lean for trusted verification.

Countermodel and refutation strategies:

Exhaustive enumeration of all magmas of size $\leq 4$ (there are $4^{16} \approx 4.3$ billion tables of size 4 alone), which accounts for 61.9% of negative implications.
Linear model refutations: $x \cdot y = a x + b y$ over fields/rings, with Gröbner basis techniques for structural analysis.
Translation-invariant models, twisting semigroups, and canonizers derived from free magma theory provide additional layers of negative instance construction.
Sui generis and cohomological arguments resolve the most difficult and isolated cases.
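
The first two strategies in this list can be sketched as brute-force searches. The code below is a simplified illustration under stated assumptions: terms are encoded as variable names or pairs, linear models are taken over $\mathbb{Z}/p$ only, and nothing here reflects the optimizations a real search over billions of tables would need. All function names are hypothetical.

```python
from itertools import product

def evaluate(term, table, env):
    """Evaluate a term: a variable name (str) or a pair (left, right)."""
    if isinstance(term, str):
        return env[term]
    return table[evaluate(term[0], table, env)][evaluate(term[1], table, env)]

def variables(term):
    return {term} if isinstance(term, str) else variables(term[0]) | variables(term[1])

def satisfies(table, law):
    """True if lhs = rhs holds in the finite magma for every assignment."""
    lhs, rhs = law
    names = sorted(variables(lhs) | variables(rhs))
    for values in product(range(len(table)), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) != evaluate(rhs, table, env):
            return False
    return True

def all_magmas(n):
    """Every operation table on {0, ..., n-1}; there are n ** (n * n) of them."""
    for flat in product(range(n), repeat=n * n):
        yield [list(flat[i * n:(i + 1) * n]) for i in range(n)]

def find_countermodel(premise, conclusion, max_size):
    """Smallest-size table satisfying premise but not conclusion, or None."""
    for n in range(1, max_size + 1):
        for table in all_magmas(n):
            if satisfies(table, premise) and not satisfies(table, conclusion):
                return table
    return None

def linear_countermodel(premise, conclusion, p):
    """Coefficients (a, b) of x . y = a*x + b*y mod p refuting the implication, or None."""
    for a, b in product(range(p), repeat=2):
        table = [[(a * x + b * y) % p for y in range(p)] for x in range(p)]
        if satisfies(table, premise) and not satisfies(table, conclusion):
            return (a, b)
    return None

ASSOC = ((("x", "y"), "z"), ("x", ("y", "z")))
COMM = (("x", "y"), ("y", "x"))

table = find_countermodel(COMM, ASSOC, 2)   # a commutative, non-associative magma
coeffs = linear_countermodel(COMM, ASSOC, 3)
```

Both searches refute "commutativity implies associativity": a two-element table suffices, and over $\mathbb{Z}/3$ the linear search finds $x \cdot y = 2x + 2y$.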

Collaborative framework:

Use of GitHub issues, Kanban tracking, blueprint-driven Lean formalization, and Lean Zulip chat for distributed, multi-contributor project coordination.
Integration of real-time progress tracking, CI pipelines, and graphical interfaces for motivational and logistical support (Bolan et al., 8 Dec 2025).

4. Key Results and Algebraic Discoveries

ETP led to several new structural insights:

Discovery of “weak central groupoids” and other novel magma classes. Some new constructions empirically exhibit finite orders of $n^2$ or $2n^2$.
Most laws are quasi-primal: each is either equivalent to the singleton law ($x = y$) or possesses a finite model of size $\leq 5$ (with two exceptions requiring size 7).
Among 4,694 laws, 3,074 are “full-spectrum”: they admit finite models of every cardinality; the remainder exhibit “gaps” in their spectra.
Systematic study of single-law group axiomatizations (Higman–Neumann candidates) for order 8, reducing approximately 298 million candidate laws to fewer than 300 nontrivial cases via ATP methods and countermodels.
The most difficult implication refutation (from $E_{1729}$ to $E_{817}$) required months of effort and over 4,000 lines of Lean code.
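
A spectrum with gaps can be seen on a classical example. The sketch below uses the central groupoid identity $x = (y \cdot x) \cdot (x \cdot z)$, which is brought in here as an illustrative assumption rather than as a result quoted from the source; its finite models are known to have square order. The brute-force check covers sizes 1 to 3, and size 4 is confirmed with the standard construction on pairs, $(a, b) \cdot (c, d) = (b, c)$.

```python
from itertools import product

def evaluate(term, table, env):
    """Evaluate a term: a variable name (str) or a pair (left, right)."""
    if isinstance(term, str):
        return env[term]
    return table[evaluate(term[0], table, env)][evaluate(term[1], table, env)]

def variables(term):
    return {term} if isinstance(term, str) else variables(term[0]) | variables(term[1])

def satisfies(table, law):
    """True if lhs = rhs holds in the finite magma for every assignment."""
    lhs, rhs = law
    names = sorted(variables(lhs) | variables(rhs))
    for values in product(range(len(table)), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) != evaluate(rhs, table, env):
            return False
    return True

def has_model(law, n):
    """Brute force: does any operation table of size n satisfy the law?"""
    for flat in product(range(n), repeat=n * n):
        table = [list(flat[i * n:(i + 1) * n]) for i in range(n)]
        if satisfies(table, law):
            return True
    return False

# Central groupoid identity x = (y . x) . (x . z).
CENTRAL = ("x", (("y", "x"), ("x", "z")))

# Sizes up to 3 that admit a model (3 ** 9 = 19683 tables at size 3).
small_spectrum = [n for n in (1, 2, 3) if has_model(CENTRAL, n)]

# Size 4 via pairs over {0, 1}: element (a, b) is encoded as 2*a + b,
# and (a, b) . (c, d) = (b, c).
pair_table = [[2 * (i % 2) + (j // 2) for j in range(4)] for i in range(4)]
size4_ok = satisfies(pair_table, CENTRAL)
```

Sizes 2 and 3 admit no model while sizes 1 and 4 do, which is the kind of gap that separates such laws from the full-spectrum ones.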

The complete implication graph combined with equivalence class information and finite/infinite model distinction provides a refined landscape for magma theory and computational algebra (Bolan et al., 8 Dec 2025).

5. Benchmarks, Automated Reasoning, and Integration with ATPs

The ETP implication graph forms an extensive benchmark for ATP and model finding technologies.

Vampire’s performance: Vampire automatically proved all 8.17 million positive implications and refuted 13.85 million non-implications (via finite models) out of 22.03 million total. 99.995% of queries were resolved within 500 instructions or 60 seconds. Just over 1,000 (0.005%) cases remain open, predominantly requiring truly infinite countermodels (Janota, 20 Aug 2025).
Problem difficulty is stratified: 64% “easy” ($<500$ instructions); 35% “medium” (under 60 s); 1% “hard.”
TPTP-formatted problem sets are available for the entire ETP corpus, supporting broad cross-evaluation of future ATP systems.
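
To make the benchmark format concrete, the sketch below renders one implication query as a TPTP first-order problem. The functor name `op`, the formula names `premise` and `goal`, and the term encoding are assumptions made for illustration; the project's published problem files may be laid out differently.

```python
def variables(term):
    """Set of variable names in a term (a str, or a pair (left, right))."""
    return {term} if isinstance(term, str) else variables(term[0]) | variables(term[1])

def to_tptp(term):
    """Render a term with the binary functor `op`; TPTP variables are upper case."""
    if isinstance(term, str):
        return term.upper()
    return f"op({to_tptp(term[0])},{to_tptp(term[1])})"

def formula(law):
    """Universally quantified equation in TPTP FOF syntax."""
    lhs, rhs = law
    names = ",".join(v.upper() for v in sorted(variables(lhs) | variables(rhs)))
    return f"![{names}]: ({to_tptp(lhs)} = {to_tptp(rhs)})"

def implication_problem(premise, conclusion):
    """TPTP problem asking whether `premise` entails `conclusion`."""
    return "\n".join([
        f"fof(premise, axiom, {formula(premise)}).",
        f"fof(goal, conjecture, {formula(conclusion)}).",
    ])

ASSOC = ((("x", "y"), "z"), ("x", ("y", "z")))
COMM = (("x", "y"), ("y", "x"))

problem = implication_problem(COMM, ASSOC)
```

A prover given `problem` would try to derive the conjecture from the axiom, while a finite model finder would look for a magma satisfying the axiom and violating the conjecture.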

A plausible implication is that the magnitude and completeness of ETP’s corpus make it a canonical evaluation resource for universal-algebra ATPs.

6. Applications and Broader Impacts

Automated mathematics and theory exploration: The ETP demonstrates that large-scale implication graphs for algebraic laws can be exhaustively constructed with machine-assisted proofs, yielding verifiable maps of the logical relationships within equational theories.

Methodological innovation: ETP establishes scalable processes for crowd-sourced formalization, integration of multiple proof search paradigms, and real-time collaborative mathematical research.

Auxiliary research directions:

Systematic study of "implications with hypotheses" and multi-law entailments.
Analysis of irreducible poset edges and fine spectrum growth rates.
Empirical foundation for machine learning (CNN/GNN-guided proof search) using the ETP dataset.

Specific workflows and standards (contribution guidelines, CI, proof reconstruction, visualization tools) pioneered by ETP are generalizable to future collaborative mechanized mathematics efforts (Bolan et al., 8 Dec 2025).

7. Future Directions and Open Problems

Ongoing and prospective research questions identified by ETP:

Complete resolution of the two remaining open finite-magma implication cases ($E_{677} \models_\mathrm{fin} E_{255}$ and its dual).
Formal classification of implications involving explicit hypotheses ($E_1 \wedge E_2 \models E_3$) or semigroup/associative extensions.
Refinement of ATP/finite-model-building heuristics to close the last fraction of unsolved queries and to better handle cases demanding infinite witnesses.
Investigation of learned and data-driven approaches for predicting entailments and guiding ATP proof strategy within very large equational law spaces.
Extension of the ETP protocol to multiple-operation signatures, more expressive algebraic structures, and richer logical settings.

The ETP is thus positioned as both a completed dataset and a continuing platform for universal-algebra research, automated reasoning, and collaborative mathematics at scale (Bolan et al., 8 Dec 2025, Janota, 20 Aug 2025).