Latent Space of Equational Theories
- The latent space of equational theories is a three-dimensional Euclidean embedding that positions universally quantified equational laws in the language of magmas according to their empirical Stone pairings.
- It employs principal component analysis on a centered Stone pairing matrix to reveal deduction flows, geometric clusters, and statistical properties reflecting different syntactic signatures.
- Statistical analysis highlights directional trends and clustering of reversible implications, offering a practical framework for machine-assisted conjecture and proof generation.
The latent space of equational theories is a three-dimensional Euclidean landscape constructed to embed the space of universally quantified equational theories in the language of magmas, with each theory's position determined by its empirical statistical properties across a large sample of finite random magmas. Introduced by Berlioz and Melliès as an outgrowth of the Equational Theories Project of Terence Tao, this construction enables the geometric visualization of logical implication relations, observable as oriented chains in latent space, and supports statistical and deductive analysis of the interrelations among thousands of equational laws (Berlioz et al., 28 Jan 2026).
1. Specification of Equational Theories
An equational theory in this context consists of a single universally quantified equation in the signature $(\diamond)$, where $\diamond$ is a binary operation. Each theory is thus defined by an equation $E_i : s = t$ between two terms $s$ and $t$, with integer label $i$ and at most four occurrences of $\diamond$. The "syntactic signature" of a theory, denoted $(p, q)$, records the number of occurrences of $\diamond$ on each side. The collection studied consists of 4694 such theories, each representing a first-order sentence in the language of magmas.
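As an illustration of the definition above, a magma term can be represented as a small tree and its syntactic signature read off by counting operation nodes. This is a minimal sketch: the names `Op`, `op_count`, `variables` and `signature` are hypothetical and are not taken from the Equational Theories Project.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Op:
    """One application of the binary operation to two subterms."""
    left: "Term"
    right: "Term"


# A term is either a variable name or an application of the operation.
Term = Union[str, Op]


def op_count(term: Term) -> int:
    """Number of occurrences of the binary operation in a term."""
    if isinstance(term, str):
        return 0
    return 1 + op_count(term.left) + op_count(term.right)


def variables(term: Term) -> set:
    """Set of variable names occurring in a term."""
    if isinstance(term, str):
        return {term}
    return variables(term.left) | variables(term.right)


def signature(lhs: Term, rhs: Term) -> tuple:
    """Syntactic signature: operation counts on the left and right sides."""
    return (op_count(lhs), op_count(rhs))


# Associativity, (x . y) . z = x . (y . z), used purely as an example law.
assoc_lhs = Op(Op("x", "y"), "z")
assoc_rhs = Op("x", Op("y", "z"))
assoc_signature = signature(assoc_lhs, assoc_rhs)
```

Associativity uses four occurrences of the operation in total, so it sits at the upper bound of the collection described above.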
2. Statistical Sampling Over Finite Magmas
For statistical characterization, a size parameter $n$ is fixed and $N$ independent random magmas $M_1, \dots, M_N$ are sampled, each with an $n$-element universe and operation table selected uniformly at random. Given a theory $E$ in $k$ variables, for each magma $M$ the Stone pairing is computed: $\langle E, M \rangle = |\{\bar{a} \in M^k : M \models E(\bar{a})\}| \,/\, |M|^k$. This is the empirical probability that a random $k$-tuple in $M$ satisfies the given equation. All such pairings are assembled into a matrix $S = (\langle E_i, M_j \rangle)_{i,j}$, where rows index theories and columns index sampled magmas.
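The sampling step can be sketched by brute force for small magmas. The code below is an illustrative assumption rather than the authors' implementation: terms are encoded as nested pairs, the universe is taken to be zero-based, and the function names and the three example laws are hypothetical.

```python
import itertools

import numpy as np


def random_magma(n, rng):
    """Operation table on {0, ..., n-1} with entries drawn uniformly at random."""
    return rng.integers(0, n, size=(n, n))


def term_variables(term):
    """Variables of a term: a name (str) or a pair (left, right)."""
    if isinstance(term, str):
        return {term}
    return term_variables(term[0]) | term_variables(term[1])


def evaluate(term, table, env):
    """Evaluate a term in the magma given by `table` under assignment `env`."""
    if isinstance(term, str):
        return env[term]
    left, right = term
    return table[evaluate(left, table, env), evaluate(right, table, env)]


def stone_pairing(lhs, rhs, table):
    """Fraction of variable assignments for which lhs = rhs holds in the magma."""
    names = sorted(term_variables(lhs) | term_variables(rhs))
    n = table.shape[0]
    hits = 0
    for values in itertools.product(range(n), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) == evaluate(rhs, table, env):
            hits += 1
    return hits / n ** len(names)


# Example laws as (lhs, rhs): x = x, commutativity, and x = y.
laws = [("x", "x"), (("x", "y"), ("y", "x")), ("x", "y")]
rng = np.random.default_rng(0)
magmas = [random_magma(3, rng) for _ in range(5)]

# Rows index laws, columns index sampled magmas.
S = np.array([[stone_pairing(lhs, rhs, m) for m in magmas] for lhs, rhs in laws])
```

Exhaustive enumeration costs $n^k$ evaluations per law and magma, which is only practical for small $n$ and few variables; larger settings would need sampling of tuples instead.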
3. Embedding Construction via Principal Component Analysis
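To define the latent space, the Stone pairing matrix $S$ is centered by subtracting the mean row vector $\mu$, the average of the rows of $S$. The centered matrix $\tilde{S} = S - \mathbf{1}\mu^{\top}$ leads to the covariance matrix $C \propto \tilde{S}^{\top}\tilde{S}$. The top three eigenvectors of $C$ (principal components) form the columns of a matrix $W$, and each theory $E_i$ is embedded as $\varphi(E_i) = \tilde{S}_i W \in \mathbb{R}^3$, where $\tilde{S}_i$ is the $i$-th row of $\tilde{S}$. The induced metric is Euclidean, $d(E_i, E_j) = \lVert \varphi(E_i) - \varphi(E_j) \rVert_2$.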
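The embedding step is ordinary principal component analysis, which can be sketched with NumPy. The version below uses a singular value decomposition of the centered matrix instead of forming the covariance matrix explicitly; the two give the same principal directions. The function names and the random stand-in matrix are assumptions for illustration only.

```python
import numpy as np


def latent_embedding(S, dim=3):
    """Project the rows of S onto the top `dim` principal components."""
    S = np.asarray(S, dtype=float)
    # Subtract the mean row vector (one mean per column, i.e. per magma).
    centered = S - S.mean(axis=0, keepdims=True)
    # Right singular vectors of the centered matrix are the eigenvectors
    # of centered.T @ centered, ordered by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    W = vt[:dim].T
    return centered @ W, W


def distance(coords, i, j):
    """Euclidean distance between two embedded rows."""
    return float(np.linalg.norm(coords[i] - coords[j]))


# Stand-in data: 50 rows (theories) by 20 columns (magmas).
rng = np.random.default_rng(1)
S_demo = rng.random((50, 20))
coords, W = latent_embedding(S_demo)
```

The sign of each principal axis is arbitrary in PCA, so any interpretation of "positive" directions depends on a sign convention fixed after the decomposition.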
4. Logical Implication Structures in Latent Space
The Equational Theories Project defines an implication preorder on the set of 4694 theories. A strict implication that does not factor through any intermediate theory (except by reversible steps) is called atomic. Reversible implications yield equivalence classes (cliques). Collapsing each reversible clique to its center of mass in $\mathbb{R}^3$ produces a reduced directed acyclic graph (DAG) of 1415 nodes and 4824 arrows, where edges inherit an orientation and have Euclidean lengths.
Logical implication chains become visible as oriented paths in latent space; the orientation reflects the deductive flow between the corresponding equational theories.
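The clique collapse described in this section can be sketched as follows, assuming the implication relation is supplied as a transitively closed set of index pairs. The source does not specify an algorithm, so the union-find grouping, the covering-relation test for atomic edges, and the name `collapse_cliques` are all illustrative choices.

```python
import numpy as np


def collapse_cliques(coords, implications):
    """Merge mutually implying theories and return the reduced graph.

    coords: (T, 3) array of embedded theories.
    implications: pairs (i, j) meaning theory i implies theory j,
        assumed to be transitively closed.
    """
    T = len(coords)
    parent = list(range(T))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    imp = set(implications)
    for i, j in imp:
        if (j, i) in imp:  # reversible implication: same clique
            parent[find(i)] = find(j)

    roots = sorted({find(i) for i in range(T)})
    index = {r: k for k, r in enumerate(roots)}
    labels = np.array([index[find(i)] for i in range(T)])

    # Each clique is placed at the center of mass of its members.
    centers = np.stack([coords[labels == k].mean(axis=0) for k in range(len(roots))])

    # Strict implications between distinct cliques.
    edges = {(int(labels[i]), int(labels[j])) for i, j in imp if labels[i] != labels[j]}

    # Atomic edges: strict implications with no intermediate clique.
    atomic = {(a, c) for a, c in edges
              if not any((a, b) in edges and (b, c) in edges for b in range(len(roots)))}
    return labels, centers, edges, atomic


# Toy example: theories 0 and 1 are equivalent, and 0, 1 => 2 => 3.
demo_coords = np.array([[0.0, 0, 0], [2.0, 0, 0], [5.0, 0, 0], [9.0, 0, 0]])
demo_imp = {(0, 1), (1, 0), (0, 2), (1, 2), (2, 3), (0, 3), (1, 3)}
labels, centers, edges, atomic = collapse_cliques(demo_coords, demo_imp)
```

In the toy example the two equivalent theories merge into one node at their midpoint, and the composite implication from that node to the last theory is dropped from the atomic edges because it factors through the middle one.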
5. Geometric and Statistical Properties
Analysis of the geometry and implication structure reveals several empirical patterns:
- Principal component interpretation:
- The $x$-coordinate nearly perfectly tracks the expected Stone pairing, i.e., the mean of $\langle E, M \rangle$ over the sampled magmas $M$, for each theory $E$.
- The $y$-coordinate captures the variance in these probabilities across magmas.
- The $z$-coordinate distinguishes conjugate theories: for every conjugate pair, the two theories have opposite $z$-coordinates, and self-conjugate theories are mapped to $z = 0$.
- Implication edge statistics:
- Mean Euclidean length of edges (in the full implication graph):
- Reversible (equivalences): ≈0.69
- Atomic (strict, non-reversible): ≈5.29
- All strict: ≈7.17
- This suggests that provable equivalences cluster tightly, while strict implications correspond to larger transitions.
- Directional flow:
- 78.7% of implication vectors have a positive -component, indicating a dominant "radial drift" from under-constrained toward over-constrained equational laws.
- Many implication edges align into nearly parallel families, typically reflecting witnesses similar to Herbrand-style rewrites.
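Edge statistics of the kind reported above can be computed directly from the embedded coordinates and a list of oriented edges. This is a minimal sketch: `edge_statistics` and its `axis` parameter are hypothetical, and the choice of axis is left to the caller rather than asserted here.

```python
import numpy as np


def edge_statistics(coords, edges, axis=0):
    """Mean Euclidean edge length and fraction of edges with a positive
    component along coordinate `axis` (hypothetical parameter)."""
    coords = np.asarray(coords, dtype=float)
    edges = np.asarray(edges)
    # Implication vector of an edge: target position minus source position.
    vectors = coords[edges[:, 1]] - coords[edges[:, 0]]
    lengths = np.linalg.norm(vectors, axis=1)
    positive_fraction = float((vectors[:, axis] > 0).mean())
    return float(lengths.mean()), positive_fraction


# Toy data: two edges, one pointing forward and one backward along axis 0.
toy_coords = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]])
toy_edges = [(0, 1), (1, 2)]
mean_length, positive_fraction = edge_statistics(toy_coords, toy_edges)
```

Running the same function separately on reversible, atomic, and all strict edges would yield the three mean lengths compared in the list above.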
6. Visualization, Clustering, and Interpretability
The spatial arrangement of theories in $\mathbb{R}^3$ encodes their deductive and statistical relationships:
- Short Euclidean distances between two theories correspond to simple rewrites or provable equivalences.
- Testing for an implication between two theories can be facilitated by checking whether the candidate consequence lies near the hypothesis, displaced in the positive direction.
- Clusters by syntactic signature naturally emerge: laws sharing one signature are located in one region, and laws with a different signature in another.
- Z-symmetry encodes conjugacy: conjugate pairs are mirror images across the $z = 0$ plane, supporting exploration via symmetry-based reparameterization.
- Oriented implication streams support preferred proof search orderings, recommending flows from small- (under-constrained) to large- (over-constrained), as seen in chains leading toward classical associativity.
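The proximity heuristics in this list suggest a simple ranking of candidate consequences for a given theory. The sketch below is an assumption about how such a filter might look, not a procedure from the source: `candidate_implications`, `k`, and `axis` are hypothetical, and a hit is only a suggestion to be checked by an actual proof or counterexample search.

```python
import numpy as np


def candidate_implications(coords, source, k=5, axis=0):
    """Indices of up to k theories displaced positively along `axis`
    from `source`, ordered by increasing Euclidean distance."""
    coords = np.asarray(coords, dtype=float)
    delta = coords - coords[source]
    dist = np.linalg.norm(delta, axis=1)
    # Keep only theories lying in the positive direction from the source.
    candidates = np.flatnonzero(delta[:, axis] > 0)
    order = candidates[np.argsort(dist[candidates])]
    return order[:k].tolist()


# Toy layout: theory 2 lies behind the source and is filtered out.
toy = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [-0.5, 0.0, 0.0],
    [3.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
])
ranked = candidate_implications(toy, source=0, k=3)
```

Because the reported directional trend is statistical and not universal, a filter like this would miss implications that run against the dominant drift.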
7. Significance and Prospects
The latent space construction transforms the classical, abstract preorder of equational implication into a concrete three-dimensional landscape, mapping both statistical behavior and deductive transition paths. It provides a new framework for machine-assisted conjecture and proof generation, and suggests possible avenues for a statistical refinement of completeness theorems. This approach thus links finite model theory, machine learning, and algebraic logic, supporting both geometric and deductive exploration of the space of equational theories (Berlioz et al., 28 Jan 2026).