Latent Space of Equational Theories

Updated 4 February 2026

The latent space of equational theories is a three-dimensional Euclidean embedding of universally quantified equational laws in the language of magmas, in which each law is positioned according to its empirical Stone pairings.
It employs principal component analysis on a centered Stone pairing matrix to reveal deduction flows, geometric clusters, and statistical properties reflecting different syntactic signatures.
Statistical analysis highlights directional trends and clustering of reversible implications, offering a practical framework for machine-assisted conjecture and proof generation.

The latent space of equational theories is a three-dimensional Euclidean landscape constructed to embed the space of universally quantified equational theories in the language of magmas, with each theory's position determined by its empirical statistical properties across a large sample of finite random magmas. Introduced by Berlioz and Melliès as an outgrowth of the Equational Theories Project of Terence Tao, this construction enables the geometric visualization of logical implication relations, observable as oriented chains in latent space, and supports statistical and deductive analysis of the interrelations among thousands of equational laws (Berlioz et al., 28 Jan 2026).

1. Specification of Equational Theories

An equational theory in this context consists of a single universally quantified equation in the signature $\Sigma = \{\diamond : 2\}$, where $\diamond$ is a binary operation. Each theory $T$ is thus defined by an equation $\forall x_1, \ldots, x_m.\; \text{LHS}_k(x_1, \ldots, x_m) = \text{RHS}_k(x_1, \ldots, x_m)$, with integer label $k$ and at most four occurrences of $\diamond$. The "syntactic signature" of a theory, denoted $(a, b)$, records the number of occurrences of $\diamond$ on each side. The collection studied consists of 4694 such theories, each representing a first-order sentence in the language of magmas.
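As an illustration of the syntactic signature, the following minimal sketch represents terms as small trees and counts occurrences of $\diamond$ on each side of a law. The names `Var`, `Op`, `op_count`, and `signature` are hypothetical and chosen only for this example; they are not taken from the Equational Theories Project.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    index: int  # stands for the variable x_index


@dataclass(frozen=True)
class Op:
    left: "Term"   # left argument of the binary operation
    right: "Term"  # right argument of the binary operation


Term = Union[Var, Op]


def op_count(term: Term) -> int:
    """Number of occurrences of the binary operation in a term."""
    if isinstance(term, Var):
        return 0
    return 1 + op_count(term.left) + op_count(term.right)


def signature(lhs: Term, rhs: Term) -> Tuple[int, int]:
    """Syntactic signature (a, b): operation counts on each side of the law."""
    return (op_count(lhs), op_count(rhs))


# Associativity: (x0 ◇ x1) ◇ x2 = x0 ◇ (x1 ◇ x2)
x0, x1, x2 = Var(0), Var(1), Var(2)
assoc_lhs = Op(Op(x0, x1), x2)
assoc_rhs = Op(x0, Op(x1, x2))
print(signature(assoc_lhs, assoc_rhs))  # (2, 2)
```

Associativity uses four occurrences of the operation in total, so it sits at the upper bound of the collection described above.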

2. Statistical Sampling Over Finite Magmas

For statistical characterization, a parameter $N$ (with $4 \le N \le 16$) is fixed and $n$ independent random magmas $A_1, \ldots, A_n$ are sampled, each with universe $\{0,\ldots,N-1\}$ and operation table selected uniformly at random. Given a theory $\varphi_k$, for each magma $A_\ell$ the Stone pairing is computed: $p_{k, \ell} = \frac{|\{ (a_1, \ldots, a_m) \in A_\ell^m \,|\, A_\ell \models \varphi_k(a_1,\ldots,a_m)\}|}{N^m}$. This is the empirical probability that a random $m$-tuple in $A_\ell$ satisfies the given equation. All such pairings are assembled into a matrix $R \in [0,1]^{\#\text{Eqns} \times n}$ with $R_{k,\ell} = p_{k,\ell}$, where rows index theories and columns index sampled magmas.
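The sampling step can be sketched in a few lines of Python. This is an illustrative brute-force version, not the authors' implementation: terms are encoded as nested pairs (an `int` is a variable index, a pair `(left, right)` means `left ◇ right`), and the helper names `random_magma`, `evaluate`, and `stone_pairing` are assumptions made for this example.

```python
import itertools
import random


def random_magma(N, rng):
    """Operation table on {0, ..., N-1}, each entry drawn uniformly at random."""
    return [[rng.randrange(N) for _ in range(N)] for _ in range(N)]


def evaluate(term, table, env):
    """Evaluate a term: an int is a variable index, a pair is left ◇ right."""
    if isinstance(term, int):
        return env[term]
    left, right = term
    return table[evaluate(left, table, env)][evaluate(right, table, env)]


def stone_pairing(lhs, rhs, m, table):
    """Fraction of m-tuples on which both sides evaluate to the same element."""
    N = len(table)
    hits = sum(
        evaluate(lhs, table, env) == evaluate(rhs, table, env)
        for env in itertools.product(range(N), repeat=m)
    )
    return hits / N**m


rng = random.Random(0)
magmas = [random_magma(4, rng) for _ in range(5)]

# Commutativity: x0 ◇ x1 = x1 ◇ x0, a law in m = 2 variables.
comm_lhs, comm_rhs = (0, 1), (1, 0)
row = [stone_pairing(comm_lhs, comm_rhs, 2, A) for A in magmas]
```

Here `row` plays the role of one row $R_{k,\cdot}$ of the matrix. Exhaustive enumeration costs $N^m$ evaluations per magma, which is affordable for the small $N$ and $m$ considered here.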

3. Embedding Construction via Principal Component Analysis

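To define the latent space, $R$ is centered by subtracting the mean row vector $\mu \in \mathbb{R}^{1 \times n}$, the average of the rows of $R$. The centered matrix $X = R - \mathbf{1} \cdot \mu$ leads to the covariance matrix $C_X = (1/n)\, X^\top X \in \mathbb{R}^{n \times n}$, whose eigenvectors live in the same space as the rows of $R$ (the scalar prefactor does not affect them). The top three eigenvectors of $C_X$ (the principal components) form the columns of $U_3 \in \mathbb{R}^{n \times 3}$, and each theory $k$ is embedded as $f(\text{Eqn}_k) = U_3^\top (R_{k,\cdot} - \mu) \in \mathbb{R}^3$. The induced metric is Euclidean, $d(T_i, T_j) = \|f(T_i) - f(T_j)\|_2$.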
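A minimal NumPy sketch of this projection follows, assuming the layout from Section 2 (rows of $R$ are theories, columns are sampled magmas). It uses a singular value decomposition in place of an explicit eigendecomposition, since the right singular vectors of the centered matrix are the principal directions. The function name `embed` and the synthetic matrix are hypothetical stand-ins, not the authors' code or data.

```python
import numpy as np


def embed(R, dim=3):
    """Project each row of R (one theory) onto the top `dim` principal directions."""
    mu = R.mean(axis=0)   # mean row vector, shape (n,)
    X = R - mu            # centered matrix, shape (#Eqns, n)
    # Rows of Vt are right singular vectors of X, ordered by decreasing
    # singular value; they are the eigenvectors of X^T X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:dim].T        # shape (n, dim)
    return X @ U          # shape (#Eqns, dim)


rng = np.random.default_rng(0)
R = rng.random((20, 8))   # synthetic stand-in for a Stone pairing matrix
coords = embed(R)

# Euclidean distance between the first two embedded rows.
d01 = np.linalg.norm(coords[0] - coords[1])
```

The SVD route avoids forming the covariance matrix explicitly and returns the components already sorted by explained variance.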

4. Logical Implication Structures in Latent Space

The Equational Theories Project defines a preorder $\Rightarrow$ on the set of 4694 theories. A strict implication $j \Rightarrow k$ not factoring through any intermediate theory (except by reversible steps) is called atomic ($j \Rightarrow^1 k$). Reversible implications ($j \Leftrightarrow k$) yield equivalence classes (cliques). Collapsing each reversible clique to its center of mass in $\mathbb{R}^3$ produces a reduced directed acyclic graph (DAG) of 1415 nodes and 4824 arrows, where edges inherit an orientation and have Euclidean lengths.

Logical implication chains become visible as oriented paths in latent space; the orientation reflects the deductive flow between the corresponding equational theories.
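The collapse of reversible cliques and the extraction of atomic arrows can be sketched as follows. This is a toy illustration under the assumption that the implication relation is supplied as a transitively closed set of pairs; the function names `collapse_cliques` and `atomic_edges` and the four-theory example are hypothetical.

```python
from collections import defaultdict


def collapse_cliques(coords, implications):
    """coords: {theory: (x, y, z)}; implications: transitively closed set of (j, k)."""
    # Two theories are equivalent when each implies the other; the smallest
    # label in each class serves as its representative.
    rep = {}
    for t in coords:
        equivalent = [
            u for u in coords
            if u == t or ((t, u) in implications and (u, t) in implications)
        ]
        rep[t] = min(equivalent)
    members = defaultdict(list)
    for t, r in rep.items():
        members[r].append(t)
    # Each clique is placed at the center of mass of its members.
    centers = {
        r: tuple(sum(coords[t][i] for t in ts) / len(ts) for i in range(3))
        for r, ts in members.items()
    }
    # Strict implications between distinct cliques form the reduced graph.
    edges = {(rep[j], rep[k]) for j, k in implications if rep[j] != rep[k]}
    return centers, edges


def atomic_edges(edges):
    """Keep the edges that do not factor through an intermediate node."""
    nodes = {n for e in edges for n in e}
    return {
        (a, b) for a, b in edges
        if not any((a, c) in edges and (c, b) in edges for c in nodes)
    }


# Toy data: 1 <=> 2, and both imply 3, which implies 4.
coords = {1: (0, 0, 0), 2: (2, 0, 0), 3: (5, 1, 0), 4: (9, 1, 1)}
implications = {(1, 2), (2, 1), (1, 3), (2, 3), (3, 4), (1, 4), (2, 4)}
centers, edges = collapse_cliques(coords, implications)
print(atomic_edges(edges))
```

In the toy data the clique {1, 2} collapses to the midpoint of its two members, and the arrow from that clique to 4 is dropped as non-atomic because it factors through 3.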

5. Geometric and Statistical Properties

Analysis of the geometry and implication structure reveals several empirical patterns:

Principal component interpretation:
- The $X$-coordinate nearly perfectly tracks the expected Stone pairing, i.e., $E[\text{Eqn}_k] = (1/n)\sum_\ell p_{k,\ell}$, for each theory.
- The $Y$-coordinate captures the variance in these probabilities across magmas.
- The $Z$-coordinate distinguishes conjugate theories: for every conjugate pair, $Z_j = -Z_{j^*}$, and self-conjugate theories are mapped to $Z = 0$.

Implication edge statistics:
- Mean Euclidean length of edges (in the full implication graph):
  - Reversible (equivalences): ≈0.69
  - Atomic (strict, non-reversible): ≈5.29
  - All strict: ≈7.17
- This suggests that provable equivalences cluster tightly, while strict implications correspond to larger transitions.

Directional flow:
- 78.7% of implication vectors $f(j) \rightarrow f(k)$ have a positive $X$-component, indicating a dominant "radial drift" from under-constrained toward over-constrained equational laws.
- Many implication edges align into nearly parallel families, typically reflecting witnesses similar to Herbrand-style rewrites.
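Statistics of this kind can be computed directly from embedded coordinates and a list of directed edges. The sketch below is illustrative only: the function name `edge_stats` and the three-point example are hypothetical, and the numbers it produces are unrelated to the figures reported above.

```python
import math


def edge_stats(coords, edges):
    """Mean Euclidean edge length and fraction of edges with positive X displacement."""
    lengths = [math.dist(coords[j], coords[k]) for j, k in edges]
    positive_x = sum(coords[k][0] > coords[j][0] for j, k in edges)
    return sum(lengths) / len(lengths), positive_x / len(edges)


# Toy embedding with two directed edges a -> b and b -> c.
coords = {"a": (0.0, 0.0, 0.0), "b": (3.0, 4.0, 0.0), "c": (1.0, 4.0, 0.0)}
edges = [("a", "b"), ("b", "c")]
mean_len, frac = edge_stats(coords, edges)
print(mean_len, frac)  # 3.5 0.5
```

Running the same function separately on reversible, atomic, and all strict edges would give the three mean lengths compared in the list above.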

6. Visualization, Clustering, and Interpretability

The spatial arrangement of theories in $\mathbb{R}^3$ encodes their deductive and statistical relationships:

- Short Euclidean distances between two theories correspond to simple rewrites or provable equivalences.
- A candidate implication $j \Rightarrow k$ can be screened heuristically by checking whether $f(k)$ lies near $f(j)$, displaced in the positive $X$ direction.
- Clusters by syntactic signature $(a, b)$ naturally emerge; for example, all $(0,4)$-laws are located in one region, $(1,3)$-laws in another.
- Z-symmetry encodes conjugacy: conjugate pairs are mirror images across the $Z=0$ plane, supporting exploration via symmetry-based reparameterization.
- Oriented implication streams support preferred proof search orderings, recommending flows from small-$X$ (under-constrained) to large-$X$ (over-constrained), as seen in chains leading toward classical associativity.
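The geometric screening of candidate implications can be sketched as a simple filter. This is a heuristic under stated assumptions, not a decision procedure: the function name `candidate_consequences`, the `radius` parameter, and the toy coordinates are all hypothetical, and any candidate it returns would still need a proof or a counterexample.

```python
import math


def candidate_consequences(coords, j, radius):
    """Theories within `radius` of f(j) with larger X, ordered nearest first."""
    origin = coords[j]
    hits = [
        (math.dist(origin, p), k)
        for k, p in coords.items()
        if k != j and p[0] > origin[0] and math.dist(origin, p) <= radius
    ]
    return [k for _, k in sorted(hits)]


coords = {
    "j": (0.0, 0.0, 0.0),
    "near": (1.0, 0.0, 0.0),
    "far": (10.0, 0.0, 0.0),
    "behind": (-1.0, 0.0, 0.0),
}
print(candidate_consequences(coords, "j", 5.0))  # ['near']
```

Ordering candidates by distance mirrors the observation that short edges tend to correspond to simple rewrites, so the nearest candidates are the cheapest to attempt first.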

7. Significance and Prospects

The latent space construction transforms the classical, abstract preorder of equational implication into a concrete three-dimensional landscape, mapping both statistical behavior and deductive transition paths. It provides a new framework for machine-assisted conjecture and proof generation, and suggests possible avenues for a statistical refinement of completeness theorems. This approach thus links finite model theory, machine learning, and algebraic logic, supporting both geometric and deductive exploration of the space of equational theories (Berlioz et al., 28 Jan 2026).

References (1)

The Latent Space of Equational Theories (2026)
