
Latent Space of Equational Theories

Updated 4 February 2026
  • The latent space of equational theories is a three-dimensional Euclidean embedding of universally quantified equational laws in the language of magmas, computed from empirical Stone pairings.
  • It employs principal component analysis on a centered Stone pairing matrix to reveal deduction flows, geometric clusters, and statistical properties reflecting different syntactic signatures.
  • Statistical analysis highlights directional trends and clustering of reversible implications, offering a practical framework for machine-assisted conjecture and proof generation.

The latent space of equational theories is a three-dimensional Euclidean landscape constructed to embed the space of universally quantified equational theories in the language of magmas, with each theory's position determined by its empirical statistical properties across a large sample of finite random magmas. Introduced by Berlioz and Melliès as an outgrowth of the Equational Theories Project of Terence Tao, this construction enables the geometric visualization of logical implication relations, observable as oriented chains in latent space, and supports statistical and deductive analysis of the interrelations among thousands of equational laws (Berlioz et al., 28 Jan 2026).

1. Specification of Equational Theories

An equational theory in this context consists of a single universally quantified equation in the signature $\Sigma = \{\diamond : 2\}$, where $\diamond$ is a binary operation. Each theory $T$ is thus defined by an equation

$$\forall x_1, \ldots, x_m.\; \text{LHS}_k(x_1, \ldots, x_m) = \text{RHS}_k(x_1, \ldots, x_m)$$

with integer label $k$ and at most four occurrences of $\diamond$. The "syntactic signature" of a theory, denoted $(a, b)$, records the number of occurrences of $\diamond$ on each side. The collection studied consists of 4694 such theories, each representing a first-order sentence in the language of magmas.

2. Statistical Sampling Over Finite Magmas

For statistical characterization, a parameter $N$ (with $4 \le N \le 16$) is fixed and $n$ independent random magmas $A_1, \ldots, A_n$ are sampled, each with universe $\{0, \ldots, N-1\}$ and operation table selected uniformly at random. Given a theory $\varphi_k$, for each magma $A_\ell$ the Stone pairing is computed:

$$p_{k,\ell} = \frac{|\{(a_1, \ldots, a_m) \in A_\ell^m \mid A_\ell \models \varphi_k(a_1, \ldots, a_m)\}|}{N^m}$$

This is the empirical probability that a random $m$-tuple in $A_\ell$ satisfies the given equation. All such pairings are assembled into a matrix

$$R \in [0,1]^{\#\text{Eqns} \times n}, \quad R_{k,\ell} = p_{k,\ell}$$

where rows index theories and columns index sampled magmas.
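
The sampling step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: terms are encoded as strings (variables) or pairs (products), the Stone pairing is computed by brute-force enumeration of all $N^m$ assignments, and the function names (`random_magma`, `stone_pairing`, `pairing_matrix`) are hypothetical.

```python
import itertools
import numpy as np


def random_magma(N, rng):
    """Operation table on {0, ..., N-1}, each entry drawn uniformly."""
    return rng.integers(0, N, size=(N, N))


def variables(term):
    """Set of variable names occurring in a term."""
    if isinstance(term, str):
        return {term}
    return variables(term[0]) | variables(term[1])


def evaluate(term, table, env):
    """Value of a term in the magma `table` under the assignment `env`."""
    if isinstance(term, str):
        return env[term]
    left, right = term
    return table[evaluate(left, table, env), evaluate(right, table, env)]


def stone_pairing(lhs, rhs, table):
    """Fraction of variable assignments for which lhs = rhs holds."""
    N = table.shape[0]
    names = sorted(variables(lhs) | variables(rhs))
    hits = 0
    for values in itertools.product(range(N), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(lhs, table, env) == evaluate(rhs, table, env):
            hits += 1
    return hits / N ** len(names)


def pairing_matrix(laws, N, n, seed=0):
    """Matrix R with one row per law and one column per sampled magma."""
    rng = np.random.default_rng(seed)
    magmas = [random_magma(N, rng) for _ in range(n)]
    return np.array([[stone_pairing(lhs, rhs, t) for t in magmas]
                     for lhs, rhs in laws])


# Left projection (a ⋄ b = a) satisfies x ⋄ y = x for every assignment.
left_proj = np.arange(4)[:, None].repeat(4, axis=1)
print(stone_pairing(("x", "y"), "x", left_proj))  # 1.0
```

Brute-force enumeration costs $N^m$ evaluations per law and magma, which is only practical for small $N$ and few variables; a vectorized evaluation would be the natural next step for the full collection.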

3. Embedding Construction via Principal Component Analysis

To define the latent space, $R$ is centered by subtracting the mean row vector $\mu \in \mathbb{R}^n$. The centered matrix $X = R - \mathbf{1} \cdot \mu$ leads to the covariance matrix $C_X = (1/\#\text{Eqns})\, X^\top X$, an $n \times n$ matrix whose eigenvectors lie in the same $n$-dimensional space as the rows of $R$. The top three eigenvectors of $C_X$ (principal components) form the columns of $U_3 \in \mathbb{R}^{n \times 3}$, and each theory $k$ is embedded as:

$$f(\text{Eqn}_k) = U_3^\top (R_{k,\cdot} - \mu) \in \mathbb{R}^3$$

The induced metric is Euclidean, $d(T_i, T_j) = \|f(T_i) - f(T_j)\|_2$.
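
A minimal sketch of the embedding step follows, assuming the matrix $R$ is available as a NumPy array with one row per theory. It takes the principal axes as the right singular vectors of the centered matrix, which are $n$-dimensional, so that $U_3^\top (R_{k,\cdot} - \mu)$ is well defined. The function name `embed_3d` is hypothetical, and the SVD route is one standard way to carry out PCA rather than a confirmed detail of the original construction.

```python
import numpy as np


def embed_3d(R):
    """Project each row of R onto the top three principal axes.

    Returns (coords, U3, mu), where coords[k] = U3.T @ (R[k] - mu).
    """
    mu = R.mean(axis=0)               # mean row vector, length n
    X = R - mu                        # centered matrix (broadcast over rows)
    # Rows of Vt are right singular vectors of X, ordered by decreasing
    # singular value; they are eigenvectors of X.T @ X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U3 = Vt[:3].T                     # shape (n, 3)
    coords = X @ U3                   # shape (#Eqns, 3)
    return coords, U3, mu


rng = np.random.default_rng(0)
R_demo = rng.random((50, 20))         # stand-in data, not real Stone pairings
coords, U3, mu = embed_3d(R_demo)
print(coords.shape)                   # (50, 3)
```

The sign of each principal axis is arbitrary, so two runs on different software may produce embeddings that differ by reflections of individual coordinates.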

4. Logical Implication Structures in Latent Space

The Equational Theories Project defines a preorder $\Rightarrow$ on the set of 4694 theories. A strict implication $j \Rightarrow k$ not factoring through any intermediate theory (except by reversible steps) is called atomic ($j \Rightarrow^1 k$). Reversible implications ($j \Leftrightarrow k$) yield equivalence classes (cliques). Collapsing each reversible clique to its center of mass in $\mathbb{R}^3$ produces a reduced directed acyclic graph (DAG) of 1415 nodes and 4824 arrows, where edges inherit an orientation and have Euclidean lengths.

Logical implication chains become visible as oriented paths in latent space; the orientation reflects the deductive flow between the corresponding equational theories.
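
The collapsing step can be sketched as follows, assuming the reversible and strict implications are supplied as lists of index pairs and the embedding as an array of 3D coordinates. The function name `collapse_cliques` and the input format are assumptions for illustration. The sketch merges equivalent theories with a union-find structure and places each class at its center of mass; it does not compute the atomic (transitively reduced) implications.

```python
import numpy as np


def collapse_cliques(coords, equivalences, strict_edges):
    """Merge equivalent theories and map strict edges onto the classes.

    coords        array of shape (num_theories, 3)
    equivalences  iterable of (j, k) with j <=> k
    strict_edges  iterable of (j, k) with j => k strictly
    """
    n = len(coords)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for j, k in equivalences:
        parent[find(j)] = find(k)

    roots = sorted({find(i) for i in range(n)})
    class_of = {r: c for c, r in enumerate(roots)}
    labels = np.array([class_of[find(i)] for i in range(n)])

    # Each class sits at the center of mass of its members.
    centroids = np.stack([coords[labels == c].mean(axis=0)
                          for c in range(len(roots))])

    # Deduplicate edges between classes and measure their lengths.
    edges = sorted({(int(labels[j]), int(labels[k]))
                    for j, k in strict_edges if labels[j] != labels[k]})
    lengths = [float(np.linalg.norm(centroids[b] - centroids[a]))
               for a, b in edges]
    return labels, centroids, edges, lengths


coords_demo = np.array([[0.0, 0, 0], [2, 0, 0], [5, 0, 0], [5, 4, 0]])
labels, centroids, edges, lengths = collapse_cliques(
    coords_demo, equivalences=[(0, 1)], strict_edges=[(0, 2), (1, 2), (2, 3)])
print(edges)    # [(0, 1), (1, 2)]
print(lengths)  # [4.0, 4.0]
```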

5. Geometric and Statistical Properties

Analysis of the geometry and implication structure reveals several empirical patterns:

  • Principal component interpretation:
    • The $X$-coordinate nearly perfectly tracks the expected Stone pairing, i.e., $E[\text{Eqn}_k] = (1/n)\sum_\ell p_{k,\ell}$, for each theory.
    • The $Y$-coordinate captures the variance in these probabilities across magmas.
    • The $Z$-coordinate distinguishes conjugate theories: for every conjugate pair, $Z_j = -Z_{j^*}$, and self-conjugate theories are mapped to $Z = 0$.
  • Implication edge statistics:
    • Mean Euclidean length of edges (in the full implication graph):
      • Reversible (equivalences): ≈0.69
      • Atomic (strict, non-reversible): ≈5.29
      • All strict: ≈7.17
    • This suggests that provable equivalences cluster tightly, while strict implications correspond to larger transitions.
  • Directional flow:
    • 78.7% of implication vectors $f(j) \rightarrow f(k)$ have a positive $X$-component, indicating a dominant "radial drift" from under-constrained toward over-constrained equational laws.
    • Many implication edges align into nearly parallel families, typically reflecting witnesses similar to Herbrand-style rewrites.
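
The edge statistics listed above (mean Euclidean length and the share of implication vectors with a positive $X$-component) can be computed as in the sketch below. The function name `edge_statistics` and the toy coordinates are assumptions for illustration; the numbers it prints are unrelated to the values reported for the actual implication graph.

```python
import numpy as np


def edge_statistics(coords, edges):
    """Mean edge length and fraction of edges pointing in the +X direction.

    coords  array of shape (num_theories, 3)
    edges   sequence of (j, k) pairs, each read as the implication j => k
    """
    edges = np.asarray(edges)
    vectors = coords[edges[:, 1]] - coords[edges[:, 0]]   # f(k) - f(j)
    lengths = np.linalg.norm(vectors, axis=1)
    positive_x = vectors[:, 0] > 0
    return float(lengths.mean()), float(positive_x.mean())


coords_demo = np.array([[0.0, 0, 0], [3, 4, 0], [1, 0, 0]])
mean_len, frac_pos = edge_statistics(coords_demo, [(0, 1), (1, 2)])
print(frac_pos)  # 0.5
```

Running the same function separately on the reversible, atomic, and strict edge sets would give one mean length per category.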

6. Visualization, Clustering, and Interpretability

The spatial arrangement of theories in $\mathbb{R}^3$ encodes their deductive and statistical relationships:

  • Short Euclidean distances between two theories correspond to simple rewrites or provable equivalences.
  • Testing for implication $j \Rightarrow k$ can be facilitated by checking if $f(k)$ is near $f(j)$ in the positive $X$ direction.
  • Clusters by syntactic signature $(a, b)$ naturally emerge; for example, all $(0,4)$-laws are located in one region, $(1,3)$ in another.
  • $Z$-symmetry encodes conjugacy: conjugate pairs are mirror images across the $Z = 0$ plane, supporting exploration via symmetry-based reparameterization.
  • Oriented implication streams support preferred proof search orderings, recommending flows from small-$X$ (under-constrained) to large-$X$ (over-constrained), as seen in chains leading toward classical associativity.
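
The proximity heuristic described in this list can be turned into a simple candidate ranking: for a source theory $j$, keep the theories lying in the positive $X$ direction and sort them by Euclidean distance. The sketch below is only a heuristic filter under that assumption; the function name `rank_candidates` and the `top` parameter are hypothetical, and closeness in latent space does not by itself establish an implication, which still has to be proved or refuted separately.

```python
import numpy as np


def rank_candidates(coords, j, top=5):
    """Candidate targets k for j => k, nearest first, restricted to +X.

    Returns a list of (k, distance) pairs.
    """
    delta = coords - coords[j]                  # f(k) - f(j) for every k
    dist = np.linalg.norm(delta, axis=1)
    forward = np.flatnonzero(delta[:, 0] > 0)   # strictly larger X than j
    order = forward[np.argsort(dist[forward])]
    return [(int(k), float(dist[k])) for k in order[:top]]


coords_demo = np.array([[0.0, 0, 0], [1, 0, 0], [-1, 0, 0], [3, 0, 0]])
print(rank_candidates(coords_demo, j=0))  # [(1, 1.0), (3, 3.0)]
```

Such a ranking could be used to decide which candidate implications to hand to a prover or a counterexample search first.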

7. Significance and Prospects

The latent space construction transforms the classical, abstract preorder of equational implication into a concrete three-dimensional landscape, mapping both statistical behavior and deductive transition paths. It provides a new framework for machine-assisted conjecture and proof generation, and suggests possible avenues for a statistical refinement of completeness theorems. This approach thus links finite model theory, machine learning, and algebraic logic, supporting both geometric and deductive exploration of the space of equational theories (Berlioz et al., 28 Jan 2026).
