TorusE: Toroidal KG Embedding
- TorusE is a knowledge graph embedding model that situates entities on a toroidal manifold, eliminating normalization issues and preserving translation operations.
- It leverages modular arithmetic on an n-dimensional torus to maintain compact embedding spaces, enhancing computational efficiency and scalability.
- Empirical results on standard benchmarks show that TorusE delivers competitive precision and speed compared to traditional models like TransE.
TorusE is a knowledge graph embedding model that addresses the challenge of embedding entities and relations for link prediction by situating them on a toroidal manifold, a mathematically compact Lie group structure. This approach eliminates the fundamental tension between the translation principle advocated in traditional models like TransE and the need for extrinsic regularization, enabling faithful translation operations and superior computational properties. TorusE’s design is grounded in the mathematical properties of the $n$-dimensional torus ($T^n$), offering practical advantages in scalability, prediction quality, and interpretability of relational patterns in knowledge graphs.
1. Conceptual Foundations and Motivation
TransE, a foundational translation-based knowledge graph embedding model, operates on the principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$,
where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the embeddings for the head entity, relation, and tail entity, respectively. In practice, TransE minimizes a distance (e.g., the $L_1$ or $L_2$ norm) between $\mathbf{h} + \mathbf{r}$ and $\mathbf{t}$.
However, as TransE’s embedding space is $\mathbb{R}^n$, embeddings can diverge in magnitude. To prevent this, entity embeddings are normalized to reside on the unit sphere. This enforced normalization introduces a distortion: while the translation $\mathbf{h} + \mathbf{r}$ may naturally fall outside the sphere, normalization projects it back, disrupting the translation principle and adversely impacting link prediction accuracy (Ebisu et al., 2017). Regularization is thus both necessary and counterproductive under this setup.
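This conflict can be made concrete with a small numeric sketch (the vectors below are illustrative only, not taken from the paper): an exact translation $h + r = t$ is broken as soon as the tail is projected back onto the unit sphere.

```python
import numpy as np

# Illustrative sketch of TransE's conflict between translation and normalization.
h = np.array([0.6, 0.8, 0.0])          # unit-norm head embedding
r = np.array([0.5, 0.0, 0.5])          # relation embedding (unconstrained)
t = h + r                               # ideal tail under the translation principle

t_normalized = t / np.linalg.norm(t)    # regularization forces t back onto the sphere
print(np.linalg.norm(h + r - t))             # 0.0  -> translation holds before normalization
print(np.linalg.norm(h + r - t_normalized))  # > 0  -> normalization distorts the translation
```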
TorusE was introduced to reconcile this conflict, exploiting the properties of compact Lie groups—specifically, the torus—to inherently bound the embeddings without the need for extraneous normalization.
2. Mathematical Formulation and Lie Group Embedding
TorusE replaces the noncompact vector space $\mathbb{R}^n$ with an $n$-dimensional torus:
$$T^n = \mathbb{R}^n / \mathbb{Z}^n.$$
This identification means that each coordinate of an embedding vector is interpreted modulo $1$: if any value exceeds the interval $[0, 1)$, it wraps around (periodic boundary condition).
Mathematically, for an entity or relation $x$, the embedding is a point $[\mathbf{x}] \in T^n$.
Translation Principle on the Torus:
Addition is performed component-wise modulo $1$: $([\mathbf{x}] + [\mathbf{y}])_i = (x_i + y_i) \bmod 1$.
The triple pattern becomes $[\mathbf{h}] + [\mathbf{r}] \approx [\mathbf{t}]$.
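A minimal sketch of this wraparound addition (function and variable names are illustrative):

```python
import numpy as np

def torus_add(x, y):
    """Component-wise addition modulo 1, i.e. translation on the torus T^n."""
    return np.mod(x + y, 1.0)

h = np.array([0.9, 0.2, 0.75])
r = np.array([0.3, 0.5, 0.5])
print(torus_add(h, r))  # ~[0.2 0.7 0.25] -- coordinates that exceed 1 wrap around
```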
Scoring Functions:
TorusE defines differentiable, torus-aware scoring functions, such as:
- $L_1$-based: $f_{L_1}(h, r, t) = d_{L_1}([\mathbf{h}] + [\mathbf{r}], [\mathbf{t}])$,
where $d_{L_1}([\mathbf{x}], [\mathbf{y}]) = \sum_i \min(|x_i - y_i|,\, 1 - |x_i - y_i|)$ computes the minimal distance across the torus.
- $L_2$-based and complex variants (for connections to ComplEx and DistMult) are also supported; these replace $d_{L_1}$ with the squared $L_2$ torus distance or with distances between the corresponding points $e^{2\pi i \theta}$ on the complex unit circle.
The objective uses a margin-based ranking loss, analogous to TransE:
$$\mathcal{L} = \sum_{(h,r,t) \in \Delta} \; \sum_{(h',r,t') \in \Delta'} \left[\gamma + f(h, r, t) - f(h', r, t')\right]_+,$$
where $\Delta$ is the set of positive triples, $\Delta'$ the negative samples, and $\gamma$ the margin (Ebisu et al., 2017).
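Putting these pieces together, the following is a minimal NumPy sketch of the $L_1$ torus distance, the corresponding score, and the margin loss for a single positive/negative pair; the paper's scaling constants are omitted and all names are illustrative:

```python
import numpy as np

def torus_l1_distance(x, y):
    """Minimal L1 distance on T^n: per coordinate, take the shorter way around the circle."""
    d = np.mod(x - y, 1.0)
    return np.sum(np.minimum(d, 1.0 - d))

def score(h, r, t):
    """L1-based TorusE-style score (lower = more plausible); scaling constants omitted."""
    return torus_l1_distance(np.mod(h + r, 1.0), t)

def margin_loss(pos, neg, gamma=0.5):
    """Margin-based ranking loss for one positive and one corrupted triple."""
    return max(0.0, gamma + score(*pos) - score(*neg))

h, r, t = np.random.rand(3, 100)       # embeddings live on the 100-dimensional torus
t_corrupt = np.random.rand(100)        # negative sample: corrupted tail
print(margin_loss((h, r, t), (h, r, t_corrupt)))
```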
3. Geometric and Topological Foundations
The torus as an embedding space offers critical properties for knowledge graph embeddings:
- Compactness eliminates embedding norm divergence without the need for normalization. All embeddings are inherently bounded due to the wraparound (periodicity) imposed by modular arithmetic.
- Group Structure ensures that translation operations ($[\mathbf{h}] + [\mathbf{r}]$) are always valid and differentiable, preserving smoothness for gradient-based optimization.
- Isometric Flatness: Any translation on the torus preserves intrinsic distances, analogous to a flat square with identified edges, as detailed in geometric analyses (III et al., 2020). This avoids “edge effects” present in embedding spaces with boundaries; a short numerical check of this invariance follows this list.
- Cyclic Symmetry: The toroidal structure natively supports cyclical and periodic relational patterns, enhancing representation of multi-hop and path-based relationships between entities.
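As a quick numerical check of the isometry claim (illustrative only), translating both points by the same relation vector leaves the torus distance unchanged:

```python
import numpy as np

def torus_l1_distance(x, y):
    d = np.mod(x - y, 1.0)
    return np.sum(np.minimum(d, 1.0 - d))

x, y, r = np.random.rand(3, 50)
d_before = torus_l1_distance(x, y)
d_after = torus_l1_distance(np.mod(x + r, 1.0), np.mod(y + r, 1.0))
print(np.isclose(d_before, d_after))  # True: translation by r is an isometry of T^n
```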
4. Computational Efficiency and Scalability
TorusE’s compact space obviates the need for repeated normalization steps during training. This yields:
- Improved computational throughput: Training does not require projection of entity embeddings back onto a constraint surface, reducing time per update (Ebisu et al., 2017).
- Scalability to large KGs: TorusE’s time and space complexity are $O(n)$ per embedding, linear in the embedding dimension $n$. Empirical evidence shows an order-of-magnitude speedup over TransE in high-dimensional regimes (Ebisu et al., 2017).
Recent advances in sparse computation further accelerate TorusE. Frameworks such as SparseTransX (Anik et al., 24 Feb 2025) reformulate the embedding updates and scoring operations as large sparse-dense matrix multiplications (SpMM), significantly reducing training time and memory. This is achieved by representing triplet updates as sparse incidence matrices, enabling batch computations in a single operation for all triplets:
Schematically, $\mathbf{G} = \mathbf{M}\,\mathbf{E}$, where the sparse incidence matrix $\mathbf{M}$ encodes the combination (+1, −1) of heads, tails, and relations across all triples, and $\mathbf{E}$ stacks the embeddings (Anik et al., 24 Feb 2025).
This yields substantial CPU and GPU speedups for TorusE and other translation-based models.
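The following is a simplified SciPy sketch of the SpMM idea, not SparseTransX's actual implementation: each triple contributes one sparse row that selects its head and relation with $+1$ and its tail with $-1$, so all residuals $h + r - t$ are obtained in a single sparse-dense multiplication. The shared embedding table, index offsets, and shapes are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Simplified sketch of the SpMM formulation (illustrative, not SparseTransX's code).
# A single table E holds entity and relation embeddings; each triple (h, r, t)
# becomes one sparse row with +1 at h, +1 at r (offset), and -1 at t.
num_entities, num_relations, dim = 5, 2, 8
E = np.random.rand(num_entities + num_relations, dim)   # embeddings in [0, 1)^dim

triples = [(0, 0, 3), (1, 1, 4), (2, 0, 1)]             # (head, relation, tail) indices
rows, cols, vals = [], [], []
for i, (h, r, t) in enumerate(triples):
    rows += [i, i, i]
    cols += [h, num_entities + r, t]                    # relations sit after entities
    vals += [1.0, 1.0, -1.0]
M = coo_matrix((vals, (rows, cols)),
               shape=(len(triples), num_entities + num_relations))

residuals = np.mod(np.asarray(M @ E), 1.0)              # all h + r - t (mod 1) in one SpMM
scores = np.minimum(residuals, 1.0 - residuals).sum(axis=1)  # L1 torus distance per triple
print(scores)                                            # one score per triple
```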
5. Empirical Performance and Comparative Analysis
TorusE provides strong performance on standard knowledge graph completion benchmarks. For example:
| Model | WN18 MRR | FB15k MRR | WN18RR MRR | FB15k-237 MRR |
|---|---|---|---|---|
| TorusE | 0.951 | 0.810 | 0.477 | 0.346 |
These results are competitive with other translation-based models (TransE) and with bilinear and complex models (DistMult, ComplEx) (Ebisu et al., 2017, Ebisu et al., 2019). On metrics such as HITS@1 and Mean Reciprocal Rank (MRR), TorusE is often superior to regularization-dependent models, particularly in scenarios requiring high precision.
However, more recent models employing graph-pattern ranking (such as GRank (Ebisu et al., 2019)) or hybrid path-based approaches may yield marginal improvements in MRR and HITS@n, particularly in settings with richer graph patterns or strong redundancy. TorusE’s theoretical clarity and computational advantages still make it favored for scalable and interpretable knowledge graph embedding.
6. Relation to Other Embedding Architectures
TorusE’s formulation generalizes the translation-based family by situating the principle on an arbitrary (compact, abelian) Lie group. This approach enables unification and reinterpretation:
- ComplEx and DistMult: By mapping each torus coordinate $\theta \in [0, 1)$ into $\mathbb{C}$ via $\theta \mapsto e^{2\pi i \theta}$, TorusE’s scoring functions can mimic or be mapped to those of ComplEx; the complex-distance variant, for instance, measures the gap between the resulting unit complex vectors (a minimal mapping sketch follows this list).
- Extensions: Models like MöbiusE (Chen et al., 2021) generalize TorusE by embedding on Möbius strips or more intricate manifolds, introducing additional nonlinearity and twist, which may capture more complex cyclic patterns. MöbiusE has demonstrated improved results in some link prediction tasks.
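A minimal sketch of the torus-to-complex correspondence mentioned above, assuming the standard map $\theta \mapsto e^{2\pi i \theta}$; names, dimensions, and constant factors are illustrative:

```python
import numpy as np

def to_complex(theta):
    """Map torus coordinates in [0, 1) to points on the complex unit circle."""
    return np.exp(2j * np.pi * theta)

h, r, t = np.random.rand(3, 16)          # torus embeddings in [0, 1)^16
# Complex-distance-style score (up to constant factors): squared distance between
# the unit complex vectors representing h + r and t.
score = np.sum(np.abs(to_complex(np.mod(h + r, 1.0)) - to_complex(t)) ** 2)
print(score)
```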
A unified framework is possible by expressing translation-based knowledge graph embedding as translation operations on Lie groups $G$, where the choice of $G$ (e.g., $\mathbb{R}^n$, $T^n$, Möbius bands) controls expressive capacity and regularization properties (Ebisu et al., 2019).
7. Limitations and Future Research Directions
While TorusE’s compact toroidal embedding elegantly resolves normalization conflicts and improves computation, several considerations remain:
- Expressiveness: The flat, coordinate-wise wraparound may insufficiently capture intricate relational symmetries or nontrivial composition patterns, motivating exploration of more twisted or higher-genus topologies (as in MöbiusE (Chen et al., 2021)).
- Interpretability: Like most embedding-based approaches, TorusE is not inherently interpretable to humans. Methods combining graph pattern mining or explicit rule reasoning (as in GRank (Ebisu et al., 2019)) have shown both greater interpretability and marginally improved accuracy on some datasets.
- Implementation Details: Care must be taken to preserve differentiability through wraparound logic in training frameworks, and to implement efficient modular arithmetic in high-dimensional settings, especially for GPU-based systems (a differentiable torus-distance sketch follows this list).
- Integration with Rule and Path-Based Inference: Hybrid frameworks using TorusE for path-embedding, combined with explicit rule or path scoring, offer improved link prediction and faster rule evaluation (Ebisu et al., 2019).
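For the differentiability concern above, one sketch (assuming a PyTorch-style autograd framework; this is not a reference implementation) is to express the wraparound through `torch.remainder` and, where a smooth surrogate is preferred, through a sine-based chord distance on the unit circle:

```python
import math
import torch

def torus_l1_distance(x, y):
    """Piecewise-linear torus distance; autograd handles it almost everywhere."""
    d = torch.remainder(x - y, 1.0)
    return torch.minimum(d, 1.0 - d).sum(dim=-1)

def torus_smooth_distance(x, y):
    """Smooth surrogate: squared chord length on the unit circle (a complex-distance-style score)."""
    return (2.0 * torch.sin(math.pi * (x - y))).pow(2).sum(dim=-1)

h = torch.rand(4, 32, requires_grad=True)    # batch of head embeddings on the torus
r = torch.rand(4, 32)
t = torch.rand(4, 32)

loss = torus_smooth_distance(torch.remainder(h + r, 1.0), t).mean()
loss.backward()                              # gradients flow through the wraparound
print(h.grad.shape)                          # torch.Size([4, 32])
print(torus_l1_distance(torch.remainder(h + r, 1.0), t).detach())  # piecewise distances
```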
Further research explores more general compact Lie group embeddings, efficient sparse implementations, and integration with interpretable and path-based models to maintain the strengths of TorusE while addressing its limitations.
In summary, TorusE leverages the mathematical properties of the torus—a compact, abelian Lie group—to implement the translation principle for knowledge graph embeddings without recourse to extrinsic normalization. This results in an efficient, scalable, and theoretically grounded approach that remains competitive with more recent and complex models in both performance and computational properties (Ebisu et al., 2017, Anik et al., 24 Feb 2025). Ongoing work extends these ideas to more expressive manifolds, interpretable architectures, and optimized computational kernels.