
Simplicial Embeddings in Topology & Data Science

Updated 11 October 2025
  • Simplicial Embeddings are rigorous mappings that preserve the combinatorial, metric, and topological properties of discrete structures.
  • They enable embedding complex simplicial complexes into target spaces, supporting applications in geometry, topology, and data science.
  • SEMs facilitate algorithmic constructions, neural representation learning, and spectral analysis, offering actionable insights for practical systems.

Simplicial Embeddings (SEMs) are a class of mathematical constructions, algorithms, and neural architectures that formalize the embedding of discrete topological or combinatorial structures—specifically simplicial complexes and their generalizations—into a target space, encoding or preserving structural, metric, or semantic properties. SEMs play a central role in contemporary topology, discrete geometry, representation learning, and increasingly in data science, as they underpin methods for representing, approximating, decomposing, or learning objects whose inherent complexity is higher than that encountered in graphs.

1. Mathematical Foundations and Classical Embedding Results

At its core, a simplicial embedding refers to an injective, structure-preserving map from a simplicial complex (or related object) into a host space, respecting either combinatorics, metrics, incidence relations, or topological invariants. The basic paradigm is exemplified by classical results such as the Nash–Kuiper theorem for isometric embeddings, or the work of Greene and Rokhlin, which extend smooth and polyhedral isometric embeddings to settings with indefinite or relaxed metric constraints.

The framework of indefinite metric polyhedra, as in "Simplicial isometric embeddings of indefinite metric polyhedra" (Minemyer, 2012), formalizes this further. Two equivalent definitions of such polyhedra are established: a geometric definition, as a triangulated space with assigned edge lengths (possibly of mixed signs), and an algebraic definition, based on associating each simplex a symmetric bilinear form (Gram matrix) that must agree on shared faces. This duality enables both intrinsic (edge-based) and extrinsic (form-based) viewpoints, and underpins much of the flexibility in SEM constructions.

A central technical achievement is proving that any such indefinite metric polyhedron (with bounded vertex degree) can be simplicially isometrically embedded into a finite-dimensional Minkowski space $\mathbb{R}^{q,q}$, where $q = \max\{d, 2n+1\}$ for compact $n$-dimensional polyhedra ($d$ is the maximal vertex degree). Relaxing from strictly simplicial to piecewise-linear (pl) isometries drastically reduces the required dimension and codimension, revealing a discrete $h$-principle flexibility in the pl regime.
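As a concrete instance of the bound above (the specific values $n = 2$ and $d = 7$ are illustrative choices, not taken from the cited paper):

```latex
% Worked instance of the target dimension q = max{d, 2n+1}
% for a compact n-dimensional polyhedron with maximal vertex degree d.
n = 2, \quad d = 7
\;\Longrightarrow\;
q = \max\{7,\ 2 \cdot 2 + 1\} = \max\{7, 5\} = 7
```

so such a polyhedron embeds simplicially and isometrically into $\mathbb{R}^{7,7}$.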

2. Embedding Complexity, Obstructions, and Bounds

A key research thread investigates the limits of how "complicated" a simplicial complex can be while still admitting an embedding of a specified type. Lower and upper bounds on the number of $d$-simplices in embeddable $d$-complexes are established using geometric and extremal combinatorial methods, as detailed in (Gundert, 2018). The $d$-skeleton of the boundary complex of cyclic polytopes attains the lower bound $\Omega(n^{\lceil r/2 \rceil})$ for embeddability into $\mathbb{R}^r$, while upper bounds (e.g., $O(n^{d+1-1/3^d})$ for codimension one) are established via forbidden subcomplexes and hypergraph extremal theory, generalizing the van Kampen–Flores obstruction.

A related line of work, such as (Björner et al., 2016), provides necessary homological conditions for topological or PL-embeddability. For a $d$-complex $E$ to embed in $\mathbb{R}^{d+1}$, the top homology $H_d(E;\mathbb{Z}_2)$ must admit a "2-complete" basis, a combinatorial sparsity/overlap criterion that directly generalizes Mac Lane's planarity condition for graphs. This is leveraged to derive precise upper bounds on the number of top-dimensional faces of embeddable complexes.

For $k$-complexes embedding into $2k$-manifolds, the algebraic intersection form of the target manifold, combined with combinatorial data of $K$, is encoded as a low-rank skew-symmetric or symmetric matrix of intersection numbers for nonadjacent $k$-faces. Embeddability is then characterized by the possibility of completing such a partial matrix subject to specific rank and parity constraints (Skopenkov, 2021).

3. Algorithmic Constructions and Explicit Embeddings

Several works provide explicit, often algorithmic, constructions for SEMs. For indefinite metric polyhedra, an inductive, constructive approach extends partial isometric embeddings to full embeddings, achieving minimal dimension by enforcing general position of the vertex images and solving a linear system for the edge-length constraints at each step (Galashin et al., 2015). The inherent extension property ensures robustness and supports uses in discrete geometry and computation, such as mesh generation and polyhedral deformation.

Parsimonious embedding constructions, such as those in (Berdnikov, 2022), focus on embedding a high-degree or otherwise complex simplicial complex $X$ into the boundary of a high-dimensional ball (or sphere) with bounded degree and low volume. The method uses mixed numerations and tree-based "cone" fibrations to produce embeddings that either simplify (homotopy equivalence with reduced local degree) or complexify (implanting topologically complex structures into a standard manifold) as desired. This has implications for systolic geometry and the construction of expanders and "hard-to-cut" manifolds.

4. Simplicial Embeddings in Learning Architectures

SEMs have been extended to neural representation learning, providing architectural and inductive regularizers for deep models. In self-supervised learning, as in (Lavoie et al., 2022), Simplicial Embedding layers project encoder outputs onto a concatenation of $L$ simplices (via softmax normalization), imparting group sparsity and overcompleteness. This projection enforces an inductive bias that yields both theoretical improvements in generalization error, via explicit bounds on representation complexity, and empirical gains on benchmarks such as CIFAR-100 and ImageNet.
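The projection can be sketched in a few lines. The following is an illustrative pure-Python version (the function name and signature are my own, not from the paper); practical implementations apply the same per-group softmax to encoder outputs inside a deep-learning framework:

```python
import math

def simplicial_embedding(z, L, V, temperature=1.0):
    """Project a flat feature vector z of length L*V onto a product of L
    (V-1)-simplices: an independent softmax over each group of V entries.
    Illustrative sketch only; the name and signature are hypothetical."""
    assert len(z) == L * V
    out = []
    for i in range(L):
        group = z[i * V:(i + 1) * V]
        m = max(group)  # subtract the max for numerical stability
        exps = [math.exp((g - m) / temperature) for g in group]
        s = sum(exps)
        out.extend(e / s for e in exps)  # each group now sums to 1
    return out
```

Each group of $V$ coordinates becomes a probability vector, and lowering the temperature pushes each group toward a near one-hot configuration, which is the group-sparsity bias discussed above.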

Emergent semantic coherence arises as small groups of SEM features become distinctly predictive of semantically meaningful classes, visualizable as feature–class bipartite graphs. These properties are unattainable with naïve unnormalized representations, or with overcomplete but unstructured ones.

Further, the combination of SEMs with iterated learning (IL) schemes (Ren et al., 2023) leverages iterative compressibility and expressivity pressures to induce compositional, discrete (approximately categorical) latent spaces in neural models, improving generalization on unseen combinations of latent factors. Theoretical arguments based on Kolmogorov complexity explain why SEM-IL yields representations with lower description length and improved compositionality, supported by strong empirical results on both synthetic vision and molecular datasets.

5. SEMs in Topological Data Analysis and Spectral Methods

In topological data analysis (TDA), SEMs exploit the spectral properties of simplicial complexes. The Hodge Laplacian and its normalized variants (Schaub et al., 2018) generalize graph Laplacians to edge and face space, providing a natural infrastructure for random walks, diffusion processes, and spectral embeddings that capture higher-order interactions and cycles. The projection of edge flows onto the harmonic subspace (the zero eigenspace of the Hodge $1$-Laplacian) encodes the topological homology (cycles and holes). Applications range from clustering and outlier detection in trajectory data (Frantzen et al., 2021) to generalizations of PageRank for network analysis.
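To make the harmonic-projection idea concrete, here is a minimal pure-Python sketch (the complex, orientations, and helper functions are my own illustrative choices). For a hollow square (four vertices, four oriented edges, no filled triangles), the Hodge $1$-Laplacian reduces to $L_1 = B_1^{\top} B_1$, and the edge flow circulating around the square lies in its kernel, i.e., it is harmonic and detects the hole:

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Vertex-edge boundary matrix B1 for a hollow square: 4 vertices and
# 4 oriented edges e1=(0,1), e2=(1,2), e3=(2,3), e4=(0,3); no triangles.
B1 = [[-1,  0,  0, -1],
      [ 1, -1,  0,  0],
      [ 0,  1, -1,  0],
      [ 0,  0,  1,  1]]

# With no 2-simplices, the Hodge 1-Laplacian is L1 = B1^T B1.
L1 = matmul(transpose(B1), B1)

# The flow circulating once around the square (respecting orientations).
flow = [1, 1, 1, -1]
residual = [sum(L1[i][j] * flow[j] for j in range(4)) for i in range(4)]
# residual == [0, 0, 0, 0]: the cycle flow is harmonic and spans the hole.
```

Filling the square with 2-simplices would add a $B_2 B_2^{\top}$ term to $L_1$, shrinking the harmonic subspace as the hole disappears.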

Symbolic embedding strategies (e.g., Simplex2Vec in (Billings et al., 2019)) use random walks on Hasse diagrams and word2vec-style methods to learn continuous embeddings for simplices, enabling community detection and the analysis of higher-order interaction effects in biological and social datasets.
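A Simplex2Vec-style pipeline can be sketched as follows (the function names and parameters are hypothetical, and the downstream word2vec step is omitted): treat every simplex as a node, connect each simplex to its codimension-1 faces in the Hasse diagram, and sample random walks to serve as "sentences" for a skip-gram model:

```python
import random
from itertools import combinations

def hasse_neighbors(simplices):
    """Map each simplex to its codimension-1 faces and cofaces,
    i.e., its neighbors in the Hasse diagram of the complex."""
    simp = {frozenset(s) for s in simplices}
    nbrs = {s: [] for s in simp}
    for s in simp:
        for f in combinations(sorted(s), len(s) - 1):
            f = frozenset(f)
            if f in simp:
                nbrs[s].append(f)  # f is a face of s
                nbrs[f].append(s)  # s is a coface of f
    return nbrs

def random_walks(nbrs, num_walks=10, length=5, seed=0):
    """Sample uniform random walks on the Hasse diagram; each walk is a
    sequence of simplices usable as a 'sentence' for word2vec-style training."""
    rng = random.Random(seed)
    nodes = list(nbrs)
    walks = []
    for _ in range(num_walks):
        cur = rng.choice(nodes)
        walk = [cur]
        for _ in range(length - 1):
            if not nbrs[cur]:
                break
            cur = rng.choice(nbrs[cur])
            walk.append(cur)
        walks.append(walk)
    return walks
```

Feeding these walks to any skip-gram implementation yields a continuous vector per simplex, with co-occurring faces and cofaces mapped close together.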

Learned SEMs for whole-complex representation ("universal embeddings") employ geometric message passing across all simplex levels (Hajij et al., 2021), with explicit construction of global representations via metric learning losses that preserve intrinsic proximity across complexes. These methods extend graph-level representation learning to higher-dimensional topologies, with applications in CAD, mesh analysis, and beyond.

Invariant representations for embedded complexes (Paik, 2023) encode isometry- and subdivision-invariant features of geometric objects (e.g., surface meshes) through transforms such as Euler curves and persistent homology, combined with equivariant graph neural networks, enabling effective classification and clustering of geometric data.

6. SEMs and Discrete Rigidity, Surface, and Graph Embeddings

A significant area of SEM research addresses rigidity and uniqueness phenomena for relation-preserving embeddings of graphs and surfaces. In low-dimensional topology, simplicial embeddings between multicurve graphs (Erlandsson et al., 2015) and multiarc graphs (Parlier et al., 2019) are shown, under complexity conditions, to be uniquely induced by $\pi_1$-injective embeddings or homeomorphisms of the underlying surfaces, revealing strong rigidity properties. The combinatorial data encoded in these graphs or complexes thus directly reflect the topological type of the surface.

At the intersection of graph theory and surface topology, the relationship between simplicial surfaces and cubic graphs (triangulated 2-manifolds and their incidence graphs) is explored in (Weiß et al., 29 Oct 2024). Here, the existence, uniqueness, and re-embedding of cubic graphs as face graphs of simplicial surfaces are classified in terms of cycle double covers and peripheral cycles (umbrellas), offering a new perspective on Whitney's and Tutte's classical theorems, strong graph embeddings, and the cycle double cover conjecture.

7. Applications, Extensions, and Open Problems

Simplicial Embeddings are foundational in a wide variety of domains:

  • In geometry and topology: constructing manifolds with prescribed complexity or metric properties; producing expanders and systolic extremal examples; characterizing rigidity and embedding obstructions.
  • In machine learning and network science: enforcing sparsity, overcompleteness, and compositionality in feature representations; robust self-supervised and few-shot learners; analyzing trajectory and interaction data.
  • In communication: semantic-native communication protocols built on simplicial complexes and convolutional autoencoders (Zhao et al., 2022) leverage semantic and topological structure for resilience under noise and missing data.
  • In data analysis: symbolic and spectral embeddings facilitate community detection, anomaly detection, and visualization of higher-order patterns.

Ongoing research seeks to generalize SEM results to higher codimension, relax regularity and degree constraints, extend spectral and message-passing frameworks to richer (e.g., weighted or parametrized) complexes, and further unify algebraic, combinatorial, and geometric perspectives across disciplines.


This synthesis integrates foundational theory, algorithmic construction, empirical architectures, rigidity phenomena, and topological data methods, reflecting the breadth and depth of Simplicial Embeddings as a modern mathematical and applied paradigm.
