Disambiguating Cluster Separation in Neighbor Embeddings

For cluster separation observed in low-dimensional neighbor embeddings such as t-SNE and UMAP, determine whether the separation reflects genuine structure in the high-dimensional data manifold or instead arises because the optimization distorts the manifold to favor local compactness. Establish criteria or mechanisms that distinguish manifold-driven separation from optimizer-induced artifacts.

Background

Neighbor embedding methods such as t-SNE and UMAP construct a weighted k-nearest-neighbor graph of the data and then optimize a low-dimensional embedding that preserves the local similarities encoded in that graph. These methods are known to prioritize local separation and often produce visually distinct clusters in the embedding.
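The graph-construction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact construction used by t-SNE or UMAP: the function name `knn_graph`, the Gaussian kernel, the per-point bandwidth (squared distance to the k-th neighbor), and the max-symmetrization rule are all hypothetical simplifications. t-SNE instead calibrates each bandwidth to a target perplexity, and UMAP combines directed neighborhoods with a fuzzy union.

```python
import numpy as np

def knn_graph(X, k=10):
    """Symmetric weighted k-NN graph as a dense (n, n) array.

    Hypothetical simplification: Gaussian weights exp(-d^2 / sigma_i^2), where
    sigma_i^2 is point i's squared distance to its k-th nearest neighbor.
    Dense O(n^2) memory, so only suitable for small illustrative datasets.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (clipped at 0 against round-off).
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D2, np.inf)  # a point is not its own neighbor
    # Column indices of each row's k nearest neighbors, nearest first.
    nbrs = np.argsort(D2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    # Per-point bandwidth; the epsilon guards against duplicate points.
    sigma2 = np.maximum(D2[np.arange(n), nbrs[:, -1]], 1e-12)
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-D2[rows, cols] / sigma2[rows])
    # Directed k-NN edges -> undirected graph: keep the larger of the two weights.
    return np.maximum(W, W.T)

# Usage: two well-separated Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(8.0, 1.0, (50, 5))])
W = knn_graph(X, k=10)
print(W.shape)              # (100, 100)
print(W[:50, 50:].max())    # 0.0: no k-NN edge crosses between the blobs
```

On these blobs the separation is already present in the input graph, since no edge crosses between the two groups. The embedding optimizer only ever sees this graph, so any separation it produces beyond what the graph encodes must come from the optimization itself.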

However, visually separated clusters do not by themselves guarantee that the separation faithfully reflects the structure of the high-dimensional manifold. Because the optimization balances attraction between neighboring points against repulsion between points, it can distort the manifold, sharpening local compactness at the expense of global continuity. The paper identifies this ambiguity as a central unresolved question motivating its spectral framework, which aims to make the global–local trade-off explicit and analyzable through graph spectral modes.
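One candidate criterion can be sketched with the normalized Laplacian of the high-dimensional graph. This is a generic spectral-clustering diagnostic offered as an assumption, not the spectral framework of Huang et al. Take cluster labels read off the embedding and ask (i) how much graph weight crosses the cluster boundary (conductance) and (ii) whether the Laplacian has a correspondingly small second-smallest eigenvalue. By Cheeger's inequality, a small second eigenvalue implies that a low-conductance cut exists, and vice versa up to a square-root factor. Clusters that look separated in the embedding while both quantities stay large are therefore candidates for optimizer-induced separation. The function names, the toy `two_block_graph` helper, and any threshold one might place on these quantities are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}; assumes W is symmetric, non-negative,
    and every vertex has at least one edge (no zero degrees)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def low_spectrum(W, m=5):
    """Smallest m eigenvalues of the normalized Laplacian, in ascending order."""
    return np.linalg.eigvalsh(normalized_laplacian(W))[:m]

def conductance(W, labels, c):
    """Weight of edges leaving cluster c, divided by the smaller side's volume."""
    inside = labels == c
    cut = W[np.ix_(inside, ~inside)].sum()
    deg = W.sum(axis=1)
    return cut / min(deg[inside].sum(), deg[~inside].sum())

def two_block_graph(n=20, cross=0.0):
    """Hypothetical toy graph: two fully connected blocks of n vertices,
    with uniform weight `cross` on every edge between the blocks."""
    W = np.full((2 * n, 2 * n), cross)
    W[:n, :n] = 1.0
    W[n:, n:] = 1.0
    np.fill_diagonal(W, 0.0)
    return W

# Labels as they might be read off a 2-D embedding showing two clusters.
labels = np.array([0] * 20 + [1] * 20)
for cross in (0.0, 0.5):
    W = two_block_graph(20, cross)
    lam2 = low_spectrum(W)[1]
    phi = conductance(W, labels, 0)
    print(f"cross={cross}: lambda_2={lam2:.3f}, conductance={phi:.3f}")
```

With `cross=0.0` both quantities are numerically zero, so the two blocks are genuinely separate in the graph. With `cross=0.5` the second eigenvalue is 20/29 ≈ 0.69 and the conductance is 200/580 ≈ 0.34, so an embedding that draws these blocks as well-separated clusters would overstate the graph's structure. The design choice is to measure separation on the input graph rather than in the embedding, which keeps the diagnostic itself independent of the optimizer. Whether such a criterion is sufficient is exactly the open question posed above.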

References

When an embedding shows separated clusters, it remains unclear whether this separation reflects true data structure or arises from the optimizer distorting the underlying manifold to favor local compactness.

A Spectral Framework for Multi-Scale Nonlinear Dimensionality Reduction (2604.02535 - Huang et al., 2 Apr 2026), in Introduction