Scaling Behavior of Intrinsic Dimensionality in Large Neural Embeddings

Determine whether the intrinsic dimensionality of embeddings produced by large language models remains low as model size and embedding dimensionality grow, and hence whether low intrinsic dimensionality continues to explain the effectiveness of sub-linear vector search at larger scales.

Background

Intrinsic dimensionality (ID) measures the effective dimensionality of the manifold on which embeddings lie. Prior empirical studies report low ID for various neural embeddings despite high ambient dimensionality, which helps explain why nearest-neighbor search can avoid the curse of dimensionality. However, the paper argues that ID alone does not explain observed retrieval performance: approximate search quality can degrade even at modest ambient dimensions, and modern embeddings continue to grow to thousands or tens of thousands of dimensions.
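To make the notion of ID concrete, below is a minimal sketch of one common estimator, the TwoNN method (Facco et al., 2017), which infers dimensionality from ratios of first and second nearest-neighbor distances. The cited paper does not prescribe this particular estimator, and the synthetic data in the usage example is purely illustrative.

```python
# Sketch of the TwoNN intrinsic-dimension estimator (Facco et al., 2017).
# Assumption: this is one possible way to measure the ID discussed above,
# not the specific procedure used in the cited paper.
import numpy as np
from scipy.spatial import cKDTree


def twonn_id(X: np.ndarray) -> float:
    """Estimate intrinsic dimensionality from 1st/2nd nearest-neighbor distance ratios."""
    tree = cKDTree(X)
    # k=3 returns each point itself plus its two nearest neighbors.
    dists, _ = tree.query(X, k=3)
    r1, r2 = dists[:, 1], dists[:, 2]
    # Drop duplicate points, which have zero first-neighbor distance.
    valid = r1 > 0
    mu = r2[valid] / r1[valid]
    # Under the TwoNN model the ratios mu follow a Pareto(d) law,
    # so the maximum-likelihood estimate of d is n / sum(log mu).
    return len(mu) / float(np.sum(np.log(mu)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy example: a 3-dimensional latent manifold embedded linearly
    # in a 768-dimensional ambient space.
    latent = rng.normal(size=(5000, 3))
    projection = rng.normal(size=(3, 768))
    X = latent @ projection
    print(f"ambient dim = {X.shape[1]}, estimated ID ~ {twonn_id(X):.1f}")
```

On the toy data the estimate comes out near 3 even though the ambient dimension is 768, which is exactly the gap between intrinsic and ambient dimensionality the background paragraph describes.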

The authors explicitly note uncertainty about how ID behaves as embedding models scale. Determining whether ID remains low at larger scales would clarify the extent to which low intrinsic dimensionality accounts for the continued success of sub-linear vector search methods as models and embeddings grow.
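One way to probe this question empirically is to apply the same ID estimator to embeddings from models of increasing ambient dimensionality and track whether the estimate stays roughly flat or grows with scale. The sketch below assumes precomputed embedding matrices saved as .npy files; the file names and dimensions are hypothetical placeholders, and twonn_id is the estimator sketched earlier (here assumed to live in a local module named twonn).

```python
# Hypothetical scaling study: estimate ID for embedding models of increasing
# ambient dimensionality and report the ID-to-ambient ratio.
import numpy as np

from twonn import twonn_id  # the estimator sketched above (assumed local module)

# Placeholder mapping from ambient dimensionality to a file of precomputed embeddings.
EMBEDDING_FILES = {
    384: "embeddings_small.npy",
    1024: "embeddings_medium.npy",
    4096: "embeddings_large.npy",
}

for ambient_dim, path in EMBEDDING_FILES.items():
    X = np.load(path)  # expected shape: (num_texts, ambient_dim)
    id_est = twonn_id(X)
    print(f"ambient={ambient_dim:5d}  ID~{id_est:7.1f}  ID/ambient={id_est / ambient_dim:.3f}")
```

If the ID-to-ambient ratio stays small across the sweep, that would support low intrinsic dimensionality as an explanation for sub-linear search remaining effective; a growing ratio would point to other stabilizing factors.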

References

"It is not clear if the intrinsic dimensionality will remain low as these embedding models scale and yet we observe they remain amenable to sub-linear vector search techniques."

Lakshman et al., "Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval," arXiv:2512.12458, 13 Dec 2025, Section 3.2 (Intrinsic Dimensionality).