Papers
Topics
Authors
Recent
Search
2000 character limit reached

Interpreting the Curse of Dimensionality from Distance Concentration and Manifold Effect

Published 31 Dec 2023 in cs.LG and cs.DS | (2401.00422v3)

Abstract: The characteristics of data like distribution and heterogeneity, become more complex and counterintuitive as dimensionality increases. This phenomenon is known as curse of dimensionality, where common patterns and relationships (e.g., internal pattern and boundary pattern) that hold in low-dimensional space may be invalid in higher-dimensional space. It leads to a decreasing performance for the regression, classification, or clustering models or algorithms. Curse of dimensionality can be attributed to many causes. In this paper, we first summarize the potential challenges associated with manipulating high-dimensional data, and explains the possible causes for the failure of regression, classification, or clustering tasks. Subsequently, we delve into two major causes of the curse of dimensionality, distance concentration, and manifold effect, by performing theoretical and empirical analyses. The results demonstrate that, as the dimensionality increases, nearest neighbor search (NNS) using three classical distance measurements, Minkowski distance, Chebyshev distance, and cosine distance, becomes meaningless. Meanwhile, the data incorporates more redundant features, and the variance contribution of principal component analysis (PCA) is skewed towards a few dimensions.

Citations (2)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

Collections

Sign up for free to add this paper to one or more collections.