What is the $\textit{intrinsic}$ dimension of your binary data? -- and how to compute it quickly
Abstract: Dimensionality is an important aspect of analyzing and understanding (high-dimensional) data. In their 2006 ICDM paper, Tatti et al. addressed the question of an (interpretable) dimension for binary data tables by introducing a normalized correlation dimension. In the present work, we revisit their results and contrast them with a concept-based notion of intrinsic dimension (ID) recently introduced for geometric data sets. To do this, we present a novel approximation of this ID that is based on computing concepts only up to a certain support value. We demonstrate and evaluate our approximation using all available datasets from Tatti et al., which have between 469 and 41271 extrinsic dimensions.
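To illustrate what "computing concepts only up to a certain support value" could look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes that the support of a formal concept is the number of rows (objects) in its extent and that the restriction acts as a minimum-support threshold; the function name `frequent_concepts` and the parameter `min_support` are hypothetical. The sketch only enumerates the concepts and does not compute the ID itself.

```python
def frequent_concepts(rows, min_support):
    """Enumerate formal concepts (extent, intent) of a binary table.

    rows: list of sets, each holding the attributes that are 1 in that row.
    min_support: assumed threshold (>= 1) on the extent size.
    Naive closure under intersection; only suitable for small tables.
    """
    rows = [frozenset(r) for r in rows]
    # Intents of concepts with a non-empty extent are exactly the
    # intersections of non-empty families of rows.
    intents = set(rows)
    frontier = set(rows)
    while frontier:
        new = set()
        for a in frontier:
            for r in rows:
                c = a & r
                if c not in intents:
                    new.add(c)
        intents |= new
        frontier = new

    concepts = []
    for intent in intents:
        # The extent is every row index whose attribute set contains the intent.
        extent = frozenset(i for i, r in enumerate(rows) if intent <= r)
        if len(extent) >= min_support:
            concepts.append((extent, intent))
    return concepts


# Toy binary table with three rows over the attributes a, b, c.
table = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
for extent, intent in sorted(frequent_concepts(table, 2),
                             key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Raising `min_support` shrinks the set of concepts that has to be enumerated, which is the intuition behind trading accuracy for speed; how the resulting concepts enter the ID approximation is defined in the paper and is not reproduced here.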
- Anonymous, A.: What is the intrinsic dimension of your binary data? - and how to compute it quickly - Artifacts (Apr 2024). https://doi.org/10.5281/zenodo.10908237
- Pestov, V.: Intrinsic dimension of a dataset: what properties does one expect? In: IJCNN. pp. 2959–2964 (2007). https://doi.org/10.1109/IJCNN.2007.4371431
- Tatti, N.: Distances between data sets based on summary statistics. Journal of Machine Learning Research 8(1) (2007)