Sketching the Heat Kernel: Using Gaussian Processes to Embed Data (2403.07929v1)
Abstract: This paper introduces a novel, non-deterministic method for embedding data in low-dimensional Euclidean space based on computing realizations of a Gaussian process depending on the geometry of the data. This type of embedding first appeared in (Adler et al., 2018) as a theoretical model for a generic manifold in high dimensions. In particular, we take the covariance function of the Gaussian process to be the heat kernel, and computing the embedding amounts to sketching a matrix representing the heat kernel. The Karhunen-Loève expansion reveals that the straight-line distances in the embedding approximate the diffusion distance in a probabilistic sense, avoiding the need for sharp cutoffs and maintaining some of the smaller-scale structure. Our method demonstrates further advantage in its robustness to outliers. We justify the approach with both theory and experiments.
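As a rough illustration of the construction described in the abstract, the Python sketch below draws independent realizations of a centered Gaussian process whose covariance is a discrete heat-kernel surrogate and uses them as embedding coordinates. The concrete choices are assumptions made for illustration and are not taken from the paper: the function name `heat_kernel_sketch`, the Gaussian affinity with parameter `bandwidth`, and the symmetric degree normalization standing in for the heat kernel. With covariance K = A Aᵀ and G a standard Gaussian matrix, each column of A G is one realization, and the expected squared distance between rows i and j of the scaled embedding is K_ii + K_jj - 2 K_ij.

```python
import numpy as np


def heat_kernel_sketch(points, bandwidth, n_realizations, seed=0):
    """Embed the rows of `points` (n x d) into R^n_realizations.

    Hypothetical helper, not the paper's implementation. It samples a
    centered Gaussian process with covariance K = A @ A.T, where A is a
    symmetric, degree-normalized Gaussian affinity matrix used here as a
    stand-in for a heat kernel at half the diffusion time.
    """
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances (clipped to avoid tiny negatives).
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    sq_dists = np.maximum(sq_dists, 0.0)

    # Gaussian affinity; the 4 * bandwidth scaling is an assumed convention.
    affinity = np.exp(-sq_dists / (4.0 * bandwidth))

    # Symmetric normalization D^{-1/2} W D^{-1/2} (an assumed choice).
    degrees = affinity.sum(axis=1)
    A = affinity / np.sqrt(np.outer(degrees, degrees))

    # Sketch: multiply by a Gaussian test matrix. Each column of A @ G is
    # one realization of N(0, A @ A.T); row i is the embedding of point i.
    G = rng.standard_normal((A.shape[0], n_realizations))
    embedding = A @ G / np.sqrt(n_realizations)
    return embedding, A


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
    circle = np.column_stack([np.cos(angles), np.sin(angles)])
    Y, A = heat_kernel_sketch(circle, bandwidth=0.05, n_realizations=50)
    print(Y.shape)
```

In this sketch the squared distance between embedded rows i and j has expectation equal to the squared distance between rows i and j of `A`, which plays the role of a diffusion distance; increasing `n_realizations` tightens the concentration around that value.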
- Convergence of the reach for a sequence of Gaussian-embedded manifolds. Probab. Theory Related Fields, 171(3-4):1045–1091, 2018.
- Random fields and geometry, volume 80. Springer, 2007.
- Jonathan Bates. The embedding dimension of Laplacian eigenfunction maps. Appl. Comput. Harmon. Anal., 37(3):516–530, 2014.
- Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
- Embedding Riemannian manifolds by their heat kernel. Geom. Funct. Anal., 4(4):373–398, 1994.
- Diffusion maps for changing data. Appl. Comput. Harmon. Anal., 36(1):79–107, 2014.
- Diffusion maps. Appl. Comput. Harmon. Anal., 21(1):5–30, 2006.
- On the convergence rate of Sinkhorn’s algorithm, 2022.
- Universal local parametrizations via heat kernels and eigenfunctions of the Laplacian. Ann. Acad. Sci. Fenn. Math., 35(1):131–174, 2010.
- Philip A. Knight. The Sinkhorn-Knopp algorithm: convergence and applications. SIAM J. Matrix Anal. Appl., 30(1):261–275, 2008.
- The intrinsic geometry of some random manifolds. Electron. Commun. Probab., 22:Paper No. 1, 12, 2017.
- Stephane S. Lafon. Diffusion maps and geometric harmonics. ProQuest LLC, Ann Arbor, MI, 2004. Thesis (Ph.D.)–Yale University.
- Doubly stochastic normalization of the Gaussian kernel is robust to heteroskedastic noise. SIAM J. Math. Data Sci., 3(1):388–413, 2021.
- Spectral methods for uncertainty quantification. Scientific Computation. Springer, New York, 2010. With applications to computational fluid dynamics.
- Probability in Banach spaces. Classics in Mathematics. Springer-Verlag, Berlin, 2011. Isoperimetry and processes, Reprint of the 1991 edition.
- John M. Lee. Introduction to smooth manifolds, volume 218 of Graduate Texts in Mathematics. Springer, New York, second edition, 2013.
- Manifold learning with bi-stochastic kernels. IMA J. Appl. Math., 84(3):455–482, 2019.
- Per-Gunnar Martinsson. Randomized methods for matrix computations, 2019.
- Stephen Semmes. On the nonexistence of bi-Lipschitz parameterizations and geometric problems about $A_{\infty}$-weights. Rev. Mat. Iberoamericana, 12(2):337–410, 1996.
- K. T. Sturm. Diffusion processes and heat kernels on metric spaces. Ann. Probab., 26(1):1–55, 1998.
- Roman Vershynin. High-dimensional probability, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2018. An introduction with applications in data science, With a foreword by Sara van de Geer.