Demystifying Spectral Bias on Real-World Data (2406.02663v2)
Abstract: Kernel ridge regression (KRR) and Gaussian processes (GPs) are fundamental tools in statistics and machine learning, with recent applications to highly over-parameterized deep neural networks. The ability of these tools to learn a target function is directly related to the eigenvalues of their kernel sampled on the input data distribution. Targets supported on eigenfunctions with larger eigenvalues are more learnable. However, solving such eigenvalue problems on real-world data remains a challenge. Here, we consider cross-dataset learnability and show that one may use eigenvalues and eigenfunctions associated with highly idealized data measures to reveal spectral bias on complex datasets and bound learnability on real-world data. This allows us to leverage various symmetries that realistic kernels manifest to unravel their spectral bias.
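
To make the link between kernel eigenvalues and learnability concrete, the following is a minimal sketch of a simplified, in-sample version of the idea. It is not the paper's cross-dataset bound. It uses the standard fact that if the training target is an eigenvector of the Gram matrix K with eigenvalue λ, then the KRR prediction on the training points is that eigenvector scaled by λ / (λ + ridge). The RBF kernel, the Gaussian inputs, the ridge value, and the function names `rbf_kernel` and `in_sample_learnability` are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    """RBF kernel matrix between rows of X and rows of Y (illustrative choice)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def in_sample_learnability(K, ridge):
    """Eigen-decompose the Gram matrix and return, per eigen-direction,
    the factor lambda / (lambda + ridge) by which KRR shrinks that direction
    on the training points. Eigenvalues are sorted in descending order."""
    eigvals, eigvecs = np.linalg.eigh(K)           # K is symmetric
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    return eigvals, eigvecs, eigvals / (eigvals + ridge)

# Hypothetical toy data: 200 Gaussian points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
K = rbf_kernel(X, X)
ridge = 0.1  # assumed regularization strength

eigvals, eigvecs, learn = in_sample_learnability(K, ridge)

# Fit KRR to a target equal to the top eigenvector and predict on the training set.
y = eigvecs[:, 0]
pred = K @ np.linalg.solve(K + ridge * np.eye(len(X)), y)

print("top eigenvalue:", eigvals[0])
print("shrinkage of top direction:", learn[0])
print("shrinkage of last direction:", learn[-1])
```

Directions with large eigenvalues are recovered almost unchanged, while directions with small eigenvalues are shrunk toward zero, which is the spectral bias the abstract refers to. The population-level statement replaces the Gram matrix with the kernel operator on the data measure, and that is the eigenvalue problem described above as hard to solve on real-world data.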