Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks (2002.02561v7)

Published 7 Feb 2020 in cs.LG and stat.ML

Abstract: We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the total generalization error due to different spectral components of the kernel, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit learning stages where different frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and the MNIST dataset.

Citations (184)

Summary

  • The paper introduces a spectral principle showing that kernel methods and NTK-based neural networks fit successively higher spectral modes of the target function as the training set grows.
  • The authors derive analytical learning curves using Gaussian process methods and statistical physics, clarifying the asymptotic behavior of generalization errors.
  • Simulations on synthetic data and MNIST confirm that modes associated with larger kernel eigenvalues are learned more efficiently, highlighting the spectral bias of these models.

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

The paper "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks" presents a theoretical framework for understanding the generalization performance of kernel regression and wide neural networks. This framework emphasizes the dependence of learning curves on the spectral properties of data and models. Utilizing methods from Gaussian processes and statistical physics, the authors derive analytical expressions characterizing the generalization error as a function of training sample size and relate these findings to wide neural networks via the Neural Tangent Kernel (NTK).

Key Insights and Theoretical Contributions

  1. Spectral Principle Identification: The paper introduces a spectral principle stating that, as the training sample size grows, kernel machines and neural networks fit successively higher spectral modes of the target function. This insight is particularly relevant for understanding learning dynamics and generalization in kernel methods and wide neural networks.
  2. Learning Curves Derivation: Analytical expressions for learning curves are derived using Gaussian process methods and the replica technique from statistical physics. These expressions elucidate the asymptotic behavior of generalization errors, emphasizing the role of the kernel’s spectrum in learning efficiency.
  3. Modal Error Analysis: The authors decompose the generalization error into components associated with the different eigenmodes of the kernel. They demonstrate that modes with larger kernel eigenvalues are learned more efficiently, a property that underpins the spectral bias observed in neural networks (a toy numerical sketch follows this list).
  4. Dot Product Kernels on High-Dimensional Spheres: Special focus is given to dot product kernels, including the NTK, which are essential in understanding neural network inductive biases. The spectral decomposition in this scenario reveals distinct learning phases as the data dimension becomes large, offering new perspectives on mode learning dynamics.
  5. Verification through Simulations: Theoretical predictions are corroborated with simulations on synthetic datasets and MNIST, confirming that the theory captures spectral learning behavior in both kernel regression and trained wide neural networks.
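
The spectral principle described in items 1-3 can be illustrated with a small numerical experiment (the sketch referenced in item 3). As a simplification of the paper's high-dimensional hypersphere setting, it uses the unit circle, where a translation-invariant kernel has Fourier eigenfunctions under the uniform measure and lower frequencies carry larger eigenvalues. The kernel, length scale, ridge term, and target frequencies are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel(x, y, length_scale=1.0):
    # Translation-invariant "exponential of cosine" kernel on the circle;
    # the kernel choice and length scale are assumptions for illustration.
    d = x[:, None] - y[None, :]
    return np.exp((np.cos(d) - 1.0) / length_scale**2)

def target(x):
    # Two Fourier modes: a low-frequency (k = 1) and a high-frequency (k = 4) one.
    return np.sin(x) + np.sin(4 * x)

x_test = np.linspace(0.0, 2 * np.pi, 2048, endpoint=False)
ridge = 1e-6  # small explicit regularizer

for p in (4, 16, 64, 256):
    x_train = rng.uniform(0.0, 2 * np.pi, size=p)
    K = kernel(x_train, x_train) + ridge * np.eye(p)
    alpha = np.linalg.solve(K, target(x_train))
    f_hat = kernel(x_test, x_train) @ alpha

    # Project the residual onto Fourier modes to get per-mode errors.
    coeffs = np.fft.rfft(target(x_test) - f_hat) / x_test.size
    err_k1 = 2 * np.abs(coeffs[1]) ** 2
    err_k4 = 2 * np.abs(coeffs[4]) ** 2
    print(f"p={p:4d}  mode-1 error={err_k1:.2e}  mode-4 error={err_k4:.2e}")
```

In this setup the mode-1 error typically falls well before the mode-4 error as p grows, mirroring the learning stages the paper derives for dot-product kernels on high-dimensional spheres.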

Practical and Theoretical Implications

  • Spectral Bias in Neural Networks: The framework advances our understanding of spectral bias in neural networks, particularly how low-frequency components of a target are preferentially learned and how this impacts generalization.
  • Design of Architectures and Learning Strategies: Insights into spectral learning can guide the design of neural network architectures and learning algorithms that exploit these biases for improved performance in tasks requiring high-frequency feature learning.
  • Future Directions in AI: The connection between kernel regression and neural network learning via NTK paves the way for further exploration of learning dynamics in deep models, potentially influencing advancements in automated architecture selection and training protocols.

This paper provides a detailed analytical basis for the exploration of learning curves in kernel machines and neural networks, establishing foundational principles that can influence both practical applications and theoretical advancements in machine learning. The results underscore the importance of spectral considerations in understanding and optimizing machine learning models across various contexts.