Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks (2006.13198v6)

Published 23 Jun 2020 in stat.ML, cond-mat.dis-nn, and cs.LG

Abstract: Generalization beyond a training dataset is a main goal of machine learning, but theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep neural networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. In this paper, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also includes infinitely overparameterized neural networks trained with gradient descent. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel or data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep neural networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with "simple functions", which are identified by solving a kernel eigenfunction problem on the data distribution. This notion of simplicity allows us to characterize whether a kernel is compatible with a learning task, facilitating good generalization performance from a small number of training examples. We show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks. To further understand these phenomena, we turn to the broad class of rotation invariant kernels, which is relevant to training deep neural networks in the infinite-width limit, and present a detailed mathematical analysis of them when data is drawn from a spherically symmetric distribution and the number of input dimensions is large.

Authors (3)
  1. Abdulkadir Canatar (10 papers)
  2. Blake Bordelon (27 papers)
  3. Cengiz Pehlevan (81 papers)
Citations (166)

Summary

  • The paper provides an analytical derivation of the generalization error for kernel regression using statistical mechanics methods.
  • It demonstrates that spectral bias favors simpler function components, with task-model alignment reducing sample complexity.
  • The study bridges kernel methods with infinitely wide neural networks, offering insights into non-monotonic learning curves and overparameterization.

Spectral Bias and Task-Model Alignment in Generalization of Kernel Regression and Infinitely Wide Neural Networks

The paper "Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks," by Abdulkadir Canatar, Blake Bordelon, and Cengiz Pehlevan, presents an analytical study of the generalization behavior of kernel regression and its connection to infinitely wide neural networks. Using techniques from statistical mechanics, the authors derive an analytical expression for the generalization error applicable to any kernel and data distribution. They apply the theory to real and synthetic datasets, as well as to kernels that arise from deep networks in the infinite-width limit. The results elucidate key generalization principles associated with spectral bias and task-model alignment in kernel regression.

Analytical Derivation of Generalization Error in Kernel Regression

Kernel regression, a central tool in supervised learning, is judged by how well it generalizes from observed data to unseen samples. The paper derives the generalization error of kernel regression using the replica method from statistical physics. The resulting expression holds for any kernel and data distribution and agrees closely with numerical experiments. Two quantities are central to the analysis: the kernel's eigenvalues and the alignment of the target function with the kernel's eigenfunctions.
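The objects the theory is built on can be made concrete with a minimal numpy sketch (not the authors' code): a kernel ridge regressor, and an empirical estimate of the kernel's Mercer eigenvalues obtained by diagonalizing the Gram matrix on a data sample. The RBF kernel, target function, and length scale below are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=0.3):
    """Gaussian (RBF) kernel matrix between row sets X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * length_scale**2))

def kernel_regression(X_train, y_train, X_test, ridge=1e-6):
    """Predictor f(x) = k(x, X) (K + ridge*I)^{-1} y (ridgeless as ridge -> 0)."""
    K = rbf_kernel(X_train, X_train)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0])                        # noiseless toy target
X_test = np.linspace(-1, 1, 200)[:, None]
f_hat = kernel_regression(X, y, X_test)

# Empirical estimate of the kernel's Mercer spectrum: eigenvalues of K/n.
# The fast decay of this spectrum is what drives spectral bias.
spectrum = np.sort(np.linalg.eigvalsh(rbf_kernel(X, X) / len(X)))[::-1]
```

In the paper's framework, these eigenvalues (together with the target's projections onto the corresponding eigenfunctions) are exactly the inputs to the analytical generalization-error formula.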

Implications of Spectral Bias

The concept of spectral bias is central to the paper's discussion of the generalization capabilities of kernel regression. In essence, spectral bias means that kernel regression is predisposed toward fitting data with "simple" functions, namely the eigenfunctions of the kernel associated with large eigenvalues. Consequently, kernels whose top eigenfunctions align with the learning task enable more efficient learning, reducing the sample complexity needed for good generalization. The paper advances the discussion by introducing the cumulative power distribution as a heuristic measure of task-model alignment: it quantifies how much of the target function's power falls on the kernel's leading eigenfunctions.
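An empirical version of the cumulative power distribution can be sketched in a few lines (an illustration, not the paper's exact estimator): diagonalize the Gram matrix, project the target onto the eigenvectors ordered by decreasing eigenvalue, and accumulate the squared projections. A smooth target concentrates its power in the top modes of an RBF kernel, while a noise target spreads it evenly; the kernel and targets below are illustrative choices.

```python
import numpy as np

def cumulative_power(K, y):
    """C(k): fraction of the target's power on the top-k kernel eigenmodes.

    K : (n, n) kernel Gram matrix on the data sample
    y : (n,)  target values on the same sample
    """
    w, V = np.linalg.eigh(K)      # ascending eigenvalues
    V = V[:, ::-1]                # reorder so the largest modes come first
    power = (V.T @ y) ** 2        # squared projection on each eigenmode
    return np.cumsum(power) / power.sum()

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 1))
K = np.exp(-((X - X.T) ** 2) / (2 * 0.5**2))   # RBF Gram matrix, 1-d inputs

C_smooth = cumulative_power(K, np.sin(2 * X[:, 0]))      # aligned: smooth target
C_rough = cumulative_power(K, rng.standard_normal(200))  # misaligned: noise
```

A well-aligned task shows a cumulative power curve that rises quickly toward 1, which in the paper's theory translates into a fast-decaying learning curve.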

Non-monotonicity in Sample-wise Learning Curves

The authors identify conditions under which the generalization error behaves non-monotonically in the number of samples: when labels are noisy, or when the target function contains components not expressible in the kernel's eigenbasis. This behavior, analogous to the double-descent phenomenon observed in modern machine learning models, highlights nuances of the bias-variance trade-off and shows that additional noisy data can impair generalization. The paper traces the non-monotonicity to the variance of the model's predictions across different sampled datasets.

Practical and Theoretical Implications

By mapping the analytical framework from kernel regression to infinitely wide neural networks trained under specific conditions, the paper bridges classical machine learning theories with contemporary practices in deep learning. The implications are profound not only for understanding kernel methods but also for elucidating the performance of expansive neural networks in overparameterized regimes. The insights into how spectral bias and task-model alignment affect generalization could potentially inform the design of more efficient learning architectures and influence the choice of methods or models aligned with specific learning tasks.

Future Directions

The paper lays fertile ground for future research into both computational and theoretical aspects of machine learning. Extending the theory to finite-width neural networks, addressing the computational cost of large-scale eigendecomposition on real-world datasets, and exploring the impact of different kernel classes on generalization stand out as prospective research areas. The results also invite exploration of interfaces with other areas of artificial intelligence and machine learning, such as reinforcement learning, where task-model compatibility is pivotal.

In conclusion, this comprehensive study of kernel regression extends the theoretical understanding of generalization in high-dimensional feature spaces, emphasizing the pivotal roles of spectral bias and task-model alignment. These findings contribute significantly to bridging foundational statistical theories with the practical AI applications that characterize modern machine learning.