Towards Understanding the Spectral Bias of Deep Learning (1912.01198v3)

Published 3 Dec 2019 in cs.LG and stat.ML

Abstract: An intriguing phenomenon observed during training neural networks is the spectral bias, which states that neural networks are biased towards learning less complex functions. The priority of learning functions with low complexity might be at the core of explaining the generalization ability of neural networks, and certain efforts have been made to provide theoretical explanations for spectral bias. However, there is still no satisfactory theoretical result justifying the underlying mechanism of spectral bias. In this paper, we give a comprehensive and rigorous explanation for spectral bias and relate it to the neural tangent kernel function proposed in recent work. We prove that the training process of neural networks can be decomposed along different directions defined by the eigenfunctions of the neural tangent kernel, where each direction has its own convergence rate and the rate is determined by the corresponding eigenvalue. We then provide a case study when the input data is uniformly distributed over the unit sphere, and show that lower-degree spherical harmonics are more easily learned by over-parameterized neural networks. Finally, we provide numerical experiments to demonstrate the correctness of our theory. Our experimental results also show that our theory can tolerate certain model misspecification in terms of the input data distribution.

Citations (191)

Summary

  • The paper introduces a generic theorem showing that neural network convergence along NTK eigendirections depends on eigenvalue magnitudes and over-parameterization.
  • It characterizes the NTK spectrum by demonstrating that networks efficiently learn lower-degree spherical harmonics with uniformly distributed data.
  • Empirical findings confirm that spectral bias inherently steers learning towards simpler functions, informing improved training strategies and generalization.

Understanding Spectral Bias in Deep Learning

The manuscript titled "Towards Understanding the Spectral Bias of Deep Learning" presents an in-depth theoretical exploration of the phenomenon of spectral bias observed during the training of neural networks. Spectral bias refers to the tendency of neural networks to preferentially learn functions of lower complexity. This behavior, though empirically observed, has not been fully understood theoretically.

The authors rigorously examine spectral bias using the framework of the Neural Tangent Kernel (NTK), providing a theoretical exposition that links spectral bias to the eigenfunctions of the NTK. Specifically, the paper reveals that the training of neural networks can be interpreted as convergence along the eigendirections defined by the NTK. The convergence rate along each direction is tied to the corresponding eigenvalue, so networks are inherently biased towards components associated with larger eigenvalues.
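This decomposition is easiest to see in the kernel-regression limit, where the residual evolves as r_{t+1} = (I - ηK)r_t and therefore decays independently along each eigendirection of the kernel matrix at a rate set by its eigenvalue. The sketch below illustrates this with a stand-in positive-definite kernel (exp of the inner product, not the exact NTK) on points drawn from the unit sphere:

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 50, 200
X = rng.standard_normal((n, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # inputs on the unit sphere

# Stand-in PSD kernel (not the exact NTK): k(x, y) = exp(<x, y>)
K = np.exp(X @ X.T)
eigvals, eigvecs = np.linalg.eigh(K)            # ascending eigenvalues

y = rng.standard_normal(n)                      # training targets
eta = 0.9 / eigvals[-1]                         # step size below 1/lambda_max
r = y.copy()                                    # residual r_0 = y
for _ in range(steps):
    r = r - eta * (K @ r)                       # r_{t+1} = (I - eta K) r_t

# The residual along eigendirection i decays exactly as (1 - eta*lambda_i)^t
predicted = (1 - eta * eigvals) ** steps * (eigvecs.T @ y)
actual = eigvecs.T @ r
assert np.allclose(actual, predicted)
# The top-eigenvalue component is fitted almost immediately, while
# small-eigenvalue components are learned far more slowly
assert abs(actual[-1]) < 1e-6
```

Because the update matrix I - ηK shares eigenvectors with K, each component of the residual evolves in isolation, which is exactly the directional decomposition the paper proves for wide networks.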

The analytical framework further offers a case study with input data uniformly distributed over the unit sphere, demonstrating that over-parameterized neural networks learn lower-degree spherical harmonics more efficiently. Moreover, numerical experiments substantiate the theoretical propositions, showing the theory's resilience even to certain model misspecifications of the input data distribution.

The paper presents several significant contributions:

  1. Generic Theorem on Convergence: It shows that, under specific sample complexity and over-parameterization conditions, the convergence of the training error along each eigendirection of the NTK hinges on the corresponding eigenvalue.
  2. Characterization of NTK Spectra: The research extends the understanding of NTK spectra, characterizing them more generally than earlier work, in particular when the input distribution is uniform over the unit sphere.
  3. Comprehensive Account of Spectral Bias: By establishing precise control over the regression residuals and outlining the implications for learning dynamics, the analysis provides a comprehensive theoretical account of spectral bias.
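The spectral characterization can be checked in miniature on the circle (the 1-sphere), where spherical harmonics reduce to Fourier modes: for equally spaced points, any rotation-invariant kernel matrix is circulant, so its eigenvectors are exactly the Fourier modes and its eigenvalues are the FFT of its first row. A sketch with a stand-in smooth kernel (exp(cos ·), not the exact NTK):

```python
import numpy as np

n = 256
theta = 2 * np.pi * np.arange(n) / n     # equally spaced points on the circle
# Stand-in smooth kernel: k(theta_i, theta_j) = exp(cos(theta_i - theta_j))
first_row = np.exp(np.cos(theta))

# The kernel matrix is circulant, so its eigenvalues are the FFT of its
# first row and its eigenvectors are the Fourier modes (degree-k harmonics)
eigvals = np.fft.fft(first_row).real

# Eigenvalues decay with degree: lower-degree harmonics have larger
# eigenvalues and, by the directional decomposition, converge faster
assert eigvals[0] > eigvals[1] > eigvals[2] > eigvals[3] > 0
```

Combined with the convergence theorem, this ordering is the spectral bias: a low-frequency target falls along a large-eigenvalue direction and is fitted in few steps, while a high-frequency target of the same norm takes far longer.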

The findings hold substantial consequences for the theoretical understanding of neural networks' generalization capabilities. By elaborating on the spectral bias through NTK eigenfunctions, the paper paves the way for improved interpretability of neural networks during training and potentially suggests new avenues for designing architectures and training algorithms that exploit this bias to enhance performance.

Concretely, spectral bias implies that lower-complexity components, which are presumed to generalize better to unseen data, are learned quickly, aligning learning trajectories towards simpler solutions even in highly flexible over-parameterized settings. This bias could therefore be exploited in practical scenarios through early stopping criteria or regularization strategies, shaping how practitioners think about network capacity and complexity.
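The early-stopping connection admits a simple spectral-filter reading: after t steps of kernel gradient descent, the component of the target along eigendirection i is recovered up to the shrinkage factor 1 - (1 - ηλ_i)^t, near 1 for large eigenvalues and near 0 for small ones, much like ridge regression's filter λ/(λ + reg). A sketch with hypothetical eigenvalues (not taken from the paper):

```python
import numpy as np

# Hypothetical NTK eigenvalues, from a "simple" to a "complex" component
lams = np.array([10.0, 1.0, 0.1, 0.01])
eta, t = 0.09, 50                      # step size and early-stopping time

# Shrinkage applied to eigencomponent i after t steps of gradient descent
gd_filter = 1 - (1 - eta * lams) ** t
# Ridge regression applies the analogous filter lam / (lam + reg)
ridge_filter = lams / (lams + 0.1)

# Both filters pass large-eigenvalue (simple) components and damp small ones
assert np.all(np.diff(gd_filter) < 0)
assert gd_filter[0] > 0.99 and gd_filter[-1] < 0.1
```

In this view the stopping time t plays the role of an inverse regularization strength: training longer admits progressively smaller-eigenvalue (more complex) components into the fit.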

Future research could focus on extrapolating these findings to broader neural network architectures beyond two-layer constructs and exploring how spectral bias manifests across varied data distributions and task types. Additionally, bridging the NTK framework with other theoretical models of learning could yield further insights into dynamics at play within deep learning paradigms.

In conclusion, by elucidating underlying mechanisms of spectral bias through a robust theoretical lens, the paper contributes significantly to the ongoing discourse on neural networks’ generalization and learning efficiencies. Integrating this understanding with practical machine learning workflows could substantially enhance the predictability and robustness of deep learning models.