
On the Spectral Bias of Neural Networks (1806.08734v3)

Published 22 Jun 2018 in stat.ML and cs.LG

Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuations without affecting their global behavior. Intuitively, this property is in line with the observation that over-parameterized networks find simple patterns that generalize across data samples. We also investigate how the shape of the data manifold affects expressivity by showing evidence that learning high frequencies gets \emph{easier} with increasing manifold complexity, and present a theoretical understanding of this behavior. Finally, we study the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.

Citations (1,223)

Summary

  • The paper demonstrates that deep neural networks inherently favor low-frequency functions over high-frequency ones.
  • The paper employs Fourier analysis of ReLU networks to show that high-frequency components decay rapidly, requiring precise parameter tuning.
  • The paper reveals that complex data manifolds can facilitate high-frequency learning, offering counterintuitive insights into network training dynamics.

On the Spectral Bias of Neural Networks

The paper "On the Spectral Bias of Neural Networks" investigates an intrinsic bias in deep neural networks (DNNs) towards learning lower-frequency functions, a phenomenon the authors term "spectral bias." This bias implies that DNNs tend to learn smoothly varying functions, with few local fluctuations, before functions with higher-frequency content. Using tools from Fourier analysis, the paper elucidates the theoretical underpinnings of this bias and examines its empirical manifestations. Additionally, the authors examine how the complexity of the data manifold influences the learning of high-frequency components, establishing the counterintuitive insight that more complex manifolds can ease high-frequency learning.

Fourier Analysis of ReLU Networks

Neural networks with Rectified Linear Unit (ReLU) activations can be viewed as Continuous Piecewise Linear (CPWL) functions. The authors leverage this property to analytically investigate the Fourier spectrum of these networks, showing that the Fourier components of ReLU networks decay rapidly with frequency. Specifically, for a network function $f$, the Fourier transform $\tilde{f}(k)$ decays as a rational function of the frequency magnitude $k$:

$$\tilde{f}(k) = \sum_{n=0}^{d} \frac{C_n(\theta, k)\,\mathbf{1}_{H^\theta_n}(k)}{k^{n+1}}$$

This establishes an inherent low-frequency preference in DNNs. The terms $C_n(\theta, k)$ depend on the network parameters $\theta$ and are bounded, revealing that learning high-frequency components demands finely tuned parameters.
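This rapid spectral decay can be observed numerically. The following sketch (not the authors' code; the architecture and initialization scales are illustrative assumptions) evaluates a randomly initialized deep ReLU network on a 1-D grid and compares the magnitude of its low- and high-frequency Fourier bins:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_net(x, widths=(64, 64)):
    """Evaluate a randomly initialized deep ReLU network on 1-D inputs."""
    h = x[:, None]
    fan_in = 1
    for w in widths:
        W = rng.normal(0, np.sqrt(2.0 / fan_in), (fan_in, w))
        b = rng.normal(0, 0.5, w)
        h = np.maximum(h @ W + b, 0.0)  # ReLU: the output is piecewise linear in x
        fan_in = w
    W = rng.normal(0, np.sqrt(2.0 / fan_in), (fan_in, 1))
    return (h @ W).ravel()

N = 1024
x = np.linspace(-1, 1, N, endpoint=False)
f = random_relu_net(x)
spectrum = np.abs(np.fft.rfft(f - f.mean())) / N

low = spectrum[1:11].mean()     # frequency bins 1..10
high = spectrum[50:101].mean()  # frequency bins 50..100
# Because the network is piecewise linear, low-frequency energy dominates.
```

A piecewise-linear function has Fourier coefficients decaying on the order of $1/k^2$, so the low-frequency band carries far more energy than the high-frequency band at initialization.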

Empirical Evidence of Spectral Bias

Through a series of controlled experiments, the paper empirically demonstrates the spectral bias. Networks trained to fit a superposition of sinusoids with varying frequencies were observed to learn lower frequencies first, followed by higher frequencies. This was consistent across different frequencies and amplitudes of the target functions. When the network parameters were randomly perturbed, the high-frequency components were found to be more fragile compared to low-frequency ones, illustrating that expressing high frequencies requires precise parameter configurations.
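The dynamics can be reproduced in miniature. The sketch below is not the paper's experimental setup: it trains only the output layer of a random-feature ReLU network (a simplifying assumption that makes the dynamics linear) on a sum of a low- and a high-frequency sinusoid, then checks how much of each component the fit has captured after a fixed gradient-descent budget:

```python
import numpy as np

rng = np.random.default_rng(1)
N, H = 256, 256
x = np.linspace(0, 1, N, endpoint=False)
# Target: equal-amplitude low (k=1) and high (k=6) frequency components.
y = np.sin(2 * np.pi * 1 * x) + np.sin(2 * np.pi * 6 * x)

# Random ReLU features with kinks spread over [0, 1]; only `a` is trained.
c = rng.uniform(0, 1, H)
s = rng.choice([-1.0, 1.0], H)
phi = np.maximum(s * (x[:, None] - c), 0.0)

lmax = np.linalg.eigvalsh(phi.T @ phi / N).max()
lr = 1.0 / lmax                      # stable step size for this quadratic loss
a = np.zeros(H)
for _ in range(1000):                # full-batch gradient descent on MSE
    a -= lr * phi.T @ (phi @ a - y) / N

pred = phi @ a
amp = np.abs(np.fft.rfft(pred)) / (N / 2)  # per-frequency amplitude of the fit
# amp[1] (low frequency) is close to the target amplitude of 1 well before
# amp[6] (high frequency) is.
```

Each frequency component converges at a rate set by the corresponding kernel eigenvalue, and for ReLU features these eigenvalues shrink rapidly with frequency, so the $k=1$ component is captured far earlier than $k=6$.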

Spectral Bias in Real-Data Contexts

Further experiments extend these findings to real-world data, specifically the MNIST dataset. The authors added sinusoidal noise to the target functions and examined the impact on validation performance. It was observed that noise with low frequency adversely affected validation performance more than high-frequency noise, reaffirming that DNNs prioritize and learn smoother patterns more effectively.

Additionally, the introduction of eigenfunctions of the Gaussian RBF kernel as a generalized notion of frequency provided insight into the dynamics of spectral bias in high-dimensional spaces such as MNIST. Visualizing the spectrum evolution during training showed that the network converged on low-frequency components before higher frequencies.

Impact of Data Manifold Complexity

The paper explores how the shape of the data manifold affects the ease of learning higher frequencies. The experimental setup involved training networks using target functions defined on synthetic manifolds with varying complexities (e.g., circles versus flower-shaped curves with multiple petals). It was discovered that more complex manifolds (those with higher intrinsic frequencies) permitted the network to learn higher frequency target functions more readily. The rationale is that low-frequency functions in the input space may map to high-frequency components on intricate lower-dimensional manifolds, making these complex embeddings beneficial for capturing high-frequency behavior.
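This mapping effect can be checked directly. In the sketch below (the exact flower parameterization is an assumption for illustration, not necessarily the paper's), the ambient function $f(x, y) = x$ is as low-frequency as it gets, yet restricted to an $L$-petal flower curve it acquires Fourier energy at frequencies $L \pm 1$ in the manifold parameter:

```python
import numpy as np

N, L = 512, 10
t = np.linspace(0, 1, N, endpoint=False)

def curve(t, petals, amp=0.5):
    """Flower-shaped curve; petals=0 degenerates to the unit circle."""
    r = 1.0 + amp * np.sin(2 * np.pi * petals * t)
    return r * np.cos(2 * np.pi * t), r * np.sin(2 * np.pi * t)

def spectrum_along(petals):
    x_coord, _ = curve(t, petals)  # f(x, y) = x evaluated along the curve
    return np.abs(np.fft.rfft(x_coord)) / (N / 2)

circle = spectrum_along(0)   # r = 1: plain circle
flower = spectrum_along(L)   # 10-petal flower

# On the circle, f(gamma(t)) = cos(2*pi*t): all energy at frequency 1.
# On the flower, (1 + 0.5*sin(2*pi*L*t)) * cos(2*pi*t) produces sideband
# energy at frequencies L-1 = 9 and L+1 = 11 via the product-to-sum identity.
```

In other words, a network only has to represent a low-frequency ambient function for the learned restriction to the manifold to exhibit high-frequency behavior, which is consistent with complex manifolds making high frequencies easier to learn.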

Implications and Future Directions

The research has several significant implications for both the theoretical understanding and practical applications of neural networks. On a theoretical level, the spectral bias offers a refined perspective on the implicit regularization mechanisms of DNNs. Practically, this understanding could inform strategies for designing more efficient training protocols and architectures that leverage or mitigate this bias, depending on the application requirements.

Future work could investigate the spectral properties of different network architectures and activation functions beyond ReLU. Additionally, examining how various training regimes, such as curriculum learning or adversarial training, influence the spectral bias could lead to novel training methodologies aimed at optimizing neural network generalization and robustness.

In summary, this paper provides a comprehensive analysis of the spectral bias inherent in neural networks, combining theoretical insights with empirical validation and highlighting the nuanced relationship between data manifold complexity and the learning dynamics of high-frequency functions.