Frequency Bias in Neural Networks for Input of Non-Uniform Density (2003.04560v1)

Published 10 Mar 2020 in cs.LG and stat.ML

Abstract: Recent works have partly attributed the generalization ability of over-parameterized neural networks to frequency bias -- networks trained with gradient descent on data drawn from a uniform distribution find a low frequency fit before high frequency ones. As realistic training sets are not drawn from a uniform distribution, we here use the Neural Tangent Kernel (NTK) model to explore the effect of variable density on training dynamics. Our results, which combine analytic and empirical observations, show that when learning a pure harmonic function of frequency $\kappa$, convergence at a point $x \in \mathbb{S}^{d-1}$ occurs in time $O(\kappa^d/p(x))$ where $p(x)$ denotes the local density at $x$. Specifically, for data in $\mathbb{S}^1$ we analytically derive the eigenfunctions of the kernel associated with the NTK for two-layer networks. We further prove convergence results for deep, fully connected networks with respect to the spectral decomposition of the NTK. Our empirical study highlights similarities and differences between deep and shallow networks in this model.

Citations (167)

Summary

  • The paper demonstrates that frequency bias, where low frequencies are learned faster, persists in neural networks even with non-uniform data density.
  • Analytical and empirical evaluations show that the time to learn a given frequency grows with the frequency and inversely with the local data density.
  • The study finds that eigenfunctions of the Neural Tangent Kernel under non-uniform data have higher local frequencies in denser regions, explaining faster learning there.

Frequency Bias in Neural Networks for Input of Non-Uniform Density

The paper "Frequency Bias in Neural Networks for Input of Non-Uniform Density" investigates the dynamic behavior of neural networks when trained on datasets with varying density distributions. Specifically, it explores how over-parameterized neural networks exhibit frequency bias during training, a phenomenon where networks first learn the low-frequency components of a target function before fitting higher frequencies. This paper extends previous work that assumed uniform data distributions by considering more realistic, non-uniform distributions.
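
This spectral bias can be illustrated with a minimal numerical sketch (not from the paper): kernel gradient descent on the circle, with a generic translation-invariant positive-definite kernel standing in for the actual NTK. The grid size, kernel choice, and step size below are illustrative assumptions; only the qualitative outcome -- low frequencies are fit first -- reflects the paper's claim.

```python
import numpy as np

# Stand-in experiment (not the paper's NTK): kernel gradient descent on a
# uniform grid over S^1, using a generic translation-invariant PSD kernel.
n = 128
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)  # uniform grid on S^1
K = np.exp(np.cos(theta[:, None] - theta[None, :]))       # kernel Gram matrix

def steps_to_fit(y, tol=1e-2, max_steps=50_000):
    """Iterations of f <- f - (lr/n) K (f - y) until max|f - y| < tol."""
    lr = n / np.linalg.eigvalsh(K).max()  # largest stable step size
    f = np.zeros(n)
    for t in range(1, max_steps + 1):
        f -= lr * (K @ (f - y)) / n
        if np.max(np.abs(f - y)) < tol:
            return t
    return max_steps

steps_low = steps_to_fit(np.sin(theta))       # frequency kappa = 1
steps_high = steps_to_fit(np.sin(5 * theta))  # frequency kappa = 5
print(steps_low, steps_high)  # the low-frequency target converges far sooner
```

Since the kernel's eigenvalues decay with frequency, the residual component at frequency $\kappa$ shrinks by a factor $(1 - \lambda_\kappa/\lambda_{\max})$ per step, so the higher-frequency target needs orders of magnitude more iterations.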

Key Findings

  1. Frequency Bias in Non-Uniform Density: Using the Neural Tangent Kernel (NTK) model, the authors demonstrate that frequency bias persists even for non-uniform data distributions. In regions of locally constant density, low frequencies are learned before high ones, and the speed of learning at a point scales with the local data density.
  2. Learning Dynamics: Through both analytical and empirical evaluations, it is shown that when learning a pure harmonic function of frequency $\kappa$, the convergence time at a point $x$ on the $(d-1)$-dimensional sphere $\mathbb{S}^{d-1}$ is $O(\kappa^d/p(x))$, where $p(x)$ represents the local density. In particular, for one-dimensional inputs on $\mathbb{S}^1$, the eigenfunctions of the NTK for two-layer ReLU networks are derived, and learning a sine function in this setting occurs in time $O(\kappa^2/p^*)$, with $p^*$ being the minimum density over the input space.
  3. Comparison of Deep and Shallow Networks: The paper compares the learning dynamics of deep fully-connected (FC) networks to shallow networks. It is found that, similar to shallow networks, deep networks exhibit frequency bias where the eigenvalues of the NTK decay with frequency, indicating slower convergence for higher-frequency components. However, deeper networks tend to learn medium frequencies faster than shallow networks.
  4. Eigenfunctions and Density: The paper explores the eigenfunctions of NTK under non-uniform distributions. For piecewise constant densities, the eigenfunctions are composed of functions with local frequencies that are higher in denser regions. This implies faster learning of high-frequency components where the data is more densely packed.
  5. Theoretical Implications: The paper generalizes analytical results on frequency bias from shallow networks to deep, multi-layer networks, and extends them to the realistic setting of non-uniform data distributions. The theoretical results are supported by empirical findings, sharpening our understanding of how over-parameterized networks generalize.
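
The density dependence described above can likewise be sketched numerically (again a stand-in, not the paper's NTK): sampling the circle three times more densely on one half and running the same kind of kernel gradient descent, per-point convergence is faster in the denser half, in line with the $O(\kappa^d/p(x))$ scaling. The von Mises kernel, sample counts, and tolerance below are illustrative assumptions.

```python
import numpy as np

# Stand-in experiment (not the paper's NTK): non-uniform sampling of S^1,
# with a localized von Mises kernel replacing the NTK.
rng = np.random.default_rng(0)
dense = rng.uniform(0.0, np.pi, 180)          # 3x denser on [0, pi)
sparse = rng.uniform(np.pi, 2.0 * np.pi, 60)  # sparser on [pi, 2*pi)
theta = np.concatenate([dense, sparse])
n = theta.size
K = np.exp(20.0 * (np.cos(theta[:, None] - theta[None, :]) - 1.0))

y = np.sin(theta)  # low-frequency target
lr = 0.5 * n / np.linalg.eigvalsh(K).max()
f = np.zeros(n)
hit = np.full(n, -1)  # first step at which |f_i - y_i| < tol at each point
for t in range(1, 5001):
    f -= lr * (K @ (f - y)) / n
    newly = (hit < 0) & (np.abs(f - y) < 0.05)
    hit[newly] = t
    if (hit > 0).all():
        break

t_dense = hit[:180].mean()
t_sparse = hit[180:].mean()
print(t_dense, t_sparse)  # points in the denser half converge sooner on average
```

The mechanism matches the paper's intuition: with a localized kernel, the effective update at a point aggregates residuals from its neighbors, so points in denser regions receive proportionally larger updates and converge sooner.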

Implications and Future Directions

The research emphasizes the importance of accounting for the data distribution when analyzing neural network training. Frequency bias, coupled with varying data density, suggests that neural networks perform a form of implicit regularization, which contributes to their ability to learn effectively without overfitting, especially when early stopping is used as a regularizer. These insights suggest that data preprocessing and network architecture can be tailored to exploit frequency bias for better generalization.

Future work could expand on the implications of frequency bias in convolutional and recurrent networks, especially in tasks involving spatial or temporal data. Moreover, exploring methods to manipulate frequency bias actively through architectural or algorithmic innovations may provide pathways to optimize learning efficiency in deep networks.

In conclusion, the paper advances the understanding of neural network generalization through the lens of frequency bias, offering valuable insight into how data density shapes learning dynamics. It lays a foundation for more nuanced approaches to designing and training neural networks on complex, realistic data.