Understanding the dynamics of the frequency bias in neural networks (2405.14957v1)
Abstract: Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias in the learning process: the NN first learns the low-frequency features and only later the high-frequency ones. In this study, we rigorously derive a partial differential equation (PDE) that describes the frequency dynamics of the error for a two-layer NN in the Neural Tangent Kernel regime. Using this insight, we explicitly demonstrate how an appropriate choice of distribution for the initial weights can eliminate or control the frequency bias. We focus our study on the Fourier Features model, an NN whose first layer has sine and cosine activation functions with frequencies sampled from a prescribed distribution. In this setup, we experimentally validate our theoretical results and compare the NN dynamics to the PDE solution computed with the finite element method. Finally, we empirically show that the same principle extends to multi-layer NNs.
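The abstract describes the Fourier Features model as a two-layer NN whose first layer applies sine and cosine activations with frequencies drawn from a prescribed distribution. The following is a minimal sketch of such a model, not the authors' code: the frequency scale `sigma`, the width `m`, the helper names (`init_fourier_features`, `predict`), the target function, and the learning rate are all illustrative assumptions. Only the linear readout is trained here, as a simple random-features/NTK-style proxy for the regime studied in the paper; the sampling distribution of the frequencies is the knob the paper associates with controlling the frequency bias.

```python
# Illustrative sketch of a Fourier Features model (assumed setup, not the paper's code).
import jax
import jax.numpy as jnp

def init_fourier_features(key, in_dim, m, sigma=1.0):
    # Frequencies omega ~ N(0, sigma^2): the "prescribed distribution" of the first layer.
    k1, k2 = jax.random.split(key)
    omega = sigma * jax.random.normal(k1, (m, in_dim))
    theta = jax.random.normal(k2, (2 * m,)) / jnp.sqrt(2 * m)  # linear readout weights
    return omega, theta

def predict(params, x):
    omega, theta = params
    z = x @ omega.T                                        # (n, m) frequency projections
    feats = jnp.concatenate([jnp.cos(z), jnp.sin(z)], -1)  # sine/cosine first layer
    return feats @ theta                                   # (n,) network output

def mse_loss(theta, omega, x, y):
    return jnp.mean((predict((omega, theta), x) - y) ** 2)

# Usage example: one gradient-descent step on the readout layer only.
key = jax.random.PRNGKey(0)
x = jnp.linspace(0.0, 1.0, 64)[:, None]
y = jnp.sin(2 * jnp.pi * x[:, 0]) + 0.3 * jnp.sin(20 * jnp.pi * x[:, 0])
omega, theta = init_fourier_features(key, in_dim=1, m=256, sigma=10.0)
grad = jax.grad(mse_loss)(theta, omega, x, y)
theta = theta - 1e-2 * grad
```

With a small `sigma` the sampled frequencies concentrate near zero and the model fits the low-frequency component of `y` first, while a broader sampling distribution lets higher frequencies enter the fit earlier; this is the qualitative effect of the initialization distribution that the abstract refers to.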