How Can Deep Neural Networks Fail Even With Global Optima? (2407.16872v1)
Abstract: Fully connected deep neural networks are successfully applied to classification and function approximation problems. By minimizing the cost function, i.e., finding proper weights and biases, models can be built that make accurate predictions. An ideal optimization process reaches a global optimum. However, do global optima always perform well? If not, how badly can they fail? In this work, we aim to: 1) extend the expressive power of shallow neural networks to networks of any depth using a simple trick, and 2) construct extremely overfitting deep neural networks that, despite attaining global optima, still fail to perform well on classification and function approximation problems. Several types of activation functions are considered, including the ReLU, Parametric ReLU, and Sigmoid functions. Extensive theoretical analysis is conducted, ranging from one-dimensional models to models of arbitrary dimensionality, and numerical results illustrate the theoretical findings.
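The abstract does not spell out the "simple trick" for carrying shallow-network expressiveness to arbitrary depth. Below is a minimal sketch of one standard construction, under the assumption that the trick is the identity map realized by a ReLU pair, relu(x) - relu(-x) = x; the function names (`shallow_net`, `identity_block`, `deep_net`) and parameter shapes are illustrative, not the paper's notation.

```python
# Hypothetical sketch (not the paper's verbatim construction): a shallow ReLU
# network can be deepened without changing the function it computes by
# appending layers that act as the identity, using relu(h) - relu(-h) = h.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def shallow_net(x, W1, b1, w2, b2):
    """One-hidden-layer ReLU network: f(x) = w2 . relu(W1 x + b1) + b2."""
    return w2 @ relu(W1 @ x + b1) + b2

def identity_block(h):
    """A ReLU layer pair that reproduces its input exactly."""
    return relu(h) - relu(-h)

def deep_net(x, W1, b1, w2, b2, extra_depth=5):
    """Same function as shallow_net, padded with identity-acting ReLU layers."""
    h = relu(W1 @ x + b1)
    for _ in range(extra_depth):   # each pass adds a layer that changes nothing
        h = identity_block(h)
    return w2 @ h + b2

# Quick check that the shallow and deepened networks agree on random inputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
w2, b2 = rng.normal(size=8), rng.normal()
x = rng.normal(size=3)
assert np.allclose(shallow_net(x, W1, b1, w2, b2),
                   deep_net(x, W1, b1, w2, b2))
```

Because the appended layers leave the hidden representation unchanged, the deepened network matches the shallow one exactly, which is one way to see how approximation results for shallow networks can carry over to networks of any depth.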