Geometry of Critical Sets and Existence of Saddle Branches for Two-layer Neural Networks (2405.17501v1)
Published 26 May 2024 in cs.LG and math.OC
Abstract: This paper presents a comprehensive analysis of the critical point sets of two-layer neural networks. To study these complex sets, we introduce two tools: the critical embedding operator and the critical reduction operator. Given a critical point, these operators uncover the whole underlying critical set that represents the same output function, and this set exhibits a hierarchical structure. Furthermore, we prove the existence of saddle branches for any critical set whose output function can be represented by a narrower network. Our results provide a solid foundation for further study of the optimization and training behavior of neural networks.
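The abstract does not spell out the operators' form. As a hedged illustration drawn from the embedding-principle literature this line of work builds on, a one-step critical embedding can be sketched as the neuron-splitting map below; the notation $T_{j,\beta}$, the split ratio $\beta$, and the specific network parameterization are assumptions for this sketch, not statements taken from the paper.

```latex
% Illustrative sketch (assumed notation, not quoted from the paper):
% a two-layer network with m hidden neurons,
%   f_\theta(x) = \sum_{k=1}^{m} a_k \,\sigma(w_k^\top x).
% A one-step "splitting" embedding T_{j,\beta} maps its parameters into
% those of an (m+1)-neuron network by splitting hidden neuron j:
\[
  T_{j,\beta}\colon
  \bigl(a_1, w_1, \dots, a_j, w_j, \dots, a_m, w_m\bigr)
  \;\longmapsto\;
  \bigl(a_1, w_1, \dots, \beta a_j, w_j,\; (1-\beta)a_j, w_j, \dots, a_m, w_m\bigr),
  \qquad \beta \in \mathbb{R}.
\]
% The output function is preserved for every \beta, since
%   \beta a_j \sigma(w_j^\top x) + (1-\beta) a_j \sigma(w_j^\top x)
%   = a_j \sigma(w_j^\top x),
% and a critical point of the narrower loss maps to a critical point of
% the wider loss for each \beta.
```

Read this way, varying $\beta$ traces a one-parameter family of critical points of the wider network that all realize the same output function, which is the kind of structure the abstract's "saddle branches for any critical set whose output function can be represented by a narrower network" refers to.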