Geometry of Critical Sets and Existence of Saddle Branches for Two-layer Neural Networks (2405.17501v1)

Published 26 May 2024 in cs.LG and math.OC

Abstract: This paper presents a comprehensive analysis of critical point sets in two-layer neural networks. To study these complex entities, we introduce the critical embedding operator and the critical reduction operator as our tools. Given a critical point, we use these operators to uncover the whole underlying critical set that represents the same output function, which exhibits a hierarchical structure. Furthermore, we prove the existence of saddle branches for any critical set whose output function can be represented by a narrower network. Our results provide a solid foundation for further study of the optimization and training behavior of neural networks.
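To make the idea of an embedding operator concrete, the following is a minimal sketch, not taken verbatim from the paper: it uses an assumed standard two-layer parameterization (width m, output weights a_k, input weights w_k, activation \sigma) and shows the classical one-neuron splitting map T_\lambda, which sends a critical point of the width-m loss to a one-parameter family of critical points of the width-(m+1) loss with the same output function.

% Assumed two-layer network of width m:
\[
  f_{\theta}(x) \;=\; \sum_{k=1}^{m} a_k\,\sigma\!\left(w_k^{\top} x\right),
  \qquad \theta = (a_1, w_1, \ldots, a_m, w_m).
\]
% One-neuron splitting embedding: duplicate neuron j and split its output weight
% into \lambda a_j and (1-\lambda) a_j, keeping the input weight w_j for both copies:
\[
  T_{\lambda}(\theta) \;=\;
  \bigl(a_1, w_1, \ldots, \lambda a_j, w_j, \ldots, a_m, w_m,\; (1-\lambda)\,a_j, w_j\bigr),
  \qquad \lambda \in \mathbb{R}.
\]
% The output function is unchanged for every \lambda:
\[
  f_{T_{\lambda}(\theta)}(x)
  \;=\; \lambda a_j\,\sigma(w_j^{\top} x) + (1-\lambda)\,a_j\,\sigma(w_j^{\top} x)
        + \sum_{k \neq j} a_k\,\sigma(w_k^{\top} x)
  \;=\; f_{\theta}(x).
\]
% Consequently, if \theta^* is a critical point of the width-m empirical loss, the
% gradients of the width-(m+1) loss at T_\lambda(\theta^*) are \lambda- and
% (1-\lambda)-scaled copies of gradients that already vanish, so every
% T_\lambda(\theta^*) is critical: a one-parameter critical branch in the wider network.

The paper's critical embedding and reduction operators can be read as a systematic generalization of such output-preserving maps between networks of different widths; the specific map T_\lambda above is only an illustrative special case under the stated assumptions.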
