Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks (2402.09226v2)

Published 14 Feb 2024 in cs.LG, math.OC, and stat.ML

Abstract: This paper examines the gradient flow dynamics of two-homogeneous neural networks under small initializations, where all weights are initialized near the origin. For both the square and logistic losses, it is shown that for sufficiently small initializations, the gradient flow dynamics remain near the origin long enough for the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function, which quantifies the correlation between the output of the neural network and the corresponding labels in the training data set. For the square loss, neural networks initialized close to the origin have been observed to undergo saddle-to-saddle dynamics; motivated by this, the paper also establishes a similar directional convergence among small-magnitude weights in the neighborhood of certain saddle points.
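
As a rough sketch of the objects involved (notation assumed here, not taken verbatim from the paper): for a training set {(x_i, y_i)}_{i=1}^n and a network output H(x; w) that is two-homogeneous in the weights w, the neural correlation function and the constrained problem whose KKT points characterize the directional limit can be written as

    % A minimal sketch under assumed notation: training set \{(x_i, y_i)\}_{i=1}^n,
    % network output \mathcal{H}(x; w), all weights collected in the vector w.
    \mathcal{H}(x;\, c\,w) \;=\; c^{2}\, \mathcal{H}(x;\, w)
        \quad \text{for all } c > 0                        % two-homogeneity of the network
    \mathcal{N}(w) \;=\; \sum_{i=1}^{n} y_i\, \mathcal{H}(x_i;\, w)
                                                           % correlation between outputs and labels
    \max_{\|w\| \le 1} \; \mathcal{N}(w)                   % constrained problem whose KKT points
                                                           % give the approximate directional limit

In this picture, for a small initialization w(0) = ε w_0, the normalized weights w(t)/‖w(t)‖ stay near the origin long enough to approximately align with a KKT point of the constrained maximization above before the norm of the weights starts to grow.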
