Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations (2403.08121v3)

Published 12 Mar 2024 in cs.LG, math.OC, and stat.ML

Abstract: This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It is shown that, for sufficiently small initializations, the weights of the neural network remain small in (Euclidean) norm during the early stages of training and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the recently introduced neural correlation function. The paper also studies the KKT points of the neural correlation function for feed-forward networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations, deriving necessary and sufficient conditions for rank-one KKT points.
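
The central claim concerns the early phase of gradient flow: with a sufficiently small initialization, the weight norm stays small while the weight vector approximately settles on a fixed direction. The snippet below is a minimal numerical sketch of that phenomenon, not the paper's construction or proofs. It discretizes gradient flow with small-step gradient descent on a two-layer squared-ReLU network, which is homogeneous of order three (greater than two) in its parameters and has a locally Lipschitz gradient, and it tracks both the weight norm and the cosine similarity of the weight direction during the early phase. All dimensions, step sizes, and stopping thresholds are illustrative assumptions.

```python
# Illustrative sketch: early directional convergence under small initialization.
# Model, sizes, and hyperparameters are assumptions chosen for demonstration only.
import torch

torch.manual_seed(0)
d, m, n = 5, 20, 50                     # input dim, hidden width, sample count
X = torch.randn(n, d)
y = torch.randn(n)

delta = 1e-2                            # small initialization scale
W = (delta * torch.randn(m, d)).requires_grad_(True)  # hidden-layer weights
a = (delta * torch.randn(m)).requires_grad_(True)     # output weights

def model(X):
    # f(x) = sum_j a_j * relu(w_j . x)^2 : homogeneous of order 3 in (W, a),
    # with locally Lipschitz gradient (a polynomial-ReLU-style network).
    return torch.relu(X @ W.T).pow(2) @ a

def flat():
    # Current parameter vector, detached from the graph.
    return torch.cat([W.detach().reshape(-1), a.detach().reshape(-1)])

eta = 1e-2                              # forward-Euler step approximating gradient flow
w0_norm = flat().norm().item()
snapshots, t = [], 0
# "Early phase" proxy: stop once the weight norm has grown noticeably.
while flat().norm().item() < 4 * w0_norm and t < 200_000:
    loss = 0.5 * (model(X) - y).pow(2).mean()
    gW, ga = torch.autograd.grad(loss, (W, a))
    with torch.no_grad():
        W -= eta * gW
        a -= eta * ga
    if t % 10_000 == 0:
        w = flat()
        snapshots.append(w / w.norm())
        print(f"step {t:7d}   ||w|| = {w.norm().item():.3e}")
    t += 1

# Directional convergence check: the cosine similarity of each recorded direction
# with the last one should climb toward 1 while ||w|| remains of order delta.
for k, u in enumerate(snapshots):
    cos = float(u @ snapshots[-1])
    print(f"snapshot {k}: cos(angle to final early-phase direction) = {cos:.4f}")
```

In runs of this kind one expects the printed norms to stay close to the initialization scale while the cosine similarities approach 1; in the paper's analysis, the direction reached during this phase corresponds approximately to a KKT point of the neural correlation function.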
