Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning (2311.07065v1)
Published 13 Nov 2023 in cs.LG, cs.AI, math-ph, math.MP, math.OC, and stat.ML
Abstract: We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost, obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks, generically cannot be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method.
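
For orientation, the two objects the abstract contrasts can be written out explicitly. The sketch below states the standard $\mathcal{L}^2$ cost and the gradient descent flow ODE under generic conventions; the notation ($\theta$, $f_\theta$, $(x_j, y_j)$, $N$) is illustrative and not taken from the paper itself.

```latex
% Minimal sketch of the standard definitions (notation ours, not the paper's):
% \theta collects all weights and biases of the ReLU network f_\theta,
% and (x_j, y_j), j = 1, ..., N, are the training pairs.
\[
  \mathcal{C}[\theta] \;=\; \frac{1}{2N}\sum_{j=1}^{N}
    \bigl\| f_\theta(x_j) - y_j \bigr\|_{\ell^2}^2 ,
  \qquad
  \frac{d\theta(t)}{dt} \;=\; -\,\nabla_\theta\, \mathcal{C}[\theta(t)] .
\]
% The abstract's claim, in these terms: the global minimizers of
% \mathcal{C} constructed in [Chen-Munoz Ewald 2023] generically do not
% arise as limits of orbits \theta(t) of this gradient flow.
```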