Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning (2311.07065v1)
Published 13 Nov 2023 in cs.LG, cs.AI, math-ph, math.MP, math.OC, and stat.ML
Abstract: We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost, obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks, generically cannot be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method.
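
For orientation, the two objects the abstract contrasts can be written out explicitly. The sketch below states the standard $\mathcal{L}^2$ cost and the gradient descent flow ODE under generic conventions; the notation ($\theta$, $f_\theta$, $(x_j, y_j)$, $N$) is illustrative and not taken from the paper itself.

```latex
% Minimal sketch of the standard definitions (notation ours, not the paper's):
% \theta collects all weights and biases of the ReLU network f_\theta,
% and (x_j, y_j), j = 1, ..., N, are the training pairs.
\[
  \mathcal{C}[\theta] \;=\; \frac{1}{2N}\sum_{j=1}^{N}
    \bigl\| f_\theta(x_j) - y_j \bigr\|_{\ell^2}^2 ,
  \qquad
  \frac{d\theta(t)}{dt} \;=\; -\,\nabla_\theta\, \mathcal{C}[\theta(t)] .
\]
% The abstract's claim, in these terms: the global minimizers of
% \mathcal{C} constructed in [Chen-Munoz Ewald 2023] generically do not
% arise as limits of orbits \theta(t) of this gradient flow.
```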