Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization (2309.10370v2)

Published 19 Sep 2023 in cs.LG, cs.AI, math-ph, math.MP, math.OC, and stat.ML

Abstract: In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow neural networks through the explicit construction of upper bounds, without any use of gradient descent. A key focus is on elucidating the geometric structure of approximate and precise minimizers. We consider shallow neural networks with one hidden layer, a ReLU activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$ that can be arbitrarily large. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$ where $\delta_P$ measures the signal-to-noise ratio of training inputs. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace in the input space ${\mathbb R}^M$. We comment on the characterization of the global minimum of the cost function in the given context.
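For readers who want to see the setup concretely, the following is a minimal sketch of the architecture and cost described in the abstract: a one-hidden-layer network with ReLU activation, inputs in ${\mathbb R}^M$, outputs in ${\mathbb R}^Q$ with $Q\leq M$, and an ${\mathcal L}^2$ (Hilbert-Schmidt style) cost over $N>QM$ training samples. All variable names, dimensions, and the exact cost normalization are illustrative assumptions, not the authors' notation or construction.

```python
import numpy as np

# Sketch of the shallow network from the abstract (illustrative only):
# one hidden layer, ReLU activation, input space R^M, output space R^Q
# with Q <= M, and N > Q*M training samples.

M, Q, N = 8, 3, 100           # input dim, output dim, sample count (N > Q*M)
rng = np.random.default_rng(0)

W1 = rng.normal(size=(M, M))  # hidden-layer weights
b1 = rng.normal(size=M)       # hidden-layer bias
W2 = rng.normal(size=(Q, M))  # output-layer weights
b2 = rng.normal(size=Q)       # output-layer bias

X = rng.normal(size=(N, M))   # training inputs x_j in R^M
Y = rng.normal(size=(N, Q))   # training outputs y_j in R^Q

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    # One hidden layer with ReLU, then an affine output map.
    return W2 @ relu(W1 @ x + b1) + b2

# One plausible reading of the L^2 (Schatten-2 / Hilbert-Schmidt) cost:
# the root of the averaged squared output errors over all samples.
residuals = np.stack([forward(x) - y for x, y in zip(X, Y)])
cost = np.sqrt(np.sum(residuals ** 2) / N)
print(f"L2 cost: {cost:.4f}")
```

The paper's point is that an upper bound on the minimum of such a cost can be constructed explicitly, without running gradient descent on parameters like `W1, b1, W2, b2` above.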

Authors (2)
  1. Thomas Chen (43 papers)
  2. Patricia Muñoz Ewald (22 papers)
Citations (2)

