Optimization landscape of two-layer neural networks (population risk)
Characterize the optimization landscape of the population risk R_N(θ) = E[ℓ(y, ŷ(x;θ))] for two-layer neural networks with prediction ŷ(x;θ) = (1/N) ∑_{i=1}^N σ_*(x; θ_i) and square loss ℓ(y,ŷ) = (y − ŷ)^2. This includes the existence and structure of local minima, saddle points, and global minima. The question remains open even when infinitely many training examples are available, that is, when the population risk itself can be evaluated.
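To make the objects in the statement concrete, the following is a minimal sketch of the prediction ŷ(x;θ) and a Monte Carlo estimate of the population risk, taken here to be the expected square loss. Everything beyond the two formulas is an illustrative assumption and not taken from the source: the unit σ_*(x; θ_i) = a_i · tanh(⟨w_i, x⟩), the noiseless data model y = tanh(x_1) with Gaussian inputs, and the helper names `sigma_star`, `predict`, `estimate_risk`, and `sample_teacher`.

```python
import math
import random

def sigma_star(x, theta_i):
    """Hypothetical unit: theta_i = (a, w), output a * tanh(<w, x>)."""
    a, w = theta_i
    return a * math.tanh(sum(wj * xj for wj, xj in zip(w, x)))

def predict(x, theta):
    """y_hat(x; theta) = (1/N) * sum_i sigma_star(x; theta_i)."""
    return sum(sigma_star(x, t) for t in theta) / len(theta)

def estimate_risk(theta, sample, n_samples=10000, seed=0):
    """Monte Carlo estimate of R_N(theta) = E[(y - y_hat(x; theta))^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, y = sample(rng)
        total += (y - predict(x, theta)) ** 2
    return total / n_samples

def sample_teacher(rng, d=3):
    """Hypothetical data model: x ~ N(0, I_d), y = tanh(x_1), no noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return x, math.tanh(x[0])

# Two units that both copy the teacher: their average equals y for every x.
theta_global = [(1.0, [1.0, 0.0, 0.0]), (1.0, [1.0, 0.0, 0.0])]
# The all-zero network always predicts 0.
theta_zero = [(0.0, [0.0, 0.0, 0.0]), (0.0, [0.0, 0.0, 0.0])]

risk_global = estimate_risk(theta_global, sample_teacher)
risk_zero = estimate_risk(theta_zero, sample_teacher)
print(risk_global, risk_zero)
```

In this toy setting `theta_global` is a global minimizer because its risk is zero, while `theta_zero` has strictly positive risk. The sketch only evaluates the risk at two hand-picked points; it says nothing about which other critical points exist, which is the open question.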
References
Understanding the optimization landscape of two-layers neural networks is largely an open problem even when we have access to an infinite number of examples, i.e. to the population risk R_{N}(\theta).
— A Mean Field View of the Landscape of Two-Layers Neural Networks
(1804.06561 - Mei et al., 2018) in Introduction