Optimization landscape of two-layer neural networks (population risk)

Characterize the optimization landscape of the population risk R_N(θ) for two-layer neural networks with prediction ŷ(x;θ) = (1/N) ∑_{i=1}^N σ_*(x; θ_i) and square loss ℓ(y,ŷ) = (y − ŷ)^2. The characterization should cover the existence and structure of local minima, saddle points, and global minima, in the setting where an infinite number of training examples is available.
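To make the objects in the problem statement concrete, the following is a minimal sketch of a Monte Carlo estimate of R_N(θ). The statement above fixes neither the unit σ_* nor the data distribution, so everything specific here is an assumption for illustration only: the choice σ_*(x; θ_i) = tanh(⟨θ_i, x⟩), Gaussian inputs, noiseless labels produced by a "teacher" network of the same form, and the helper names `sigma_star`, `predict`, `population_risk_mc`, and `make_teacher_sampler`.

```python
import numpy as np


def sigma_star(x, theta):
    """Hypothetical unit: sigma_*(x; theta_i) = tanh(<theta_i, x>).

    x has shape (n, d), theta has shape (N, d); the result has shape (n, N).
    """
    return np.tanh(x @ theta.T)


def predict(x, theta):
    """Prediction y_hat(x; theta) = (1/N) * sum_i sigma_*(x; theta_i)."""
    return sigma_star(x, theta).mean(axis=1)


def population_risk_mc(theta, sample_xy, n_samples=100_000, seed=0):
    """Monte Carlo estimate of R_N(theta) = E[(y - y_hat(x; theta))^2].

    sample_xy(rng, n) must return n fresh draws (x, y) from the data
    distribution. A large n_samples only approximates the expectation.
    """
    rng = np.random.default_rng(seed)
    x, y = sample_xy(rng, n_samples)
    return float(np.mean((y - predict(x, theta)) ** 2))


def make_teacher_sampler(theta_teacher):
    """Assumed data model: x ~ N(0, I_d), y = teacher network output, no noise."""
    d = theta_teacher.shape[1]

    def sample_xy(rng, n):
        x = rng.standard_normal((n, d))
        return x, predict(x, theta_teacher)

    return sample_xy


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta_teacher = rng.standard_normal((4, 3))  # N = 4 units, d = 3
    sampler = make_teacher_sampler(theta_teacher)
    theta_random = rng.standard_normal((4, 3))
    print("risk at teacher parameters:", population_risk_mc(theta_teacher, sampler))
    print("risk at random parameters: ", population_risk_mc(theta_random, sampler))
```

Under these assumptions the teacher parameters are a global minimizer with zero risk. Because ŷ averages over the units, any permutation of the rows of θ gives the same risk, so global minimizers are never isolated as single points up to relabeling; this symmetry is one reason the landscape is naturally described in terms of the distribution of the θ_i.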

Background

Two-layer neural networks induce a high-dimensional non-convex risk landscape, and while specific models have been analyzed, a general understanding of this landscape remains elusive. Even in the infinite-sample (population risk) setting, where stochastic fluctuations due to finite datasets are absent, the structure of local and global minima of R_N(θ) is not comprehensively understood.

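The cited paper (Mei et al., 2018) introduces a mean-field (distributional dynamics) perspective, in which a PDE describes the evolution of the distribution of the parameters θ_i during training, and uses it to analyze training dynamics and generalization. The full characterization of the population risk landscape for two-layer networks is nevertheless identified there as an outstanding problem.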
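For orientation, the quantities behind this perspective can be written out explicitly. The first two lines follow directly from expanding the square loss in the problem statement. The last two lines are a sketch of the distributional dynamics as formulated in the cited paper, and should be checked against it. The symbols V, U, Ψ, the parameter distribution ρ_t, and the step-size schedule ξ(t) do not appear in the text above and are introduced here for illustration.

```latex
% Expansion of the population risk under square loss
R_N(\theta) = \mathbb{E}[y^2] + \frac{2}{N}\sum_{i=1}^{N} V(\theta_i)
            + \frac{1}{N^2}\sum_{i,j=1}^{N} U(\theta_i,\theta_j),
\\
V(\theta) = -\mathbb{E}\big[y\,\sigma_*(x;\theta)\big], \qquad
U(\theta_1,\theta_2) = \mathbb{E}\big[\sigma_*(x;\theta_1)\,\sigma_*(x;\theta_2)\big].
\\
% Mean-field (distributional dynamics) PDE for the parameter distribution \rho_t
\partial_t \rho_t = 2\,\xi(t)\,\nabla_\theta \cdot \big(\rho_t\,\nabla_\theta \Psi(\theta;\rho_t)\big),
\\
\Psi(\theta;\rho) = V(\theta) + \int U(\theta,\theta')\,\rho(\mathrm{d}\theta').
```

The expansion shows that R_N(θ) depends on θ only through the empirical distribution of the θ_i, which is what makes a description in terms of ρ_t possible.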

References

Understanding the optimization landscape of two-layers neural networks is largely an open problem even when we have access to an infinite number of examples, i.e. to the population risk R_N(θ).

— A Mean Field View of the Landscape of Two-Layers Neural Networks (1804.06561 - Mei et al., 2018) in Introduction
