
Efficient approximation of optimal neural network parameters by stochastic gradient methods

Determine conditions under which the optimal parameters of a neural network can be efficiently approximated by conventional algorithms such as stochastic gradient descent, given that universal representation theorems guarantee only the existence of such parameters and the training objective is non-convex.


Background

The paper notes that universal approximation theorems guarantee the existence of optimal parameters for neural networks but do not address algorithmic tractability. In practice, stochastic gradient descent (SGD) and related methods are used to train non-convex models, yet theoretical guarantees for efficiently reaching optimal parameters remain elusive.

This open question motivates the authors’ mean-field Langevin dynamics framework, which lifts the optimization problem to the space of probability measures and connects it with gradient-flow dynamics and their invariant measures, aiming to provide convergence guarantees under suitable regularity conditions.
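The following is a minimal sketch, not taken from the paper, of the kind of algorithm this framework analyzes: noisy stochastic gradient descent on a two-layer network, which can be read as a finite-particle discretization of mean-field Langevin dynamics (each hidden neuron is one particle, and Gaussian noise is added to every update). All names, the toy data, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

# Sketch of noisy SGD as a particle approximation of mean-field Langevin dynamics.
# Each particle is one hidden neuron (a_j, w_j); the network output is the average
# of the particles' contributions (mean-field 1/M scaling).

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) plus small noise (illustrative only).
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(200)

M = 256                                 # number of particles (hidden neurons)
w = rng.standard_normal((M, 1))         # input weights
a = rng.standard_normal(M)              # output weights
lr, sigma, lam = 0.05, 0.05, 1e-3       # step size, noise level, L2 regularization

def forward(X, a, w):
    h = np.maximum(X @ w.T, 0.0)        # ReLU features, shape (n, M)
    return h @ a / M, h                 # mean-field scaling by 1/M

for step in range(2000):
    idx = rng.choice(len(X), size=32, replace=False)   # mini-batch
    Xb, yb = X[idx], y[idx]
    pred, h = forward(Xb, a, w)
    err = pred - yb

    # Gradients of the regularized squared loss with respect to each particle.
    grad_a = h.T @ err / (len(idx) * M) + lam * a
    grad_w = ((err[:, None] * (h > 0) * a).T @ Xb) / (len(idx) * M) + lam * w

    # Langevin-type update: gradient step plus Gaussian noise on every particle.
    a -= lr * grad_a + np.sqrt(2 * lr) * sigma * rng.standard_normal(a.shape)
    w -= lr * grad_w + np.sqrt(2 * lr) * sigma * rng.standard_normal(w.shape)

pred, _ = forward(X, a, w)
print("final mean squared error:", np.mean((pred - y) ** 2))
```

As the number of particles grows and the step size shrinks, the empirical distribution of the particles is expected to track the mean-field Langevin flow; the convergence guarantees discussed in the paper concern the limiting measure-valued dynamics, not this finite sketch.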

References

Furthermore, while universal representation theorems ensure the existence of the optimal parameters of the network, it is in general not known when such optimal parameters can be efficiently approximated by conventional algorithms, such as stochastic gradient descent.

Mean-Field Langevin Dynamics and Energy Landscape of Neural Networks (1905.07769 - Hu et al., 2019) in Section 1 (Introduction), p. 2