General convergence of noiseless distributional dynamics
Establish a general convergence result for the continuity-equation distributional dynamics ∂_t ρ_t(θ) = 2 ξ(t) ∇_θ · (ρ_t(θ) ∇_θ Ψ(θ; ρ_t)), which arises as the mean-field limit of noiseless stochastic gradient descent on two-layer neural networks. The goal is to demonstrate convergence to a fixed point (e.g., a global minimizer) under broad assumptions on the activation σ_*, the data distribution P(X,Y), the step-size schedule ξ(t), and the initialization ρ_0.
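For intuition, the dynamics above can be simulated by a particle (neuron) discretization: noiseless gradient descent on N particles is a time discretization of the continuity equation at the empirical measure. Below is a minimal sketch under illustrative assumptions (σ_*(x;θ) = tanh(⟨θ, x⟩), square loss, Gaussian data, a single-neuron "teacher" target, constant schedule ξ(t) = 1); none of these choices are fixed by the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (illustrative, not from the source):
# sigma_*(x; theta) = tanh(<theta, x>), labels from a single teacher neuron.
d, N, n = 2, 200, 1000                      # input dim, neurons, samples
X = rng.standard_normal((n, d))             # samples from P(X, .)
Y = np.tanh(X @ np.array([1.0, -1.0]))      # noiseless teacher labels

theta = 0.1 * rng.standard_normal((N, d))   # particles i.i.d. from rho_0

def risk(theta):
    """Empirical risk of the mean-field predictor f(x) = (1/N) sum_i sigma_*(x; theta_i)."""
    pred = np.tanh(X @ theta.T).mean(axis=1)
    return np.mean((Y - pred) ** 2)

def grad_Psi(theta):
    """grad_theta Psi(theta_i; rho_N), with rho_N the empirical measure of the particles."""
    S = np.tanh(X @ theta.T)                 # (n, N) activations
    resid = S.mean(axis=1) - Y               # f(x) - y
    dS = 1.0 - S ** 2                        # tanh'
    # grad Psi(theta_i; rho_N) = E_hat[(f(x) - y) sigma_*'(x; theta_i) x]
    return (X.T @ (resid[:, None] * dS)).T / n   # (N, d)

xi = 1.0          # constant step-size schedule xi(t) = 1
eps = 0.1         # time discretization of the PDE
risk_init = risk(theta)
for _ in range(2000):
    theta -= 2.0 * eps * xi * grad_Psi(theta)    # noiseless GD step
risk_final = risk(theta)
print(risk_init, risk_final)
```

As N and n grow and eps shrinks, the particle trajectory tracks a solution ρ_t of the continuity equation; the empirical risk decreasing along the trajectory is the finite-N shadow of the convergence statement sought here.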
References
In the next sections we state our results on convergence of the distributional dynamics to its fixed point. In the case of noisy SGD (and for the diffusion PDE (eq:GeneralPDE_Temp)), a general convergence result can be established (although at the cost of an additional regularization). For noiseless SGD (and the continuity equation (eq:GeneralPDE_Temp)), we do not have such a general result.
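For orientation, the noisy counterpart referred to above plausibly augments the same dynamics with a diffusion (Laplacian) term coming from the Langevin noise, and the "additional regularization" with an ℓ₂ penalty; a hedged sketch of its form, with temperature 1/β and regularization parameter λ (symbols assumed, not fixed by this excerpt), is:

```latex
\partial_t \rho_t(\theta)
  = 2\xi(t)\,\nabla_\theta \cdot \big(\rho_t(\theta)\,\nabla_\theta \Psi_\lambda(\theta;\rho_t)\big)
  + 2\xi(t)\,\beta^{-1}\,\Delta_\theta \rho_t(\theta),
\qquad
\Psi_\lambda(\theta;\rho) = \Psi(\theta;\rho) + \tfrac{\lambda}{2}\|\theta\|_2^2 .
```

Setting β → ∞ and λ = 0 recovers the noiseless continuity equation, which is why the general convergence argument for the diffusion PDE does not transfer directly to the noiseless case.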