General convergence of noiseless distributional dynamics
Establish a general convergence result for the continuity-equation distributional dynamics ∂_t ρ_t(θ) = 2 ξ(t) ∇_θ · (ρ_t(θ) ∇_θ Ψ(θ;ρ_t)), which arises as the mean-field limit of noiseless stochastic gradient descent (SGD) on two-layer neural networks: show that ρ_t converges to a fixed point (e.g., a global minimizer of the associated risk) under broad assumptions on the activation σ_*, the data distribution P(X,Y), the step-size schedule ξ(t), and the initialization ρ_0.
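To make the particle picture behind this PDE concrete, here is a minimal sketch of its finite-neuron (particle) discretization: running full-batch gradient descent on N neurons makes the empirical measure ρ̂_t = (1/N) Σ_i δ_{θ_i(t)} track the continuity equation as N → ∞. The specific choices below, σ_*(x;θ) = tanh(⟨θ,x⟩), a planted-teacher data model for P(X,Y), the schedule ξ(t), and all constants, are illustrative assumptions, not fixed by the text.

```python
# Minimal sketch (assumed setup): particle discretization of the noiseless
# distributional dynamics.  Each row of `theta` is one hidden unit; full-batch
# gradient descent on the finite-N risk makes the empirical measure
#   rho_hat_t = (1/N) * sum_i delta_{theta_i(t)}
# track the continuity equation as N -> infinity.
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 5, 200, 1000                   # input dim, particles (neurons), samples

# Illustrative data distribution P(X, Y): Gaussian inputs, planted tanh teacher.
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
Y = np.tanh(X @ theta_star)

theta = 0.1 * rng.standard_normal((N, d))   # initialization rho_0 (assumed)

def xi(t):
    """Assumed step-size schedule xi(t); any slowly varying choice works here."""
    return 1.0 / (1.0 + 0.01 * t)

eta = 0.05                               # base step for the time discretization
for t in range(2001):
    act = np.tanh(X @ theta.T)           # sigma_*(x; theta_i), shape (n, N)
    pred = act.mean(axis=1)              # f(x; rho_hat) = average over particles
    resid = pred - Y                     # shape (n,)
    # grad of Psi(theta_i; rho_hat), up to the 1/N mean-field factor, which is
    # absorbed into the learning rate (the usual mean-field time scaling)
    grad = ((resid[:, None] * (1.0 - act**2)).T @ X) / n   # shape (N, d)
    theta -= 2.0 * xi(t) * eta * grad
    if t % 500 == 0:
        risk = 0.5 * np.mean(resid**2)
        print(f"t={t:4d}  risk={risk:.4f}")
```

Under these assumptions the printed risk decreases toward zero; the open question stated above is whether such convergence can be guaranteed at the PDE level, for ρ_t itself, under comparably broad conditions.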
References
In the next sections we state our results about convergence of the distributional dynamics to its fixed point. In the case of noisy SGD (and for the diffusion PDE, eq:GeneralPDE_Temp), a general convergence result can be established (although at the cost of an additional regularization). For noiseless SGD (and the continuity equation, eq:GeneralPDE_Temp), we do not have such a general result.
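For contrast with the continuity equation above, the diffusion PDE for noisy SGD presumably takes the standard mean-field form, with an added Laplacian at temperature 1/β and a regularized potential Ψ_λ(θ;ρ) = Ψ(θ;ρ) + (λ/2)‖θ‖² (the "additional regularization" mentioned above); the exact constants here are an assumption, not taken from the excerpt:

∂_t ρ_t(θ) = 2 ξ(t) ∇_θ · (ρ_t(θ) ∇_θ Ψ_λ(θ;ρ_t)) + (2 ξ(t)/β) Δ_θ ρ_t(θ)

The Laplacian term smooths ρ_t and is what makes a general convergence argument available in the noisy case; setting β = ∞ and λ = 0 recovers the noiseless continuity equation.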