Improved Particle Approximation Error for Mean Field Neural Networks (2405.15767v3)
Abstract: Mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional defined over the space of probability distributions. MFLD has gained attention due to its connection with noisy gradient descent for mean-field two-layer neural networks. Unlike standard Langevin dynamics, the nonlinearity of the objective functional induces particle interactions, so multiple particles are needed to approximate the dynamics in a finite-particle setting. Recent works (Chen et al., 2022; Suzuki et al., 2023b) have demonstrated uniform-in-time propagation of chaos for MFLD, showing that the gap between the particle system and its mean-field limit shrinks uniformly over time as the number of particles increases. In this work, we improve the dependence on the logarithmic Sobolev inequality (LSI) constant in their particle approximation errors, which can deteriorate exponentially with the regularization coefficient. Specifically, we establish an LSI-constant-free particle approximation error for the objective gap by leveraging the problem structure in risk minimization. As applications, we demonstrate improved convergence of MFLD, a sampling guarantee for the mean-field stationary distribution, and uniform-in-time Wasserstein propagation of chaos in terms of particle complexity.
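To make the finite-particle setting concrete, here is a minimal sketch of the noisy gradient descent that discretizes the N-particle MFLD for a mean-field two-layer network. All specifics are illustrative assumptions, not taken from the paper: tanh features, a synthetic squared-loss risk, and placeholder hyperparameters (`lam`, `lam2`, `eta`, `N`).

```python
import numpy as np

# Toy mean-field two-layer network f(x; mu) = E_{w~mu}[tanh(w . x)],
# approximated by N particles: f(x) = (1/N) sum_i tanh(w_i . x).
# Noisy gradient descent on the regularized risk is an Euler-Maruyama
# discretization of the finite-particle MFLD (all values here are placeholders).

rng = np.random.default_rng(0)

n, d, N = 200, 5, 1000   # samples, input dimension, number of particles
lam = 0.01               # entropic regularization (temperature) coefficient
lam2 = 0.01              # L2 (weight-decay) regularization
eta = 0.05               # step size
T = 500                  # iterations

# Synthetic regression data (assumed squared-loss risk minimization)
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = np.tanh(X @ w_true)

W = rng.standard_normal((N, d))  # particles w_1, ..., w_N

for t in range(T):
    H = np.tanh(X @ W.T)                 # (n, N) per-particle features
    pred = H.mean(axis=1)                # mean-field prediction f(x_j)
    resid = pred - y                     # derivative of squared loss w.r.t. f
    # Gradient of the first variation at w_i:
    #   grad_i = (1/n) sum_j resid_j * (1 - tanh^2(w_i . x_j)) * x_j + lam2 * w_i,
    # i.e. N times the naive per-particle gradient of the empirical risk.
    S = 1.0 - H**2                       # (n, N) derivative of tanh
    grad = (S * resid[:, None]).T @ X / n + lam2 * W
    # Euler-Maruyama step: gradient descent plus Gaussian noise of scale sqrt(2*lam*eta)
    W -= eta * grad
    W += np.sqrt(2 * lam * eta) * rng.standard_normal(W.shape)

print("final risk:", 0.5 * np.mean((np.tanh(X @ W.T).mean(axis=1) - y) ** 2))
```

The injected noise of scale sqrt(2·lam·eta) is what distinguishes MFLD from plain gradient descent; propagation of chaos concerns how closely the empirical distribution of the N interacting particles tracks the mean-field law, which is the gap this paper's bounds control.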
- Analysis and geometry of Markov diffusion operators, volume 348.
- Functional inequalities for Gaussian convolutions of compactly supported measures: Explicit bounds and dimension dependence. Bernoulli, 24(1):333–353.
- Functional inequalities for perturbed measures with applications to log-concave measures and to some Bayesian problems. Bernoulli, 28(4):2294–2321.
- Uniform-in-time propagation of chaos for mean field Langevin dynamics. arXiv preprint arXiv:2212.03050.
- Entropic fictitious play for mean field optimization problem. Journal of Machine Learning Research, 24(211):1–36.
- Chizat, L. (2022). Mean-field Langevin dynamics: Exponential convergence and annealing. Transactions on Machine Learning Research.
- On the global convergence of gradient descent for over-parameterized models using optimal transport. In Advances in Neural Information Processing Systems 31, pages 3040–3050.
- Logarithmic Sobolev inequalities and stochastic Ising models. Journal of Statistical Physics, 46(5-6):1159–1194.
- Mean-field Langevin dynamics and energy landscape of neural networks. arXiv preprint arXiv:1905.07769.
- Distribution dependent stochastic differential equations. Frontiers of Mathematics in China, 16:257–301.
- Sampling from the mean-field stationary distribution. arXiv preprint arXiv:2402.07355.
- McKean Jr, H. P. (1966). A class of Markov processes associated with nonlinear parabolic equations. Proceedings of the National Academy of Sciences, 56(6):1907–1911.
- A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences, 115(33):E7665–E7671.
- Stochastic particle gradient descent for infinite ensembles. arXiv preprint arXiv:1712.05438.
- Particle dual averaging: Optimization of mean field neural networks with global convergence rate analysis. In Advances in Neural Information Processing Systems 34, pages 19608–19621.
- Convex analysis of the mean field Langevin dynamics. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, pages 9741–9757.
- Particle stochastic dual coordinate ascent: Exponential convergent algorithm for mean field neural network optimization. In Proceedings of the 10th International Conference on Learning Representations.
- Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. Journal of Functional Analysis, 173(2):361–400.
- Trainability and accuracy of artificial neural networks: An interacting particle system approach. Communications on Pure and Applied Mathematics, 75(9):1889–1935.
- Uniform-in-time propagation of chaos for the mean field gradient Langevin dynamics. In Proceedings of the 11th International Conference on Learning Representations.
- Atsushi Nitanda