Convergence of flow-based generative models via proximal gradient descent in Wasserstein space (2310.17582v3)

Published 26 Oct 2023 in stat.ML, cs.LG, math.OC, math.ST, and stat.TH

Abstract: Flow-based generative models enjoy certain advantages in computing data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remains sparse. In this paper, we provide a theoretical guarantee of generating the data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of proximal gradient descent (GD) in Wasserstein space, we prove a Kullback-Leibler (KL) guarantee of data generation by a JKO flow model of $O(\varepsilon^2)$ when using $N \lesssim \log(1/\varepsilon)$ many JKO steps ($N$ residual blocks in the flow), where $\varepsilon$ is the error in the per-step first-order condition. The assumption on the data density is merely a finite second moment, and the theory extends to data distributions without density and to inversion errors in the reverse process, where we obtain mixed KL-$W_2$ error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest. The analysis framework can extend to other first-order Wasserstein optimization schemes applied to flow-based generative models.
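For readers unfamiliar with the scheme, the JKO (Jordan-Kinderlehrer-Otto) step referenced above is the Wasserstein proximal-point iteration

$$\rho_{k+1} = \arg\min_{\rho \in \mathcal{P}_2(\mathbb{R}^d)} \ \mathrm{KL}(\rho \,\|\, \pi) + \frac{1}{2h}\, W_2^2(\rho, \rho_k),$$

where $\pi$ is the target distribution (the standard normal in the forward, data-to-noise direction), $W_2$ is the 2-Wasserstein distance, and $h > 0$ is the step size; this is the standard formulation of the scheme, written here in generic notation rather than the paper's. In a JKO flow network, each residual block parametrizes the transport map realizing one such step, and stacking $N \lesssim \log(1/\varepsilon)$ blocks yields the flow whose generation error the paper bounds.

The sketch below illustrates how one block could be trained under this generic per-step objective. It is a minimal illustration, not the paper's implementation: it assumes PyTorch, a standard-normal target, an exact per-sample Jacobian log-determinant (practical only in low dimension), and a hypothetical batch x_k standing in for samples from $\rho_k$.

```python
# Minimal sketch of one JKO step trained as a residual block (assumptions above).
import torch
import torch.nn as nn

d, h = 2, 0.5  # dimension and JKO step size (illustrative values)

class ResBlock(nn.Module):
    """One residual block x -> x + f(x), playing the role of one JKO step."""
    def __init__(self, d, width=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(d, width), nn.Tanh(), nn.Linear(width, d))

    def forward(self, x):
        return x + self.f(x)

def log_abs_det_jacobian(block, x):
    # Exact log|det dT/dx| per sample; create_graph=True keeps the result
    # differentiable with respect to the block parameters.
    jacs = torch.stack([
        torch.autograd.functional.jacobian(block, xi, create_graph=True) for xi in x
    ])
    return torch.linalg.slogdet(jacs)[1]

def jko_step_loss(block, x, h):
    z = block(x)
    # KL(T#rho_k || N(0, I)) up to an additive constant independent of T:
    #   E[ ||T(x)||^2 / 2 - log|det dT/dx| ]
    kl_term = 0.5 * (z ** 2).sum(dim=1) - log_abs_det_jacobian(block, x)
    # Wasserstein-2 proximal (transport-cost) term: (1/(2h)) E||T(x) - x||^2.
    prox_term = ((z - x) ** 2).sum(dim=1) / (2.0 * h)
    return (kl_term + prox_term).mean()

# Train block k on samples from rho_k, freeze it, push the samples through it,
# and repeat for the next block; N such blocks form the full flow.
x_k = torch.randn(128, d) * 2.0 + 1.0  # stand-in for samples from rho_k
block = ResBlock(d)
opt = torch.optim.Adam(block.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    jko_step_loss(block, x_k, h).backward()
    opt.step()
```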
