Neural Wasserstein Gradient Flows for Maximum Mean Discrepancies with Riesz Kernels (2301.11624v3)

Published 27 Jan 2023 in cs.LG, math.OC, and math.PR

Abstract: Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such flows. We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows as well as a forward scheme for so-called Wasserstein steepest descent flows by neural networks (NNs). Since we cannot restrict ourselves to absolutely continuous measures, we have to deal with transport plans and velocity plans instead of usual transport maps and velocity fields. Indeed, we approximate the disintegration of both plans by generative NNs which are learned with respect to appropriate loss functions. In order to evaluate the quality of both neural schemes, we benchmark them on the interaction energy. Here we provide analytic formulas for Wasserstein schemes starting at a Dirac measure and show their convergence as the time step size tends to zero. Finally, we illustrate our neural MMD flows by numerical examples.
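
For readers who want a concrete handle on the objects named in the abstract, the sketch below implements the Riesz (negative-distance) kernel, the resulting empirical squared MMD, and a plain explicit-Euler particle descent on MMD^2 in PyTorch. This is only an illustrative stand-in under stated assumptions, not the neural backward (JKO) or forward (steepest descent) schemes proposed in the paper; the names riesz_kernel, mmd_squared and particle_mmd_flow, the step size, and the epsilon regularization are choices of this sketch.

import torch


def riesz_kernel(x, y, r=1.0, eps=1e-12):
    # Riesz (negative-distance) kernel K(x, y) = -||x - y||^r for r in (0, 2).
    # eps keeps autograd finite at coinciding points, where the kernel is non-smooth.
    d2 = (x.unsqueeze(1) - y.unsqueeze(0)).pow(2).sum(-1) + eps
    return -d2.pow(r / 2)


def mmd_squared(x, y, r=1.0):
    # Empirical squared MMD between samples x ~ mu and y ~ nu for the Riesz kernel;
    # for r = 1 this is the energy distance (up to convention-dependent constants).
    n, m = x.shape[0], y.shape[0]
    return (riesz_kernel(x, x, r).sum() / n**2
            + riesz_kernel(y, y, r).sum() / m**2
            - 2.0 * riesz_kernel(x, y, r).sum() / (n * m))


def particle_mmd_flow(x0, y, steps=500, tau=1e-2, r=1.0):
    # Explicit-Euler particle descent on F(mu) = MMD^2(mu, nu). The particles keep
    # fixed weights 1/n, so this is only a crude stand-in for the paper's plan-based
    # neural schemes, which allow singular measures to become absolutely continuous.
    x = x0.clone().requires_grad_(True)
    n = x.shape[0]
    for _ in range(steps):
        (grad,) = torch.autograd.grad(mmd_squared(x, y, r), x)
        with torch.no_grad():
            # The factor n compensates the 1/n particle weights so that the Euclidean
            # gradient step matches the Wasserstein gradient flow time scale.
            x -= tau * n * grad
    return x.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    target = torch.randn(200, 2)        # samples from nu
    init = 1e-3 * torch.randn(200, 2)   # near-Dirac initialization
    out = particle_mmd_flow(init, target)
    print(f"final MMD^2: {mmd_squared(out, target).item():.4f}")

Because the particles above carry fixed weights, the Euler scheme can only approximate regimes such as a Dirac measure spreading into an absolutely continuous one; handling such regimes exactly is what motivates the paper's parametrization via transport and velocity plans.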

References (53)
  1. Optimizing functionals on the space of probabilities with input convex neural networks. Transactions on Machine Learning Research, 2022.
  2. Gradient Flows. Lectures in Mathematics ETH Zürich. Birkhäuser, Basel, 2005. ISBN 978-3-7643-2428-5.
  3. Input convex neural networks. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp.  146–155. PMLR, 2017.
  4. Refining deep generative models via discriminator gradient flow. In International Conference on Learning Representations, 2021.
  5. Maximum mean discrepancy gradient flow. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 32, pp.  1–11, New York, USA, 2019.
  6. Kernelized Wasserstein natural gradient. In International Conference on Learning Representations, 2020.
  7. Dimensionality of local minimizers of the interaction energy. Archive for Rational Mechanics and Analysis, 209:1055–1088, 2013.
  8. Efficient gradient flows in sliced-Wasserstein space. Transactions on Machine Learning Research, 2022.
  9. Flows for simultaneous manifold learning and density estimation. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  442–453. Curran Associates, Inc., 2020.
  10. Brenier, Y. Décomposition polaire et réarrangement monotone des champs de vecteurs. Comptes Rendus de l’Académie des Sciences Paris Series I Mathematics, 305(19):805–808, 1987.
  11. Proximal optimal transport modeling of population dynamics. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I. (eds.), Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pp.  6511–6528. PMLR, 2022.
  12. Explicit equilibrium solutions for the aggregation equation with power-law potentials. Kinetic and Related Models, 10(1):171–192, 2017.
  13. Primal dual methods for Wasserstein gradient flows. Foundations of Computational Mathematics, 22(2):389–443, 2022.
  14. Threshold condensation to singular support for a Riesz equilibrium problem. Analysis and Mathematical Physics, 13(19), 2023.
  15. Optimal transport natural gradient for statistical manifolds with continuous sample space. Information Geometry, 3(1):1–32, 2020.
  16. Estimating barycenters of measures in high dimensions. arXiv preprint arXiv:2007.07105, 2021.
  17. Neural variational gradient descent. In Fourth Symposium on Advances in Approximate Bayesian Inference, 2022.
  18. Particle-based variational inference with preconditioned functional gradient flow. In The Eleventh International Conference on Learning Representations, 2023.
  19. Curve based approximation of measures on manifolds by discrepancy minimization. Foundations of Computational Mathematics, 21(6):1595–1642, 2021.
  20. Variational Wasserstein gradient flow. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S. (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp.  6185–6215. PMLR, 2022.
  21. Deep generative learning via variational gradient flow. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp.  2093–2101. PMLR, 2019.
  22. Gigli, N. On the geometry of the space of probability measures endowed with the quadratic optimal transport distance. PhD thesis, Scuola Normale Superiore di Pisa, 2004.
  23. De Giorgi, E. New problems on minimizing movements. In Ciarlet, P. and Lions, J.-L. (eds.), Boundary Value Problems for Partial Differential Equations and Applications, pp.  81–98. Masson, 1993.
  24. KALE flow: A relaxed KL gradient flow for probabilities with disjoint support. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, volume 34, pp.  8018–8031. Curran Associates, Inc., 2021.
  25. Quadrature errors, discrepancies and their relations to halftoning on the torus and the sphere. SIAM Journal on Scientific Computing, 34(5):2760–2791, 2012.
  26. Learning the Stein discrepancy for training and evaluating energy-based models without sampling. In International Conference on Machine Learning, 2020.
  27. Computation of power law equilibrium measures on balls of arbitrary dimension. Constructive Approximation, 2022.
  28. Stochastic normalizing flows for inverse problems: A Markov chain viewpoint. SIAM/ASA Journal on Uncertainty Quantification, 10:1162–1190, 2022.
  29. Generalized normalizing flows via Markov chains. Series: Elements in Non-local Data Interactions: Foundations and Applications. Cambridge University Press, 2023.
  30. Wasserstein steepest descent flows of discrepancies with Riesz kernels. arXiv preprint arXiv:2211.01804, 2022.
  31. Wasserstein gradient flows of the discrepancy with distance kernel on the line. In Scale Space and Variational Methods in Computer Vision, pp.  431–443. Springer, 2023a.
  32. Generative sliced MMD flows with Riesz kernels. arXiv preprint arXiv:2305.11463, 2023b.
  33. The deep minimizing movement scheme. arXiv preprint arXiv:2109.14851, 2021.
  34. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.
  35. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
  36. Neural optimal transport. In The Eleventh International Conference on Learning Representations, 2023.
  37. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  38. Wasserstein proximal of GANs. In Nielsen, F. and Barbaresco, F. (eds.), Geometric Science of Information, pp.  524–533, Cham, 2021. Springer International Publishing.
  39. Large-scale optimal transport via adversarial training with cycle-consistency. arXiv preprint arXiv:2003.06635, 2020.
  40. Mattila, P. Geometry of sets and measures in Euclidean spaces: fractals and rectifiability. Cambridge University Press, 1999.
  41. Large-scale Wasserstein gradient flows. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, volume 34, pp.  15243–15256, 2021.
  42. Otto, F. The geometry of dissipative evolution equations: the porous medium equation. Communications in Partial Differential Equations, 26:101–174, 2001.
  43. Eulerian calculus for the contraction in the Wasserstein distance. SIAM Journal on Mathematical Analysis, 37(4):1227–1255, 2005.
  44. PyTorch: An imperative style, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32, pp.  8024–8035, 2019.
  45. Pavliotis, G. A. Stochastic processes and applications: Diffusion Processes, the Fokker-Planck and Langevin Equations. Number 60 in Texts in Applied Mathematics. Springer, New York, 2014.
  46. Random variables, monotone relations, and convex analysis. Mathematical Programming, 148:297–331, 2014. doi: 10.1007/s10107-014-0801-1.
  47. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
  48. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
  49. Dithering by differences of convex functions. SIAM Journal on Imaging Sciences, 4(1):79–108, 2011.
  50. Villani, C. Topics in Optimal Transportation. Number 58 in Graduate Studies in Mathematics. American Mathematical Society, Providence, 2003.
  51. Bayesian learning via stochastic gradient Langevin dynamics. In Getoor, L. and Scheffer, T. (eds.), Proceedings of the 28th International Conference on Machine Learning, pp.  681–688, Madison, 2011.
  52. Wendland, H. Scattered Data Approximation. Cambridge University Press, 2005.
  53. Stochastic normalizing flows. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  5933–5944, 2020.
Citations (13)
