Generative Sliced MMD Flows with Riesz Kernels (2305.11463v4)

Published 19 May 2023 in cs.LG, math.PR, and stat.ML

Abstract: Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = -\Vert x-y\Vert^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels, which is also known as energy distance, coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.
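The two computational ingredients described in the abstract (random slicing, and a sorting-based evaluation of the one-dimensional energy distance for $r=1$) can be illustrated with a short NumPy sketch. This is a minimal illustration, not the paper's reference implementation: the function names and the plain Monte Carlo estimator are assumptions, and the dimension-dependent constant relating the sliced MMD to the full MMD is omitted.

```python
import numpy as np

def _pair_abs_sum(a):
    """sum_{i<j} |a_i - a_j| for a 1D array, via sorting in O(n log n).

    For sorted a (0-indexed), element a_k appears as the larger term in k pairs
    and as the smaller term in (n-1-k) pairs, giving coefficient 2k - n + 1.
    """
    a = np.sort(a)
    n = a.shape[0]
    idx = np.arange(n)
    return np.sum((2 * idx - n + 1) * a)

def sliced_energy_mmd2(X, Y, P=1000, rng=None):
    """Monte Carlo estimate of the squared sliced MMD for the Riesz kernel
    with r = 1 (negative distance / energy distance kernel)."""
    rng = np.random.default_rng(0) if rng is None else rng
    M, N = X.shape[0], Y.shape[0]
    d = X.shape[1]
    # P random directions, uniform on the unit sphere S^{d-1}
    theta = rng.standard_normal((P, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    px, py = X @ theta.T, Y @ theta.T   # 1D projections, shapes (M, P) and (N, P)
    total = 0.0
    for p in range(P):
        x, y = px[:, p], py[:, p]
        # cross-term sum_{i,j} |x_i - y_j| from three within-set sorted sums
        cross = _pair_abs_sum(np.concatenate([x, y])) - _pair_abs_sum(x) - _pair_abs_sum(y)
        total += (2.0 * cross / (M * N)
                  - 2.0 * _pair_abs_sum(x) / M**2
                  - 2.0 * _pair_abs_sum(y) / N**2)
    return total / P
```

Each slice costs $O((M+N)\log(M+N))$ instead of the $O(MN+N^2)$ of a direct kernel-sum evaluation, which is what makes large particle sets and the neural-network approximation of the flow tractable.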
