Deep MMD Gradient Flow without adversarial training (2405.06780v1)

Published 10 May 2024 in cs.LG and cs.AI

Abstract: We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD). The noise-adaptive MMD is trained on data distributions corrupted by increasing levels of noise, obtained via a forward diffusion process, as commonly used in denoising diffusion probabilistic models. The result is a generalization of MMD Gradient Flow, which we call Diffusion-MMD-Gradient Flow or DMMD. The divergence training procedure is related to discriminator training in Generative Adversarial Networks (GAN), but does not require adversarial training. We obtain competitive empirical performance in unconditional image generation on CIFAR10, MNIST, CELEB-A (64 x 64) and LSUN Church (64 x 64). Furthermore, we demonstrate the validity of the approach when MMD is replaced by a lower bound on the KL divergence.


Summary

  • The paper proposes DMMD, a novel method that replaces adversarial training with noise-adaptive MMD gradient flow for generative modeling.
  • It utilizes a noise-conditional discriminator trained via a progressive forward diffusion process to refine data representations at multiple noise levels.
  • Experimental results on CIFAR-10 reveal competitive FID and inception scores, demonstrating improved stability and scalability in image generation.

Understanding Deep MMD Gradient Flow Without Adversarial Training

Introduction

Generative modeling has seen significant strides, powering applications from image and audio generation to protein modeling and 3D content creation. Two of the most prominent families of generative models are Generative Adversarial Networks (GANs) and diffusion models, each with its own strengths and challenges. This article focuses on an approach that combines insights from both, training a discriminator without adversarial dynamics by means of a Deep MMD Gradient Flow.

The Problem with Traditional Methods

GANs pair a generator and a discriminator trained in a min-max game, which often suffers from instability and mode collapse. Although GANs can produce high-quality samples, tuning the training procedure to avoid these pitfalls is arduous.

Diffusion models rely on a forward noising process followed by a learned reverse denoising process. They handle multi-step generation well, but typically require many sampling steps and can become inefficient, particularly because the estimated gradient (score) grows unstable close to the data distribution.

Enter DMMD: Diffusion-MMD-Gradient Flow

DMMD (Diffusion-MMD-Gradient Flow) combines the strengths of the two families: it borrows the noise processes of diffusion models and the discriminator of GANs, training a Maximum Mean Discrepancy (MMD) discriminator on progressively noised data and then using it to drive a gradient flow, so no adversarial training is required.

How DMMD Works

Key Concepts:

  1. Maximum Mean Discrepancy (MMD): a kernel-based distance between two distributions, obtained by comparing their mean embeddings in a Reproducing Kernel Hilbert Space (RKHS). In MMD GANs it serves as the discriminator loss; a small estimator sketch follows this list.
  2. Noise-Adaptive Gradient Flow: rather than measuring the MMD between two fixed distributions, DMMD trains a noise-adaptive MMD whose features are conditioned on the noise level, so the discrepancy adapts as samples move from a noisy initial distribution toward the target distribution.
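
To make the MMD concrete, below is a minimal sketch of a standard squared-MMD estimator with a Gaussian (RBF) kernel in PyTorch. The kernel choice, bandwidth, and the fact that it operates on raw vectors are illustrative assumptions; in DMMD the MMD is computed on learned, noise-conditional features rather than on raw inputs.

```python
import torch

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix k(x_i, y_j) between two batches of vectors."""
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x ~ P and y ~ Q."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Two Gaussian clouds with different means give a clearly positive squared MMD.
x = torch.randn(256, 2)
y = torch.randn(256, 2) + 2.0
print(mmd2(x, y).item())
```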

Training the Noise-Conditional Discriminator

The process begins by learning a noise-conditional MMD discriminator:

  1. Forward Diffusion Process: Gradually add noise to the data to create multiple noisy versions of the dataset.
  2. Training Process: A neural network is trained so that the MMD computed on its noise-conditional features separates the clean data from its noised counterparts at each noise level. The noised samples play the role a generator's fakes would play in a GAN, so no adversarial generator is needed; a hedged sketch of such a training loop follows this list.
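
As a rough illustration, the loop below reuses the mmd2 estimator above and assumes a DDPM-style corruption x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise together with a toy noise-conditional feature network phi(x, t). The architecture, noise schedule, and regularization used in the paper differ in detail, so treat this as a sketch of the idea rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class NoiseConditionalFeatures(nn.Module):
    """Toy noise-conditional feature extractor phi(x, t) for flattened inputs.
    (The paper uses a convolutional architecture; this MLP is purely illustrative.)"""
    def __init__(self, dim, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(),
                                 nn.Linear(256, feat_dim))

    def forward(self, x, t):
        t = t.view(1, 1).expand(x.shape[0], 1)   # broadcast the scalar noise level
        return self.net(torch.cat([x, t], dim=1))

def discriminator_step(phi, opt, x0, alpha_bar):
    """One update of the noise-conditional MMD discriminator.

    The noised batch plays the role that generated 'fakes' play in an MMD GAN,
    so the feature network is trained to maximize the MMD at a random noise level."""
    t = torch.rand(1)                                   # noise level in (0, 1)
    a = alpha_bar(t)                                    # cumulative signal coefficient
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)
    loss = -mmd2(phi(x0, t), phi(xt, t))                # maximize MMD => minimize its negative
    opt.zero_grad()
    loss.backward()
    opt.step()
    return -loss.item()

# e.g. a cosine schedule (one hypothetical choice):
# alpha_bar = lambda t: torch.cos(0.5 * torch.pi * t) ** 2
```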

Sampling New Data

Using the trained noise-conditional MMD discriminator, DMMD allows for the creation of new samples by following these steps:

  1. Initialization: Start with random samples from a Gaussian distribution.
  2. Gradient Flow: Move the particles toward the target distribution by following the MMD gradient flow, with the discriminator conditioned on a noise level that is annealed from high to low as the samples approach the data; a sketch of this update appears below.
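
Under the same assumptions, and reusing mmd2 and the feature network phi from the sketches above, sampling could look like the following: particles start from Gaussian noise and descend the squared MMD between their own features and those of a reference data batch while the conditioning noise level is annealed. The step size, number of steps, linear schedule, and the choice to compare against clean reference data are illustrative simplifications rather than the paper's exact procedure.

```python
import torch

def sample(phi, data_ref, n_samples, dim, n_steps=200, step_size=0.1):
    """Draw samples by a discretized MMD gradient flow on noise-conditional features."""
    x = torch.randn(n_samples, dim)                    # start from a Gaussian prior
    for i in range(n_steps):
        t = torch.tensor([1.0 - i / n_steps])          # anneal noise level high -> low
        x = x.detach().requires_grad_(True)
        flow_loss = mmd2(phi(x, t), phi(data_ref, t))  # squared MMD to the reference batch
        grad, = torch.autograd.grad(flow_loss, x)
        x = x - step_size * grad                       # move particles downhill in squared MMD
    return x.detach()
```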

The Why: Benefits and Results

Numerical Insights

DMMD shows competitive results in unconditional image generation on benchmarks such as CIFAR-10, achieving FID scores as low as 7.74 and Inception scores above 9 in some configurations, rivaling traditional GANs and diffusion models.

Adaptive Progression

One of the strong claims made by the paper is that an adaptive discriminator, whose kernel width changes with the noise level, leads to faster convergence of the flow. This matters in high-dimensional settings such as image generation, where a fixed kernel bandwidth can fail to provide a useful gradient signal.
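
One hypothetical way to realize such an adaptive bandwidth (not the paper's schedule) is simply to widen the RBF kernel at high noise levels and narrow it as the noise is annealed away, for example:

```python
def adaptive_bandwidth(t, min_bw=0.1, max_bw=2.0):
    """Illustrative schedule: wide kernels when the noise level t is high, narrow near the data."""
    return min_bw + (max_bw - min_bw) * float(t)

# e.g. mmd2(phi(x, t), phi(y, t), bandwidth=adaptive_bandwidth(t))
```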

Broader Implications

Practical Impact

  1. Scalability: Since DMMD eliminates the need for adversarial training, it offers a more stable and scalable paradigm for training discriminators.
  2. Generative Flexibility: By combining gradient flows with noise-adaptive discrepancies, it opens up new avenues for generative modeling in complex, high-dimensional settings.

Theoretical Considerations

The forward diffusion process combined with MMD gradient flows offers an interesting mix of theory from optimal transport and kernel methods. It could pave the way for more robust theoretical frameworks that address issues with both GANs and diffusion models.

Future Prospects

  1. Expanding Scope: It would be interesting to explore DMMD in advanced settings like 3D object generation or in contexts where traditional diffusion models struggle, such as with highly irregular data distributions.
  2. Theoretical Optimizations: Further research might focus on optimizing the training procedure or understanding the convergence properties of the gradient flows better.

Conclusion

The proposed DMMD method exemplifies how combining established concepts from GANs and Diffusion models can lead to more stable, efficient generative modeling techniques. By training discriminators without adversarial dynamics and leveraging noise-adaptive mechanisms, DMMD provides a promising new direction in the landscape of generative modeling.