Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation (2404.04057v3)

Published 5 Apr 2024 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at https://github.com/mingyuanzhou/SiD
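The abstract outlines a data-free training structure: a one-step generator is optimized purely on its own noised samples, guided by a frozen pretrained (teacher) score network and an auxiliary score network fitted to the generator's distribution. The PyTorch sketch below illustrates that general structure under toy assumptions. The module names, the small MLP stand-ins, and the simple distribution-matching surrogate in step 4 are illustrative assumptions only; SiD's actual loss is derived from its three score-related identities and differs in detail, so refer to the official repository linked above for the real implementation.

```python
# Minimal sketch of data-free one-step distillation (illustrative, not the exact SiD loss).
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a (pretrained) diffusion denoiser / score network."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x_t, sigma):
        # Predicts a denoised estimate of the clean sample from the noised input.
        sig = sigma.expand(x_t.shape[0], 1)
        return self.net(torch.cat([x_t, sig], dim=1))

class OneStepGenerator(nn.Module):
    """Maps pure noise to a sample in a single forward pass."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, z):
        return self.net(z)

dim = 32
teacher = ToyDenoiser(dim)      # frozen pretrained teacher (assumed already trained)
fake_net = ToyDenoiser(dim)     # auxiliary network tracking the generator's distribution
generator = OneStepGenerator(dim)
teacher.requires_grad_(False)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(fake_net.parameters(), lr=1e-4)

for step in range(100):
    # 1) Synthesize samples with the one-step generator -- no real data is used.
    z = torch.randn(64, dim)
    x_g = generator(z)

    # 2) Apply the forward diffusion process (add noise at a random level).
    sigma = torch.rand(1) * 2.0 + 0.1
    noise = torch.randn_like(x_g)

    # 3) Fit the auxiliary network to the generator's noised samples via
    #    ordinary denoising score matching (generator detached for this update).
    x_t = x_g.detach() + sigma * noise
    loss_f = ((fake_net(x_t, sigma) - x_g.detach()) ** 2).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()

    # 4) Update the generator by moving its samples where the two score estimates
    #    disagree. Scores follow from denoised predictions via Tweedie's formula,
    #    score(x_t) = (denoised - x_t) / sigma^2; SiD's identity-based loss
    #    replaces this crude surrogate.
    x_t_g = x_g + sigma * noise
    with torch.no_grad():
        direction = (fake_net(x_t_g, sigma) - teacher(x_t_g, sigma)) / sigma ** 2
    loss_g = (direction * x_t_g).sum() / x_g.shape[0]
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

With this structure, gradient descent on `loss_g` nudges generated samples toward regions the teacher assigns higher density than the current generator, which is the shared idea behind score-distillation-style one-step methods; the abstract's claim of exponentially fast FID reduction pertains to SiD's specific identity-derived loss, not to this toy surrogate.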
