Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models (2311.03830v2)

Published 7 Nov 2023 in cs.CV and cs.AI

Abstract: Denoising Diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process but causes degraded generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model. Accordingly, we propose $\textbf{S}$patial $\textbf{F}$itting-$\textbf{E}$rror $\textbf{R}$eduction $\textbf{D}$istillation model ($\textbf{SFERD}$). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models. Project link: \url{https://github.com/Sainzerjj/SFERD}.
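The abstract's central idea, training a student to match in one step what the teacher produces over several denoising steps, can be illustrated with a minimal toy sketch. Everything here (the linear "teacher denoiser", the scalar student, the schedule) is an illustrative assumption and not the paper's SFERD implementation; it only shows the shape of a distillation objective for shortened sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_denoise(x, t):
    # Toy "teacher": shrinks the sample proportionally to t.
    # Stands in for a trained denoising network's single step.
    return x * (1.0 - 0.5 * t)

def teacher_two_steps(x_t):
    # Two teacher denoising steps (t = 1.0, then t = 0.5): the
    # multi-step output the student must reproduce in one step.
    x_mid = teacher_denoise(x_t, 1.0)
    return teacher_denoise(x_mid, 0.5)

# Student: a single scalar gain w, trained so that one application
# of w matches the teacher's two-step denoising trajectory.
w = 1.0
lr = 0.1
x_t = rng.normal(size=(64,))           # a batch of noisy samples
target = teacher_two_steps(x_t)        # distillation target

losses = []
for _ in range(100):
    pred = w * x_t                                  # one-step student prediction
    grad = 2.0 * np.mean((pred - target) * x_t)     # d(MSE)/dw
    w -= lr * grad                                  # gradient step on the distillation loss
    losses.append(np.mean((pred - target) ** 2))
```

Here the teacher's two steps compose to `x * 0.5 * 0.75 = 0.375 * x`, so the student converges to `w ≈ 0.375`: the distilled model collapses several denoising applications into one. SFERD's contribution, per the abstract, is reducing the spatial fitting error that makes this matching lossy for real networks, via attention guidance and a semantic gradient predictor.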

