Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models (2311.03830v2)
Abstract: Denoising diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective way to address this limitation by shortening the sampling process, but it degrades generative quality. Based on our analysis with bias-variance decomposition and on experimental observations, we attribute the degradation to the spatial fitting error that occurs in the training of both the teacher and student models. Accordingly, we propose the $\textbf{S}$patial $\textbf{F}$itting-$\textbf{E}$rror $\textbf{R}$eduction $\textbf{D}$istillation model ($\textbf{SFERD}$). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models. Project link: \url{https://github.com/Sainzerjj/SFERD}.