- The paper presents a novel SFD technique that reduces fine-tuning time of diffusion models by up to 1000x while preserving sample quality.
- It achieves strong few-step sample quality by fine-tuning at a small set of critical timestamps, and supports sampling with a variable number of function evaluations (NFE) from a single distilled model.
- Empirical tests on datasets like CIFAR-10 demonstrate that SFD outperforms existing methods, enhancing practical deployment of diffusion models.
Summary of "Simple and Fast Distillation of Diffusion Models"
The paper "Simple and Fast Distillation of Diffusion Models" by Zhenyu Zhou et al. addresses the slow sampling of diffusion-based generative models. Although these models deliver high-quality synthesis across domains including images, video, audio, and molecular structures, they often require hundreds or thousands of sampling steps, which hinders practical deployment. To mitigate this, the paper introduces Simple and Fast Distillation (SFD), a method that drastically reduces the fine-tuning time required for distillation while maintaining high synthesis quality.
Main Contributions
- Conceptual Simplification:
- SFD starts from a vanilla distillation-based sampling approach and fixes several small but critical factors affecting synthesis efficiency and quality, reducing fine-tuning time by up to 1000x.
- Balanced Performance:
- Empirical results demonstrate that SFD effectively balances sample quality and fine-tuning costs in few-step image generation tasks. For instance, on the CIFAR-10 dataset, SFD achieves a Fréchet Inception Distance (FID) of 4.53 with just 0.64 hours of fine-tuning using an NVIDIA A100 GPU.
- Variable NFE Sampling:
- The method supports sampling with variable numbers of function evaluations (NFE) using a single distilled model. This flexibility is crucial for adapting the model to different computational constraints.
- Efficient Trajectory Matching:
- Compared to existing methods that require extensive fine-tuning and complex optimization objectives, SFD simplifies the process, focusing on a small number of critical timestamps. This streamlined approach enhances efficiency without compromising synthesis quality.
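The trajectory-matching idea behind these contributions can be sketched in a toy setting. Everything below is an illustrative stand-in rather than the paper's actual models: a 1-D linear ODE plays the role of the sampling trajectory, a single scalar parameter plays the role of the student network, and plain gradient descent fits the student's one big step to the teacher's many fine-grained steps over the same interval.

```python
# Toy sketch of trajectory distillation (hypothetical 1-D setup, NOT the
# paper's architecture): a one-step "student" map is fit so that it
# reproduces a "teacher" solver that takes many small Euler steps along
# the same ODE segment. SFD applies this matching idea at only a few
# critical timestamps, which is what keeps fine-tuning cheap.

def teacher_step(x, t0, t1, n_substeps=100):
    """Many small Euler steps for dx/dt = -x over the interval [t0, t1]."""
    dt = (t1 - t0) / n_substeps
    for _ in range(n_substeps):
        x = x + dt * (-x)
    return x

def distill_one_step(t0, t1, lr=0.1, iters=500):
    """Fit a scalar theta so that theta * x matches the teacher over [t0, t1]."""
    theta = 1.0
    data = [0.5, -1.0, 2.0, 3.5]            # toy "samples" at time t0
    for _ in range(iters):
        grad = 0.0
        for x in data:
            err = theta * x - teacher_step(x, t0, t1)
            grad += 2 * err * x / len(data)  # d/dtheta of squared error
        theta -= lr * grad                   # plain gradient descent
    return theta

# One distilled step now covers the whole interval in a single evaluation.
theta = distill_one_step(0.0, 1.0)
```

The distilled `theta` closely matches the teacher's 100-substep result, which is the point: after distillation, one student evaluation replaces an entire multi-step teacher trajectory segment.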
Technical Approach
- Revisiting Trajectory Distillation:
- The paper highlights the inefficiencies in current distillation methods, which often suffer from bloated fine-tuning costs due to step mismatch and complex objectives. To address this, the authors propose a global perspective on trajectory distillation.
- Smooth Gradient Field Modification:
- Fine-tuning at specific timestamps is shown to positively impact other timestamps, facilitating a smoother and more efficient refinement process. This insight forms the basis for only training on a subset of critical timestamps.
- Implementation of SFD:
- SFD employs DPM-Solver++(3M) as the teacher solver and adopts a simplified training procedure to enhance the student model’s gradient field. The fine-tuning involves adjusting a minimal number of sampling steps, further supported by an analytical first step (AFS) technique to save on function evaluations.
- Step-Condition in SFD-v:
- The introduction of a step-condition into the model enables the student to perform sampling with various steps, effectively making the distilled model adaptable to different NFEs.
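The variable-NFE idea can be illustrated with a minimal sampler skeleton. The `student` callable, the uniform time grid, and the analytical-first-step formula below are all hypothetical placeholders; the key point is only that the total step count is passed to the model as a condition, so one network serves every budget, as in SFD-v.

```python
# Minimal sketch (names and formulas hypothetical) of variable-NFE
# sampling with a step-conditioned student, in the spirit of SFD-v.

def sample(student, x, t_max=1.0, n_steps=2, afs=False):
    """Run n_steps distilled steps from t_max down to 0.

    `student(x, t, s, n_steps)` is assumed to map the state at time t
    directly to time s, conditioned on the total step count n_steps.
    If `afs` is True, the first step is replaced by a cheap analytical
    update (a stand-in for the AFS trick), saving one model evaluation.
    """
    ts = [t_max * (1 - i / n_steps) for i in range(n_steps + 1)]
    for i, (t, s) in enumerate(zip(ts[:-1], ts[1:])):
        if afs and i == 0:
            x = x * (s / t)   # toy analytical first step, not the paper's formula
        else:
            x = student(x, t, s, n_steps)
    return x

# A toy "student" that halves x each step, queried at two different budgets
# with the same callable -- the step-condition n is what makes this possible.
toy = lambda x, t, s, n: x * 0.5
x2 = sample(toy, 1.0, n_steps=2)
x4 = sample(toy, 1.0, n_steps=4)
```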
Experimental Results
- Benchmark Comparisons:
- Across datasets like CIFAR-10, ImageNet 64×64, and LSUN-Bedroom 256×256, SFD and SFD-v consistently outperform or match the sample quality of existing methods like progressive distillation and consistency models while requiring significantly less fine-tuning time.
- Stable Diffusion:
- In the case of text-to-image generation using Stable Diffusion, SFD demonstrates its effectiveness even at high resolutions. By training with a guidance scale of 1 and sampling at various scales, the method achieves competitive FID scores with fewer sampling steps.
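The guidance-scale mechanics referenced here follow standard classifier-free guidance; the sketch below is the generic formula, not the paper's implementation.

```python
# Classifier-free guidance as commonly used in Stable Diffusion sampling:
# the guided noise prediction extrapolates from the unconditional output
# toward the conditional one with scale w. (Generic CFG formula, not
# code from the paper.)

def cfg(eps_uncond, eps_cond, w):
    """Guided prediction: eps_u + w * (eps_c - eps_u)."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

Note that with w = 1 the guided prediction reduces to the plain conditional output, which is consistent with the summary's point: distilling at guidance scale 1 keeps training simple, while larger scales can still be applied at sampling time.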
Implications and Future Work
The findings of this paper have practical implications for the deployment of generative models in real-time applications where computational resources and response time are critical. By enabling high-quality synthesis with minimal fine-tuning, SFD makes diffusion models more accessible for broader applications.
Theoretically, the simplified approach to trajectory distillation and the variable-NFE capability provide a new direction for future research in model distillation and generative sampling. Further exploration could involve tailoring specific time schedules and enhancing the core mechanisms underlying trajectory matching.
Conclusion
"Simple and Fast Distillation of Diffusion Models" introduces an efficient approach to reduce the fine-tuning burden of diffusion-based generative models while retaining high synthesis quality. The method balances practical usability with theoretical robustness, paving the way for more resource-efficient model deployments in diverse generative tasks.