DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture (2409.03550v1)

Published 5 Sep 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Diffusion models (DMs) have demonstrated exceptional generative capabilities across various domains but are hindered by slow inference speeds and high computational demands during deployment. The most common way to accelerate DMs involves reducing the number of denoising steps during generation, achieved through faster sampling solvers or knowledge distillation (KD). In contrast to prior approaches, we propose a novel method that transfers the capability of large pretrained DMs to faster architectures. Specifically, we employ KD in a distinct manner to compress DMs by distilling their generative ability into more rapid variants. Furthermore, considering that the source data is either inaccessible or too large to store for current generative models, we introduce a new paradigm for distilling them without source data, termed Data-Free Knowledge Distillation for Diffusion Models (DKDM). Our DKDM framework comprises two main components: 1) a DKDM objective that uses synthetic denoising data produced by pretrained DMs to optimize faster DMs without source data, and 2) a dynamic iterative distillation method that flexibly organizes the synthesis of denoising data, preventing the slow generation process from bottlenecking optimization. To our knowledge, this is the first attempt to use KD to distill DMs into any architecture in a data-free manner. Importantly, DKDM is orthogonal to most existing acceleration methods, such as denoising step reduction, quantization, and pruning. Experiments show that DKDM is capable of deriving 2x faster DMs with performance remaining on par with the baseline. Notably, DKDM enables pretrained DMs to function as "datasets" for training new DMs.

Authors (6)
  1. Qianlong Xiang (1 paper)
  2. Miao Zhang (147 papers)
  3. Yuzhang Shang (35 papers)
  4. Jianlong Wu (38 papers)
  5. Yan Yan (242 papers)
  6. Liqiang Nie (191 papers)
Citations (2)

Summary

Overview of "DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture"

The paper "DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture" introduces a novel paradigm for accelerating Diffusion Models (DMs) through a data-free approach. This method, termed Data-Free Knowledge Distillation for Diffusion Models (DKDM), is designed to transfer the generative capabilities of large pretrained DMs to more efficient architectures without the need for source data. The authors address the challenges of slow inference speeds and high computational demands, which limit the practical deployment of DMs.

Key Contributions

  1. Data-Free Knowledge Distillation (DKDM): The paper introduces DKDM as a new framework for distilling DMs without accessing the source data. This approach leverages the generative ability of pretrained models to create synthetic denoising data, thereby optimizing faster DM architectures.
  2. Novel DKDM Objective: The authors propose a DKDM objective that aligns closely with the traditional DM optimization goal while eliminating any dependence on source data. This objective effectively distills knowledge from the teacher to the student model (a minimal sketch follows this list).
  3. Dynamic Iterative Distillation: To handle the bottleneck of generating synthetic data, the authors introduce a dynamic iterative distillation method that interleaves the collection and consumption of denoising data, significantly reducing computation and storage requirements compared to naïve offline generation (see the second sketch after this list).
  4. Versatility Across Architectures: DKDM is architecture-agnostic, allowing the distillation of knowledge from large models into student models of any architectural design. This flexibility enables broad applicability across various model configurations.
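
To make the distillation objective concrete, here is a minimal PyTorch-style sketch. The function name dkdm_loss and the epsilon-prediction interfaces are illustrative assumptions, not names from the paper's code; the substantive idea is that the student is optimized to match the frozen teacher's noise prediction on synthetic noisy samples, so no source data is ever touched.

```python
import torch
import torch.nn.functional as F

def dkdm_loss(teacher, student, x_t, t):
    """Data-free distillation objective (sketch).

    teacher, student: epsilon-prediction networks taking (x_t, t).
    The teacher is a frozen pretrained DM; the student may have any
    (smaller, faster) architecture, which is what makes the approach
    architecture-agnostic. x_t is a batch of synthetic noisy samples
    with matching timesteps t -- no real training data is used.
    """
    with torch.no_grad():
        eps_teacher = teacher(x_t, t)   # pseudo-label from the teacher
    eps_student = student(x_t, t)
    return F.mse_loss(eps_student, eps_teacher)
```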
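
The dynamic iterative scheme can be pictured as a rolling buffer of partially denoised samples: each training iteration distills on the current buffer, then advances every buffered sample by a single teacher denoising step, recycling finished samples as fresh noise. Synthesis and optimization thus proceed in lockstep rather than generating full trajectories offline. The sketch below illustrates that scheduling idea under stated assumptions (the function names, the simplified DDPM update, and the recycling policy are all illustrative, not the paper's exact algorithm).

```python
import torch

@torch.no_grad()
def teacher_denoise_step(teacher, x_t, t, alphas, alphas_bar):
    """One ancestral DDPM step x_t -> x_{t-1} using the frozen teacher.
    Simplified for brevity: variance fixed to sigma_t = sqrt(beta_t),
    no clipping, and noise is added even at t == 0."""
    eps = teacher(x_t, t)
    a_t = alphas[t].view(-1, 1, 1, 1)        # per-sample alpha_t
    ab_t = alphas_bar[t].view(-1, 1, 1, 1)   # per-sample alpha_bar_t
    mean = (x_t - (1 - a_t) / torch.sqrt(1 - ab_t) * eps) / torch.sqrt(a_t)
    return mean + torch.sqrt(1 - a_t) * torch.randn_like(x_t)

def dynamic_iterative_step(teacher, student, opt, buf_x, buf_t,
                           alphas, alphas_bar, T):
    """One training iteration: distill on the buffer, then advance it.
    buf_x holds partially denoised samples whose timesteps buf_t are
    spread across the schedule, so a fresh distillation batch is
    available every iteration without waiting for full generations."""
    loss = dkdm_loss(teacher, student, buf_x, buf_t)  # previous sketch
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Advance the buffer one teacher step; recycle finished samples.
    buf_x = teacher_denoise_step(teacher, buf_x, buf_t, alphas, alphas_bar)
    buf_t = buf_t - 1
    done = buf_t < 0
    buf_x[done] = torch.randn_like(buf_x[done])  # restart from pure noise
    buf_t[done] = T - 1
    return buf_x, buf_t, loss.item()
```

Initializing the buffer with timesteps spread uniformly over [0, T) gives every training batch coverage of the full noise schedule from the first iteration onward.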

Experimental Results

The experiments demonstrate that DMs distilled with DKDM generate samples roughly twice as fast as their baseline teachers while maintaining comparable generative quality, as measured by Inception Score (IS) and Fréchet Inception Distance (FID) on CIFAR-10, CelebA, and ImageNet.

DKDM is shown to complement existing acceleration techniques like denoising step reduction and quantization. Notably, it facilitates the efficient training of DMs without requiring access to large, proprietary datasets, thereby addressing logistical and privacy concerns prevalent in current KD methods.

Implications and Future Directions

The DKDM paradigm presents a significant advancement in the scalability and deployment of diffusion models, particularly for consumer applications where processing power and computational efficiency are critical. By removing the dependency on source datasets, DKDM allows for greater flexibility and reduces barriers to experimentation and deployment in varied environments.

Future research could explore integrating DKDM with advanced techniques like Neural Architecture Search (NAS) to further optimize DMs in terms of speed and performance. Furthermore, addressing the underlying distribution gap between the synthetic data generated by the teacher model and the original data could potentially enhance the efficacy of the distilled models.

Conclusion

The paper presents a substantial contribution to the field of machine learning and model optimization, offering a practical solution to the challenges faced by diffusion models. By enabling data-free distillation, DKDM not only improves the inference speed of DMs but also bypasses data privacy issues, paving the way for more robust and versatile applications of generative models. This research sets a foundation for future endeavors in accelerating and deploying complex generative models efficiently and securely.
