Post-training Quantization on Diffusion Models (2211.15736v3)

Published 28 Nov 2022 in cs.CV

Abstract: Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimations, which rely on cumbersome neural networks. It prevents the diffusion models from being widely deployed, especially on edge devices. Previous works accelerate the generation process of diffusion model (DM) via finding shorter yet effective sampling trajectories. However, they overlook the cost of noise estimation with a heavy network in every iteration. In this work, we accelerate generation from the perspective of compressing the noise estimation network. Due to the difficulty of retraining DMs, we exclude mainstream training-aware compression paradigms and introduce post-training quantization (PTQ) into DM acceleration. However, the output distributions of noise estimation networks change with time-step, making previous PTQ methods fail in DMs since they are designed for single-time step scenarios. To devise a DM-specific PTQ method, we explore PTQ on DM in three aspects: quantized operations, calibration dataset, and calibration metric. We summarize and use several observations derived from all-inclusive investigations to formulate our method, which especially targets the unique multi-time-step structure of DMs. Experimentally, our method can directly quantize full-precision DMs into 8-bit models while maintaining or even improving their performance in a training-free manner. Importantly, our method can serve as a plug-and-play module on other fast-sampling methods, e.g., DDIM. The code is available at https://github.com/42Shawn/PTQ4DM .

Authors (5)
  1. Yuzhang Shang (35 papers)
  2. Zhihang Yuan (45 papers)
  3. Bin Xie (38 papers)
  4. Bingzhe Wu (58 papers)
  5. Yan Yan (242 papers)
Citations (107)

Summary

Post-training Quantization on Diffusion Models: An Expert Overview

The paper "Post-training Quantization on Diffusion Models" presents a novel approach to accelerating denoising diffusion models by introducing post-training quantization (PTQ). The diffusion models, which have made significant strides in generating high-quality and diverse data across various domains such as image, audio, and video, face deployment challenges due to their high computational costs and slow sampling processes. The research addresses these issues by focusing on the compression of the noise estimation network, a less-explored dimension compared to the traditionally tackled problem of reducing sampling trajectories.

Summary of Contributions

  1. Introduction of PTQ to Diffusion Models: The authors pioneer the use of PTQ in the context of diffusion models, compressing the noise estimation network without any retraining. This moves beyond training-aware compression paradigms, which require extensive compute and training data.
  2. Challenges in Multi-Time-Step Scenarios: Because generation is iterative, the output distributions of the noise estimation network shift across time-steps. This presents a challenge not faced in prior PTQ applications such as CNNs and ViTs, which calibrate for a single, static input distribution; the paper identifies these time-step-dependent distribution shifts as the key hurdle.
  3. Tailoring PTQ for Diffusion Models: The authors adapt PTQ along three axes: the choice of quantized operations, the construction of the calibration dataset, and the calibration metric, each designed around the multi-time-step structure of diffusion models (a minimal illustrative sketch follows this list).
  4. Empirical Validation: Experiments show that the method quantizes full-precision diffusion models to 8-bit while maintaining, and in some cases improving, Inception Score (IS) and Fréchet Inception Distance (FID), entirely without retraining. The method is presented as a plug-and-play module compatible with fast-sampling techniques such as DDIM.
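The generic recipe behind items 1-3 can be illustrated with a minimal sketch: collect calibration inputs spread over many denoising time-steps, record activation ranges on them, and quantize weights in place without retraining. The code below is not the authors' PTQ4DM implementation; the tiny network, the linear beta schedule, the uniform sampling of time-steps, and the plain min/max quantizer are all simplifying assumptions, whereas the paper designs the calibration-set construction and metric far more carefully around the time-step-dependent distributions.

```python
# Hypothetical post-training quantization sketch for a diffusion noise-estimation
# network; illustrative only, not the paper's PTQ4DM method.
import torch
import torch.nn as nn


def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Asymmetric uniform quantization with min/max range, then dequantize."""
    qmax = 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    zero_point = torch.round(-x_min / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale


class TinyEpsNet(nn.Module):
    """Stand-in for the heavy noise-estimation network eps_theta(x_t, t)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        t_feat = t.float().unsqueeze(-1) / 1000.0  # crude time-step embedding
        return self.net(torch.cat([x_t, t_feat], dim=-1))


@torch.no_grad()
def forward_noise(x0, t, alphas_bar):
    """Closed-form forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_bar[t].unsqueeze(-1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * torch.randn_like(x0)


@torch.no_grad()
def ptq_with_multistep_calibration(model: nn.Module, num_steps: int = 1000, num_calib: int = 256):
    # 1) Build a calibration set whose inputs span MANY time-steps, since the
    #    activation statistics shift with t (the difficulty the paper highlights).
    #    Here t is drawn uniformly purely for simplicity.
    betas = torch.linspace(1e-4, 2e-2, num_steps)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    x0 = torch.randn(num_calib, 64)                       # placeholder "data"
    t = torch.randint(0, num_steps, (num_calib,))
    x_t = forward_noise(x0, t, alphas_bar)

    # 2) Record per-layer activation ranges on the calibration set via hooks.
    ranges = {}
    def make_hook(name):
        def hook(_m, _inp, out):
            lo, hi = ranges.get(name, (out.min(), out.max()))
            ranges[name] = (torch.minimum(lo, out.min()), torch.maximum(hi, out.max()))
        return hook
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.Linear)]
    model(x_t, t)
    for h in handles:
        h.remove()

    # 3) Quantize weights in place (training-free) using their own min/max.
    for _, m in model.named_modules():
        if isinstance(m, nn.Linear):
            m.weight.copy_(fake_quantize(m.weight))
    return ranges                                         # activation ranges for deployment


model = TinyEpsNet()
activation_ranges = ptq_with_multistep_calibration(model)
print({k: tuple(round(v.item(), 3) for v in r) for k, r in activation_ranges.items()})
```

The key design question, and the one the paper answers, is how to choose the calibration inputs and the calibration objective so that a single set of quantization parameters works well across all time-steps; the uniform time-step sampling above is only a placeholder for that choice.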

Implications and Future Directions

This research highlights a vital yet often overlooked avenue for model acceleration: network compression. The introduction of PTQ in the domain of diffusion models represents a shift toward optimizing neural network architectures that are inherently resource-intensive, extending the applicability of these models to edge devices and broader real-world applications.

From a theoretical perspective, this work challenges common assumptions about model redundancy, demonstrating that efficiency gains can be achieved not only by shortening the sampling trajectory but also through strategic network compression. The favorable results invite further exploration of quantization techniques and their effects on model efficacy across different architectures and applications.

Looking ahead, the paper sets a precedent for future investigations into PTQ methods, particularly in frameworks exhibiting dynamic operational states across different stages of processing. The scalable and generalizable nature of the proposed methodology creates opportunities to redefine compression strategies for a wide array of generative models beyond diffusion models. Moreover, this approach could be extended to other emerging paradigms in deep learning that require efficient computation and storage.

In conclusion, this paper contributes significantly to the field of generative model optimization, providing a viable solution to a complex challenge and establishing a foundation for future technological advancements in AI deployment strategies.
