Effective Quantization for Diffusion Models on CPUs (2311.16133v2)
Abstract: Diffusion models have gained popularity for generating images from textual descriptions. Nonetheless, their substantial computational requirements remain a notable challenge, leading to time-consuming generation. Quantization, a technique for compressing deep learning models to improve efficiency, is difficult to apply to diffusion models: they are notably more sensitive to quantization than other model types, which can result in degraded image quality. In this paper, we introduce a novel approach to quantizing diffusion models that leverages both quantization-aware training and distillation. Our results show that the quantized models maintain high image quality while delivering efficient inference on CPUs. The code is publicly available at: https://github.com/intel/intel-extension-for-transformers.
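To make the approach concrete, below is a minimal sketch of quantization-aware training combined with distillation, using PyTorch's eager-mode quantization APIs. The `TinyDenoiser` module, the random toy data, and the 0.5/0.5 loss weighting are illustrative assumptions for demonstration, not the authors' actual architecture, training data, or hyperparameters.

```python
# Illustrative sketch only: QAT + distillation for a toy stand-in denoiser,
# not the paper's implementation.
import copy
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class TinyDenoiser(nn.Module):
    """Minimal stand-in for a diffusion denoising network (e.g. a U-Net)."""
    def __init__(self, dim=64):
        super().__init__()
        self.quant = tq.QuantStub()       # entry into the INT8 region
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.dequant = tq.DeQuantStub()   # exit from the INT8 region

    def forward(self, x):
        return self.dequant(self.net(self.quant(x)))

teacher = TinyDenoiser().eval()                          # full-precision teacher
student = copy.deepcopy(teacher).train()                 # student to be quantized
student.qconfig = tq.get_default_qat_qconfig("fbgemm")   # INT8 config for x86 CPUs
tq.prepare_qat(student, inplace=True)                    # insert fake-quant observers

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
for _ in range(100):                                     # toy training loop
    x = torch.randn(8, 64)                               # stands in for noisy latents
    target = torch.randn(8, 64)                          # stands in for the true noise
    with torch.no_grad():
        teacher_pred = teacher(x)
    student_pred = student(x)
    # Denoising task loss plus a distillation loss pulling the quantized
    # student toward the full-precision teacher's predictions.
    loss = 0.5 * nn.functional.mse_loss(student_pred, target) + \
           0.5 * nn.functional.mse_loss(student_pred, teacher_pred)
    opt.zero_grad()
    loss.backward()
    opt.step()

student.eval()
int8_model = tq.convert(student)   # real INT8 kernels for CPU inference
```

In practice, the teacher would be the pretrained full-precision diffusion model and the losses would be computed on noise predictions across sampled timesteps; the fake-quantization observers let the student learn weights that remain accurate after conversion to genuine INT8 operators.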
- Hanwen Chang
- Haihao Shen
- Yiyang Cai
- Xinyu Ye
- Zhenzhong Xu
- Wenhua Cheng
- Kaokao Lv
- Weiwei Zhang
- Yintong Lu
- Heng Guo