MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization (2405.17873v2)
Abstract: Diffusion models have achieved significant visual generation quality. However, their substantial computational and memory costs pose challenges for deployment on resource-constrained mobile devices and even desktop GPUs. Recent few-step diffusion models reduce inference time by reducing the number of denoising steps, but their memory consumption remains excessive. Post-Training Quantization (PTQ), which replaces high-bit-width floating-point representations with low-bit integer values (INT4/INT8), is an effective and efficient technique for reducing memory cost. However, when applied to few-step diffusion models, existing quantization methods struggle to preserve both image quality and text alignment. To address this issue, we propose a mixed-precision quantization framework, MixDQ. First, we design a specialized BOS-aware quantization method for the highly sensitive text embedding. Then, we conduct a metric-decoupled sensitivity analysis to measure the sensitivity of each layer. Finally, we develop an integer-programming-based method for bit-width allocation. Whereas existing quantization methods fall short even at W8A8, MixDQ achieves W8A8 without performance loss and W4A8 with negligible visual degradation. Compared with FP16, we achieve a 3-4x reduction in model size and memory cost, and a 1.45x latency speedup.
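The abstract only sketches the final bit-width allocation step. The following is a minimal illustrative sketch, assuming per-layer sensitivity scores have already been measured (e.g., by the metric-decoupled analysis): it frames allocation as a small integer program, where each layer picks one weight bit-width, the objective sums per-layer sensitivity, and a memory budget bounds the total weight size. The layer names, parameter counts, sensitivity values, memory budget, and the use of Google OR-Tools are hypothetical placeholders; the paper's exact objective and constraints may differ.

```python
# Hypothetical sketch: per-layer bit-width allocation as an integer program.
# All layer names, parameter counts, and sensitivity numbers below are made up;
# in practice they would come from the model and the sensitivity analysis.
from ortools.linear_solver import pywraplp

layers = ["conv_in", "attn_1", "ff_1", "conv_out"]            # hypothetical layer names
params = {"conv_in": 1.2e6, "attn_1": 4.0e6,                  # weight counts per layer
          "ff_1": 8.0e6, "conv_out": 1.5e6}
bit_choices = [4, 8]                                          # candidate weight bit-widths
# sensitivity[layer][bits]: measured quality drop if this layer's weights use `bits`
sensitivity = {
    "conv_in":  {4: 0.90, 8: 0.10},
    "attn_1":   {4: 0.50, 8: 0.05},
    "ff_1":     {4: 0.20, 8: 0.02},
    "conv_out": {4: 0.70, 8: 0.08},
}
# Memory budget: e.g. 60% of the size of a uniform W8 model (in bits).
budget_bits = 0.6 * sum(8 * params[l] for l in layers)

solver = pywraplp.Solver.CreateSolver("SCIP")
# x[l, b] == 1  <=>  layer l is quantized to b-bit weights
x = {(l, b): solver.BoolVar(f"x_{l}_{b}") for l in layers for b in bit_choices}

# Each layer must pick exactly one bit-width.
for l in layers:
    solver.Add(sum(x[l, b] for b in bit_choices) == 1)

# Total weight memory must stay within the budget.
solver.Add(sum(x[l, b] * b * params[l]
               for l in layers for b in bit_choices) <= budget_bits)

# Minimize the summed sensitivity of the chosen configuration.
solver.Minimize(sum(x[l, b] * sensitivity[l][b]
                    for l in layers for b in bit_choices))

if solver.Solve() == pywraplp.Solver.OPTIMAL:
    for l in layers:
        chosen = next(b for b in bit_choices if x[l, b].solution_value() > 0.5)
        print(f"{l}: W{chosen}")
```

A solver-based formulation of this kind enforces the memory budget globally rather than greedily per layer, which is one plausible way to trade off W4 and W8 assignments across layers of very different sensitivity.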
- Tianchen Zhao
- Xuefei Ning
- Tongcheng Fang
- Enshu Liu
- Guyue Huang
- Zinan Lin
- Shengen Yan
- Guohao Dai
- Yu Wang