BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models (2404.05662v5)
Abstract: With the advancement of diffusion models (DMs) and their substantially increased computational requirements, quantization emerges as a practical solution for obtaining compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, which pushes binarized DMs to be accurate and efficient by improving both representation and optimization. From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) that enables a smooth evolution of DMs from full precision to accurate binarization. EBB enhances information representation in the initial stage through a flexible combination of multiple binary bases and applies regularization to evolve into efficient single-basis binarization. The evolution occurs only in the head and tail of the DM architecture to preserve training stability. From the optimization perspective, Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. LRM mimics the representations of full-precision DMs in a low-rank space, alleviating the direction ambiguity of the optimization process caused by fine-grained alignment. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains over SOTA quantization methods for DMs under ultra-low bit-widths. With 1-bit weights and 4-bit activations (W1A4), BinaryDM achieves an FID as low as 7.74 and rescues performance from collapse (baseline FID 10.87). As the first binarization method for diffusion models, W1A4 BinaryDM delivers 15.2× savings in OPs and 29.2× savings in model size, showcasing its substantial potential for edge deployment. The code is available at https://github.com/Xingyu-Zheng/BinaryDM.
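The abstract only sketches EBB at a high level: a weight is approximated by a flexible combination of binary bases, and a regularizer shrinks the extra bases so the binarizer evolves toward efficient single-basis form. Below is a minimal PyTorch-style sketch of one way such a residual two-basis binarizer could look; the class name, the mean-absolute-value scales, and the penalty form are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class EvolvableBasisBinarizer(nn.Module):
    """Illustrative two-basis weight binarizer in the spirit of EBB.

    The weight is approximated as alpha1 * sign(W) plus a scaled
    binarization of the residual; a separate penalty shrinks the
    residual basis so training can evolve toward single-basis form.
    (Hypothetical sketch, not the authors' code.)
    """

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # First binary basis: sign(w), scaled by the mean absolute value
        # (the classic XNOR-Net-style closed-form scale).
        alpha1 = w.abs().mean()
        b1 = torch.sign(w)
        # Second basis binarizes the residual left by the first basis.
        r = w - alpha1 * b1
        alpha2 = r.abs().mean()
        b2 = torch.sign(r)
        w_bin = alpha1 * b1 + alpha2 * b2
        # Straight-through estimator: forward uses the binarized weight,
        # backward passes gradients to the latent full-precision weight.
        return w + (w_bin - w).detach()

    def basis_penalty(self, w: torch.Tensor) -> torch.Tensor:
        # Regularizer on the residual basis's magnitude; adding this to
        # the training loss pushes the binarizer toward a single basis.
        r = w - w.abs().mean() * torch.sign(w)
        return r.abs().mean() ** 2
```

In this reading, the penalty would be applied only to layers at the head and tail of the U-Net (where the abstract says the evolution takes place), while the rest of the network uses plain single-basis binarization throughout.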
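Similarly, the abstract describes LRM only as mimicking full-precision representations "in low-rank space" to reduce optimization ambiguity. The following is a minimal sketch of one plausible realization: project both models' intermediate features through a shared low-rank projection and match them there with an MSE loss. The function name `lrm_loss`, the shared projection `proj`, and the MSE objective are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def lrm_loss(feat_bin: torch.Tensor,
             feat_fp: torch.Tensor,
             proj: torch.Tensor) -> torch.Tensor:
    """Align binarized-model features with full-precision features
    after projecting both into a shared low-rank space.
    (Hypothetical sketch of an LRM-style objective.)"""
    # Flatten (B, C, H, W) feature maps to (B*H*W, C) token matrices.
    b, c, h, w = feat_bin.shape
    s = feat_bin.permute(0, 2, 3, 1).reshape(-1, c)
    t = feat_fp.permute(0, 2, 3, 1).reshape(-1, c)
    # proj has shape (C, r) with r << C; matching in this r-dim space
    # gives a coarser, less ambiguous target than element-wise alignment.
    return F.mse_loss(s @ proj, (t @ proj).detach())


# Example usage with a random projection (r << C); the projection could
# also be derived from an SVD of the full-precision features:
# C, r = 256, 16
# proj = torch.randn(C, r) / C ** 0.5
# loss = lrm_loss(f_bin, f_fp, proj)
```

The design intuition matching the abstract: matching features only along a few dominant directions gives the binarized student a coarser target than element-wise distillation, which is what "alleviating the direction ambiguity caused by fine-grained alignment" suggests.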