BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models (2404.05662v5)

Published 8 Apr 2024 in cs.CV

Abstract: With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, pushing binarized DMs to be accurate and efficient by improving the representation and optimization. From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) to enable a smooth evolution of DMs from full-precision to accurately binarized. EBB enhances information representation in the initial stage through the flexible combination of multiple binary bases and applies regularization to evolve into efficient single-basis binarization. The evolution only occurs in the head and tail of the DM architecture to retain the stability of training. From the optimization perspective, a Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. The LRM mimics the representations of full-precision DMs in low-rank space, alleviating the direction ambiguity of the optimization process caused by fine-grained alignment. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths. With 1-bit weight and 4-bit activation (W1A4), BinaryDM achieves as low as 7.74 FID and saves the performance from collapse (baseline FID 10.87). As the first binarization method for diffusion models, W1A4 BinaryDM achieves impressive 15.2x OPs and 29.2x model size savings, showcasing its substantial potential for edge deployment. The code is available at https://github.com/Xingyu-Zheng/BinaryDM.

References (45)
  1. Analytic-dpm: an analytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv preprint arXiv:2201.06503, 2022.
  2. Wavegrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713, 2020.
  3. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830, pp.  1–11, 2016.
  4. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  248–255. IEEE, 2009.
  5. Learned step size quantization. In International Conference on Learning Representations, pp.  1–12, 2019.
  6. Structural pruning for diffusion models. arXiv preprint arXiv:2305.10924, 2023.
  7. A survey of quantization methods for efficient neural network inference. In Low-Power Computer Vision, pp.  291–326. Chapman and Hall/CRC, 2022.
  8. Differentiable soft quantization: Bridging full-precision and low-bit neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  4852–4861, 2019.
  9. Efficientdm: Efficient quantization-aware fine-tuning of low-bit diffusion models. arXiv preprint arXiv:2310.03270, 2023.
  10. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
  11. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  12. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303, 2022.
  13. Binarized neural networks. Advances in Neural Information Processing Systems, 29:1–9, 2016.
  14. Diff-tts: A denoising diffusion model for text-to-speech. arXiv preprint arXiv:2104.01409, 2021.
  15. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  4401–4410, 2019.
  16. Learning multiple layers of features from tiny images. 2009.
  17. Q-diffusion: Quantizing diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  17535–17545, 2023a.
  18. Brecq: Pushing the limit of post-training quantization by block reconstruction. In International Conference on Learning Representations, pp.  1–16, 2020.
  19. Q-dm: An efficient low-bit quantized diffusion model. In Thirty-seventh Conference on Neural Information Processing Systems, 2023b.
  20. Fq-vit: Post-training quantization for fully quantized vision transformer. arXiv preprint arXiv:2111.13824, 2021.
  21. Reactnet: Towards precise binary neural network with generalized activation functions. In Proceedings of the European Conference on Computer Vision, pp.  143–159. Springer, 2020.
  22. Luo, W. A comprehensive survey on knowledge distillation of diffusion models. arXiv preprint arXiv:2304.04262, 2023.
  23. Vidm: Video implicit diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp.  9117–9125, 2023.
  24. On distillation of guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  14297–14306, 2023.
  25. Symbolic music generation with diffusion models. arXiv preprint arXiv:2103.16091, 2021.
  26. Up or down? adaptive rounding for post-training quantization. In International Conference on Machine Learning, pp.  7197–7206. PMLR, 2020.
  27. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pp.  8162–8171. PMLR, 2021.
  28. Permutation invariant graph generation via score-based generative modeling. In International Conference on Artificial Intelligence and Statistics, pp.  4474–4484. PMLR, 2020.
  29. Grad-tts: A diffusion probabilistic model for text-to-speech. In International Conference on Machine Learning, pp.  8599–8608. PMLR, 2021.
  30. Binary neural networks: A survey. Pattern Recognition, 105:107281, 2020.
  31. Distribution-sensitive information retention for accurate binary neural network. International Journal of Computer Vision, 131(1):26–47, 2023.
  32. Xnor-net: Imagenet classification using binary convolutional neural networks. In Proceedings of the European Conference on Computer Vision, pp.  525–542. Springer, 2016.
  33. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.
  34. Improved techniques for training gans. Advances in neural information processing systems, 29, 2016.
  35. Noise estimation for generative diffusion models. arXiv preprint arXiv:2104.02600, 2021.
  36. Post-training quantization on diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  1972–1981, 2023.
  37. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020a.
  38. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019.
  39. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020b.
  40. Learning to efficiently sample from diffusion probabilistic models. arXiv preprint arXiv:2106.03802, 2021.
  41. Qdrop: Randomly dropping quantization for extremely low-bit post-training quantization. arXiv preprint arXiv:2203.05740, 2022.
  42. Learning frequency domain approximation for binary neural networks. Advances in Neural Information Processing Systems, 34:25553–25565, 2021a.
  43. Recu: Reviving the dead weights in binary neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  5198–5208, 2021b.
  44. Quantization networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  7308–7316, 2019.
  45. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.

Summary

  • The paper introduces BinaryDM, the first weight-binarization method for diffusion models, pairing an Evolvable-Basis Binarizer (EBB) with Low-rank Representation Mimicking (LRM).
  • With 1-bit weights and 4-bit activations (W1A4), it reports 15.2× savings in operations (OPs) and 29.2× in model size, reaching FID as low as 7.74 where the baseline collapses to 10.87.
  • The framework sets a strong reference point for ultra-low-bit compression, enabling efficient diffusion-model deployment on resource-limited devices.

Accurate Binarization of Diffusion Models through BinaryDM Approach

Introduction to Binarization in Diffusion Models

Diffusion models (DMs) have emerged as a significant breakthrough in generative modeling, producing high-quality and diverse samples across a wide range of tasks. Despite these capabilities, practical deployment of DMs is hampered by their considerable computational cost, which makes them difficult to run on resource-constrained platforms. This has prompted research into model compression techniques, including quantization and binarization, that reduce computational and memory requirements. Unlike conventional quantization, binarization reduces the weights of a DM to 1 bit, yielding dramatic reductions in model size and compute. The critical challenge is achieving this without significantly compromising accuracy, since binarization drastically shrinks the model's representational capacity.
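
To make the core idea concrete, here is a minimal sketch of standard 1-bit weight binarization with a straight-through estimator, in the XNOR-Net style; it illustrates the generic technique that binarized DMs build on, not BinaryDM's specific binarizer:

```python
import torch

def binarize_weight(w: torch.Tensor) -> torch.Tensor:
    """Generic 1-bit weight binarization: w ≈ alpha * sign(w).

    alpha = mean(|w|) is the L2-optimal per-tensor scale for sign(w)
    (XNOR-Net style). This illustrates the baseline technique, not
    BinaryDM's binarizer.
    """
    alpha = w.abs().mean()
    w_bin = alpha * torch.sign(w)
    # Straight-through estimator: the forward pass uses w_bin, while the
    # backward pass treats binarization as identity so gradients reach w.
    return w + (w_bin - w).detach()
```

In quantization-aware training, each layer would apply binarize_weight to its weight tensor in the forward pass; at inference only the sign pattern (1 bit per weight) and one scalar per tensor need to be stored.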

Overview of BinaryDM

To binarize diffusion models without substantial performance degradation, the paper introduces BinaryDM, a framework for accurate quantization-aware training that pushes DM weights to 1 bit. BinaryDM relies on two key components, sketched in code after the list below:

  • Evolvable-Basis Binarizer (EBB): enhances the representational capability of binarized weights. Early in training, EBB combines multiple binary bases with learnable scalars to enrich the information a 1-bit parameterization can carry; a regularizer then evolves it toward efficient single-basis binarization. The evolution is applied only to the head and tail of the DM architecture to keep training stable.
  • Low-rank Representation Mimicking (LRM): refines the optimization of the binarized model by projecting both binarized and full-precision representations into a shared low-rank space and matching them there. Aligning in low-rank space, rather than element by element, alleviates the direction ambiguity caused by fine-grained alignment and stabilizes convergence under stringent quantization constraints.
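
The sketch below illustrates both components under explicit assumptions: the two-basis residual parameterization, the form of the regularizer, and the use of a fixed low-rank projection are plausible stand-ins rather than the paper's exact formulation, and the names TwoBasisBinarizer and lrm_loss are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBasisBinarizer(nn.Module):
    """EBB-flavored sketch (assumed parameterization): represent a weight
    tensor as a1*sign(w) + a2*sign(w - a1*sign(w)), i.e. a second binary
    basis fitted to the residual of the first. Penalizing a2 lets training
    evolve toward efficient single-basis binarization."""

    def __init__(self):
        super().__init__()
        self.a1 = nn.Parameter(torch.tensor(1.0))  # scale of the first basis
        self.a2 = nn.Parameter(torch.tensor(0.5))  # scale of the residual basis

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        b1 = self.a1 * torch.sign(w)
        b2 = self.a2 * torch.sign(w - b1)   # binarize the residual
        w_bin = b1 + b2
        return w + (w_bin - w).detach()     # straight-through estimator

    def basis_penalty(self) -> torch.Tensor:
        # Regularizer that drives the binarizer toward a single basis.
        return self.a2.abs()


def lrm_loss(f_bin: torch.Tensor, f_fp: torch.Tensor,
             proj: torch.Tensor) -> torch.Tensor:
    """LRM-flavored sketch: align binarized and full-precision features in
    a shared low-rank space instead of element by element. proj is a
    (C, r) matrix with r << C; how it is constructed is an assumption."""
    fb = f_bin.permute(0, 2, 3, 1).reshape(-1, f_bin.shape[1])  # (N*H*W, C)
    ff = f_fp.permute(0, 2, 3, 1).reshape(-1, f_fp.shape[1])
    return F.mse_loss(fb @ proj, ff @ proj)
```

A training step would then combine the task loss with lrm_loss over selected feature maps plus a small weight on basis_penalty(), annealed so that the second basis fades out as training progresses.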

BinaryDM demonstrates its effectiveness through comprehensive experiments, showing significant accuracy improvements over state-of-the-art (SOTA) quantization methods for DMs under ultra-low bit-widths. With 1-bit weights and 4-bit activations (W1A4), the framework attains 15.2× savings in operations (OPs) and 29.2× in model size, while achieving FID as low as 7.74 where the baseline collapses to 10.87. These results mark a substantial stride toward deploying DMs in resource-limited environments without sacrificing performance.
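
As a rough sanity check on the model-size figure: FP32 weights occupy 32 bits each, so pure 1-bit weight binarization caps the savings at 32×. The reported 29.2× implies roughly 32 / 29.2 ≈ 1.1, i.e. about 10% storage overhead, plausibly from scaling factors and any parameters kept at higher precision (exactly which components account for it is an assumption here).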

Implications and Future Directions

The advent of BinaryDM introduces a promising avenue for the deployment of diffusion models in scenarios constrained by computational resources and storage capacities. By effectively circumventing the substantial performance drop traditionally associated with binarization, BinaryDM not only extends the accessibility of DMs but also sets a new benchmark for future research into model compression techniques.

Given the framework's success, future work could extend BinaryDM to a broader array of tasks beyond image generation. Further refinement of the EBB and LRM components could also improve their efficiency and efficacy, potentially yielding even more compact and computationally frugal diffusion models. As the push to run advanced AI models on edge devices continues, BinaryDM demonstrates that computational efficiency and model accuracy can be balanced, opening the door to practical diffusion-model deployment across diverse platforms.
