BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models (2404.05662v5)

Published 8 Apr 2024 in cs.CV

Abstract: With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, pushing binarized DMs to be accurate and efficient by improving the representation and optimization. From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) to enable a smooth evolution of DMs from full-precision to accurately binarized. EBB enhances information representation in the initial stage through the flexible combination of multiple binary bases and applies regularization to evolve into efficient single-basis binarization. The evolution only occurs in the head and tail of the DM architecture to retain the stability of training. From the optimization perspective, a Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. The LRM mimics the representations of full-precision DMs in low-rank space, alleviating the direction ambiguity of the optimization process caused by fine-grained alignment. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths. With 1-bit weight and 4-bit activation (W1A4), BinaryDM achieves as low as 7.74 FID and saves the performance from collapse (baseline FID 10.87). As the first binarization method for diffusion models, W1A4 BinaryDM achieves impressive 15.2x OPs and 29.2x model size savings, showcasing its substantial potential for edge deployment. The code is available at https://github.com/Xingyu-Zheng/BinaryDM.

References (45)
  1. Analytic-dpm: an analytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv preprint arXiv:2201.06503, 2022.
  2. Wavegrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713, 2020.
  3. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830, pp.  1–11, 2016.
  4. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  248–255. IEEE, 2009.
  5. Learned step size quantization. In International Conference on Learning Representations, pp.  1–12, 2019.
  6. Structural pruning for diffusion models. arXiv preprint arXiv:2305.10924, 2023.
  7. A survey of quantization methods for efficient neural network inference. In Low-Power Computer Vision, pp.  291–326. Chapman and Hall/CRC, 2022.
  8. Differentiable soft quantization: Bridging full-precision and low-bit neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  4852–4861, 2019.
  9. Efficientdm: Efficient quantization-aware fine-tuning of low-bit diffusion models. arXiv preprint arXiv:2310.03270, 2023.
  10. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
  11. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  12. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303, 2022.
  13. Binarized neural networks. Advances in Neural Information Processing Systems, 29:1–9, 2016.
  14. Diff-tts: A denoising diffusion model for text-to-speech. arXiv preprint arXiv:2104.01409, 2021.
  15. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  4401–4410, 2019.
  16. Learning multiple layers of features from tiny images. 2009.
  17. Q-diffusion: Quantizing diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  17535–17545, 2023a.
  18. Brecq: Pushing the limit of post-training quantization by block reconstruction. In International Conference on Learning Representations, pp.  1–16, 2020.
  19. Q-dm: An efficient low-bit quantized diffusion model. In Thirty-seventh Conference on Neural Information Processing Systems, 2023b.
  20. Fq-vit: Post-training quantization for fully quantized vision transformer. arXiv preprint arXiv:2111.13824, 2021.
  21. Reactnet: Towards precise binary neural network with generalized activation functions. In Proceedings of the European Conference on Computer Vision, pp.  143–159. Springer, 2020.
  22. Luo, W. A comprehensive survey on knowledge distillation of diffusion models. arXiv preprint arXiv:2304.04262, 2023.
  23. Vidm: Video implicit diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp.  9117–9125, 2023.
  24. On distillation of guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  14297–14306, 2023.
  25. Symbolic music generation with diffusion models. arXiv preprint arXiv:2103.16091, 2021.
  26. Up or down? adaptive rounding for post-training quantization. In International Conference on Machine Learning, pp.  7197–7206. PMLR, 2020.
  27. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pp.  8162–8171. PMLR, 2021.
  28. Permutation invariant graph generation via score-based generative modeling. In International Conference on Artificial Intelligence and Statistics, pp.  4474–4484. PMLR, 2020.
  29. Grad-tts: A diffusion probabilistic model for text-to-speech. In International Conference on Machine Learning, pp.  8599–8608. PMLR, 2021.
  30. Binary neural networks: A survey. Pattern Recognition, 105:107281, 2020.
  31. Distribution-sensitive information retention for accurate binary neural network. International Journal of Computer Vision, 131(1):26–47, 2023.
  32. Xnor-net: Imagenet classification using binary convolutional neural networks. In Proceedings of the European Conference on Computer Vision, pp.  525–542. Springer, 2016.
  33. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512, 2022.
  34. Improved techniques for training gans. Advances in neural information processing systems, 29, 2016.
  35. Noise estimation for generative diffusion models. arXiv preprint arXiv:2104.02600, 2021.
  36. Post-training quantization on diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  1972–1981, 2023.
  37. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020a.
  38. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019.
  39. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020b.
  40. Learning to efficiently sample from diffusion probabilistic models. arXiv preprint arXiv:2106.03802, 2021.
  41. Qdrop: Randomly dropping quantization for extremely low-bit post-training quantization. arXiv preprint arXiv:2203.05740, 2022.
  42. Learning frequency domain approximation for binary neural networks. Advances in Neural Information Processing Systems, 34:25553–25565, 2021a.
  43. Recu: Reviving the dead weights in binary neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  5198–5208, 2021b.
  44. Quantization networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  7308–7316, 2019.
  45. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.

Summary

  • The paper introduces BinaryDM, the first weight-binarization method for diffusion models, pairing an Evolvable-Basis Binarizer (EBB) with Low-rank Representation Mimicking (LRM).
  • With 1-bit weights and 4-bit activations (W1A4), it reports 15.2× savings in operations (OPs) and 29.2× in model size, reaching FID as low as 7.74 where the baseline collapses to 10.87.
  • The framework sets a strong reference point for ultra-low-bit compression, enabling efficient diffusion-model deployment on resource-limited devices.

Accurate Binarization of Diffusion Models through BinaryDM Approach

Introduction to Binarization in Diffusion Models

Diffusion models (DMs) have emerged as a significant breakthrough in generative modeling, producing high-quality and diverse samples across a wide range of tasks. Despite these capabilities, practical deployment of DMs is hampered by their considerable computational cost, which makes them difficult to run on resource-constrained platforms. This has prompted research into model compression techniques, including quantization and binarization, that reduce computational and memory requirements. Unlike conventional quantization, binarization reduces the weights of a DM to 1 bit, yielding dramatic reductions in model size and compute. The critical challenge is achieving this without significantly compromising accuracy, since binarization drastically shrinks the model's representational capacity.
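
To make the core idea concrete, here is a minimal sketch of standard 1-bit weight binarization with a straight-through estimator, in the XNOR-Net style; it illustrates the generic technique that binarized DMs build on, not BinaryDM's specific binarizer:

```python
import torch

def binarize_weight(w: torch.Tensor) -> torch.Tensor:
    """Generic 1-bit weight binarization: w ≈ alpha * sign(w).

    alpha = mean(|w|) is the L2-optimal per-tensor scale for sign(w)
    (XNOR-Net style). This illustrates the baseline technique, not
    BinaryDM's binarizer.
    """
    alpha = w.abs().mean()
    w_bin = alpha * torch.sign(w)
    # Straight-through estimator: the forward pass uses w_bin, while the
    # backward pass treats binarization as identity so gradients reach w.
    return w + (w_bin - w).detach()
```

In quantization-aware training, each layer would apply binarize_weight to its weight tensor in the forward pass; at inference only the sign pattern (1 bit per weight) and one scalar per tensor need to be stored.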

Overview of BinaryDM

To binarize diffusion models without substantial performance degradation, the paper introduces BinaryDM, a framework for accurate quantization-aware training that pushes DM weights to 1 bit. BinaryDM relies on two key components, sketched in code after the list below:

  • Evolvable-Basis Binarizer (EBB): enhances the representational capability of binarized weights. Early in training, EBB combines multiple binary bases with learnable scalars to enrich the information a 1-bit parameterization can carry; a regularizer then evolves it toward efficient single-basis binarization. The evolution is applied only to the head and tail of the DM architecture to keep training stable.
  • Low-rank Representation Mimicking (LRM): refines the optimization of the binarized model by projecting both binarized and full-precision representations into a shared low-rank space and matching them there. Aligning in low-rank space, rather than element by element, alleviates the direction ambiguity caused by fine-grained alignment and stabilizes convergence under stringent quantization constraints.
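
The sketch below illustrates both components under explicit assumptions: the two-basis residual parameterization, the form of the regularizer, and the use of a fixed low-rank projection are plausible stand-ins rather than the paper's exact formulation, and the names TwoBasisBinarizer and lrm_loss are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBasisBinarizer(nn.Module):
    """EBB-flavored sketch (assumed parameterization): represent a weight
    tensor as a1*sign(w) + a2*sign(w - a1*sign(w)), i.e. a second binary
    basis fitted to the residual of the first. Penalizing a2 lets training
    evolve toward efficient single-basis binarization."""

    def __init__(self):
        super().__init__()
        self.a1 = nn.Parameter(torch.tensor(1.0))  # scale of the first basis
        self.a2 = nn.Parameter(torch.tensor(0.5))  # scale of the residual basis

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        b1 = self.a1 * torch.sign(w)
        b2 = self.a2 * torch.sign(w - b1)   # binarize the residual
        w_bin = b1 + b2
        return w + (w_bin - w).detach()     # straight-through estimator

    def basis_penalty(self) -> torch.Tensor:
        # Regularizer that drives the binarizer toward a single basis.
        return self.a2.abs()


def lrm_loss(f_bin: torch.Tensor, f_fp: torch.Tensor,
             proj: torch.Tensor) -> torch.Tensor:
    """LRM-flavored sketch: align binarized and full-precision features in
    a shared low-rank space instead of element by element. proj is a
    (C, r) matrix with r << C; how it is constructed is an assumption."""
    fb = f_bin.permute(0, 2, 3, 1).reshape(-1, f_bin.shape[1])  # (N*H*W, C)
    ff = f_fp.permute(0, 2, 3, 1).reshape(-1, f_fp.shape[1])
    return F.mse_loss(fb @ proj, ff @ proj)
```

A training step would then combine the task loss with lrm_loss over selected feature maps plus a small weight on basis_penalty(), annealed so that the second basis fades out as training progresses.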

BinaryDM demonstrates its effectiveness through comprehensive experiments, showing significant accuracy improvements over state-of-the-art (SOTA) quantization methods for DMs under ultra-low bit-widths. With 1-bit weights and 4-bit activations (W1A4), the framework attains 15.2× savings in operations (OPs) and 29.2× in model size, while achieving FID as low as 7.74 where the baseline collapses to 10.87. These results mark a substantial stride toward deploying DMs in resource-limited environments without sacrificing performance.
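
As a rough sanity check on the model-size figure: FP32 weights occupy 32 bits each, so pure 1-bit weight binarization caps the savings at 32×. The reported 29.2× implies roughly 32 / 29.2 ≈ 1.1, i.e. about 10% storage overhead, plausibly from scaling factors and any parameters kept at higher precision (exactly which components account for it is an assumption here).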

Implications and Future Directions

The advent of BinaryDM introduces a promising avenue for the deployment of diffusion models in scenarios constrained by computational resources and storage capacities. By effectively circumventing the substantial performance drop traditionally associated with binarization, BinaryDM not only extends the accessibility of DMs but also sets a new benchmark for future research into model compression techniques.

Given the framework's success, future work could extend BinaryDM to a broader array of tasks beyond image generation. Further refinement of the EBB and LRM components could also improve their efficiency and efficacy, potentially yielding even more compact and computationally frugal diffusion models. As the push to run advanced AI models on edge devices continues, BinaryDM demonstrates that computational efficiency and model accuracy can be balanced, opening the door to practical diffusion-model deployment across diverse platforms.
