Generative Low-bitwidth Data Free Quantization: An Overview
The paper "Generative Low-bitwidth Data Free Quantization" addresses a significant challenge in the field of neural network quantization—the absence of original data due to privacy or confidentiality concerns. Quantization is crucial for deploying large models on resource-constrained devices like mobile phones or embedded systems, and traditionally requires access to original data for calibration and fine-tuning. This research presents a method that eliminates the dependency on such data, enabling effective deployment of deep models under these constraints.
Key Contributions and Methodology
The authors propose a novel approach known as Generative Data Free Quantization (GDFQ). The method leverages generative techniques to produce fake data by exploiting the knowledge encoded in pre-trained models. Notable components of this approach include:
- Knowledge Matching Generator: This is the core of GDFQ. It constructs meaningful synthetic data by exploiting two kinds of knowledge in the pre-trained model: classification boundary information and the distribution characteristics captured in its batch normalization layers. The generator draws Gaussian noise as a prior, conditions it on label information, and produces fake data that approximate the characteristics of real data (see the generator sketch after this list).
- Fine-tuning with Generated Data: Using the generated data, the quantized model is fine-tuned by aligning its outputs with those of the full-precision model. This combines a cross-entropy loss for label consistency with a Kullback-Leibler divergence loss that distills knowledge from the full-precision model's predictions, so the quantized model learns the underlying distribution effectively (see the fine-tuning sketch after this list).
- Fixed Batch Normalization Statistics: The quantized model's batch normalization statistics are fixed to those of the pre-trained model, which stabilizes fine-tuning by keeping activation statistics consistent with those of the original data (see the BN-freezing sketch after this list).
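As a rough illustration of the generator idea, the sketch below pairs a small conditional generator (label-conditioned Gaussian noise with DCGAN-style upsampling for 32x32 images) with a loss that combines a classification-boundary term and a batch-normalization-statistics matching term. The architecture, layer sizes, and the hook-based collection of BN statistics are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeMatchingGenerator(nn.Module):
    """Conditional generator: Gaussian noise z, conditioned on a label -> fake image."""

    def __init__(self, num_classes: int, z_dim: int = 100, img_ch: int = 3):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, z_dim)
        self.fc = nn.Linear(z_dim, 128 * 8 * 8)
        self.body = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),             # 8x8 -> 16x16
            nn.Conv2d(128, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Upsample(scale_factor=2),             # 16x16 -> 32x32
            nn.Conv2d(64, img_ch, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        h = z * self.label_emb(labels)               # inject label information into the noise
        return self.body(self.fc(h).view(-1, 128, 8, 8))

def generator_loss(teacher_logits, labels, bn_stats):
    """Classification-boundary term plus BN-statistics matching term.

    bn_stats: list of (batch_mean, batch_var, running_mean, running_var)
    tuples, gathered with forward hooks on the pre-trained model's BN
    layers while the fake batch passes through it (hook plumbing omitted).
    """
    ce = F.cross_entropy(teacher_logits, labels)     # fake data should respect class boundaries
    bns = sum(F.mse_loss(mu, rm) + F.mse_loss(var, rv)
              for mu, var, rm, rv in bn_stats)       # match the stored real-data statistics
    return ce + bns

# Usage: sample noise and fake labels, then generate a batch of synthetic images.
G = KnowledgeMatchingGenerator(num_classes=10)
z = torch.randn(64, 100)
y = torch.randint(0, 10, (64,))
fake = G(z, y)                                       # -> (64, 3, 32, 32)
```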
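The fine-tuning objective can be sketched as a weighted sum of hard-label cross-entropy and temperature-softened KL distillation from the full-precision teacher. The weight `alpha` and temperature `T` below are illustrative placeholders, not the paper's reported hyperparameters.

```python
import torch.nn.functional as F

def finetune_loss(student_logits, teacher_logits, labels, alpha=0.5, T=4.0):
    """Cross-entropy to the sampled fake labels plus KL distillation
    from the full-precision model's softened predictions."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)   # T^2 keeps gradient magnitude comparable
    return alpha * ce + (1 - alpha) * kd
```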
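Freezing the batch normalization statistics can be achieved by keeping the BN layers in eval mode during fine-tuning, so the running statistics inherited from the full-precision model are used unchanged. The helper below is a minimal sketch under that assumption.

```python
import torch.nn as nn

def freeze_bn_statistics(model: nn.Module) -> None:
    """Keep BN running_mean/running_var fixed at the pre-trained values."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.eval()  # eval mode stops running-statistic updates and uses stored values
            # The affine weight/bias remain trainable unless frozen separately.

# Call this after every model.train(), since .train() re-enables BN updates.
```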
Experimental Results
Experiments on CIFAR-10, CIFAR-100, and ImageNet show that GDFQ outperforms existing data-free quantization methods such as ZeroQ. In particular, GDFQ delivers notable accuracy improvements under low-bitwidth (4-bit) quantization across diverse model architectures. On ImageNet, for instance, models quantized with GDFQ recover significantly more accuracy than those quantized with previous methods.
The results validate the effectiveness of generative techniques in data-free scenarios, where knowledge extracted from the pre-trained model stands in for direct access to the original training data. By removing the reliance on original data, GDFQ provides a practical solution for applications that require secure deployment of neural networks.
Implications and Future Directions
The proposed method represents a substantial advancement for neural network deployment on edge devices, particularly where data privacy and security are paramount. It also opens avenues for applying generative approaches to other model-compression techniques beyond quantization.
Future research may focus on enhancing the generator to further improve the fidelity of synthetic data, and on extending data-free quantization to broader areas of AI, including NLP and reinforcement learning. Additionally, integrating advanced GAN architectures could improve synthetic data generation, and thereby quantization quality, even under stricter precision constraints. Such developments would enable efficient and secure AI deployments across numerous fields.