Differentiable Search for Finding Optimal Quantization Strategy (2404.08010v2)

Published 10 Apr 2024 in cs.LG and eess.IV

Abstract: To accelerate and compress deep neural networks (DNNs), many network quantization algorithms have been proposed. Although the quantization strategy of any state-of-the-art algorithm may outperform others on some network architectures, it is hard to prove that the strategy is always superior, let alone that it is the best choice for every layer in a network. In other words, existing quantization algorithms are suboptimal because they ignore the different characteristics of different layers and quantize all layers with a uniform strategy. To address this issue, we propose a differentiable quantization strategy search (DQSS) that assigns an optimal quantization strategy to each layer by exploiting the complementary strengths of different quantization algorithms. Specifically, we formulate DQSS as a differentiable neural architecture search problem and adopt an efficient convolution to explore mixed quantization strategies from a global perspective via gradient-based optimization. We apply DQSS to post-training quantization so that quantized models can approach the performance of their full-precision counterparts. We also employ DQSS in quantization-aware training to further validate its effectiveness. To circumvent the expensive optimization cost of DQSS in quantization-aware training, we update the hyper-parameters and the network parameters in a single forward-backward pass, and we adjust the optimization process to avoid a potential under-fitting problem. Comprehensive experiments on a high-level computer vision task (image classification) and a low-level computer vision task (image super-resolution) with various network architectures show that DQSS outperforms the state of the art.

Towards Optimal Layer-wise Quantization Strategy: A Differentiable Approach

Introduction

The pursuit of compressing and accelerating deep neural networks (DNNs) for efficient deployment has produced a variety of techniques, among which network quantization has emerged as a compelling approach. By reducing the precision of a network's weights and activations, quantization shrinks model size and speeds up inference, catering to the constraints of resource-limited platforms. However, existing quantization practices typically apply a single strategy uniformly across all network layers, disregarding the distinct sensitivities and contributions of individual layers to overall network performance. This paper introduces a Differentiable Quantization Strategy Search (DQSS) framework that addresses this limitation by automatically determining an optimal quantization strategy for each layer.

Differentiable Quantization Strategy Search (DQSS)

DQSS is grounded in the observation that different layers within a network respond differently to quantization, a fact that uniform quantization strategies fail to exploit. By formulating the search for an optimal quantization strategy as a differentiable problem akin to neural architecture search, DQSS uses gradient-based optimization to explore a continuous relaxation of the space of quantization configurations. This makes it possible to identify layer-specific strategies from a predefined set of quantization algorithms, tailoring the quantization process to the characteristics of each layer.
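To make the formulation concrete, the following PyTorch sketch shows a DARTS-style mixed quantization layer in the spirit of DQSS: each candidate quantizer fake-quantizes the layer's weights, one convolution is run per candidate, and a softmax over learnable architecture logits blends the outputs. The candidate set, the class names MinMaxFakeQuant and MixedQuantConv2d, and the quantizer details are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MinMaxFakeQuant(nn.Module):
    """Illustrative candidate: uniform fake quantization with min-max scaling."""
    def __init__(self, bits=8):
        super().__init__()
        self.levels = 2 ** bits - 1

    def forward(self, x):
        lo, hi = x.min(), x.max()
        scale = (hi - lo).clamp(min=1e-8) / self.levels
        q = torch.round((x - lo) / scale) * scale + lo
        # straight-through estimator: quantized forward, identity backward
        return x + (q - x).detach()


class MixedQuantConv2d(nn.Module):
    """Conv layer whose weight quantizer is a softmax-weighted mixture of candidates."""
    def __init__(self, conv: nn.Conv2d, candidates):
        super().__init__()
        self.conv = conv
        self.candidates = nn.ModuleList(candidates)
        # one learnable architecture logit per candidate strategy
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        # naive mixed operation: one convolution per candidate, outputs blended
        quant_weights = [q(self.conv.weight) for q in self.candidates]
        outputs = [F.conv2d(x, w, self.conv.bias, self.conv.stride,
                            self.conv.padding, self.conv.dilation, self.conv.groups)
                   for w in quant_weights]
        return sum(p * y for p, y in zip(probs, outputs))
```

After the search converges, each layer can keep only its highest-weighted candidate and discard the mixture, so deployment uses a single quantization strategy per layer.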

Core Contributions

  • The primary innovation of DQSS is a gradient-based method for exploring mixed quantization strategies. By treating the search for an optimal quantization strategy as a differentiable problem, DQSS marks a clear shift from conventional, heuristic-driven approaches.
  • An efficient convolution mechanism significantly reduces the computational cost of exploring mixed strategies, allowing DQSS to be applied to various network architectures without prohibitive overhead (see the sketch after this list).
  • DQSS extends beyond post-training quantization (PTQ) to quantization-aware training (QAT), demonstrating its versatility and effectiveness in improving model performance under quantization.
  • Comprehensive experiments across tasks of varying complexity underscore DQSS's advantage over state-of-the-art quantization methods. Notably, DQSS not only competes closely with full-precision (FP32) models but in certain cases surpasses their performance.
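One plausible reading of the efficient convolution mechanism, building on the sketch above, is that because convolution is linear in its weights and the softmax coefficients sum to one, the blending can be moved before the convolution: the N convolutions of the naive mixed operation collapse into a single convolution over a weighted combination of the candidates' quantized weights. The class below is a hedged sketch of that idea and reuses the hypothetical MixedQuantConv2d from the previous snippet; it is not guaranteed to match the paper's exact construction.

```python
class EfficientMixedQuantConv2d(MixedQuantConv2d):
    """Same mixture as the naive version, computed with a single convolution.

    Because convolution is linear in its weights and the softmax weights sum
    to one, sum_i p_i * conv(x, Q_i(W)) equals conv(x, sum_i p_i * Q_i(W)),
    so the candidates' quantized weights can be merged before the conv.
    """
    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        mixed_weight = sum(p * q(self.conv.weight)
                           for p, q in zip(probs, self.candidates))
        return F.conv2d(x, mixed_weight, self.conv.bias, self.conv.stride,
                        self.conv.padding, self.conv.dilation, self.conv.groups)
```

In the QAT setting described in the abstract, the architecture logits and the ordinary network weights can be registered with the same optimizer and updated in one forward-backward pass rather than through a more expensive bi-level scheme; the precise schedule and the adjustment that avoids under-fitting are details of the paper and are not reproduced here.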

Experimental Validation

Evaluated on a high-level computer vision task (image classification) and a low-level task (image super-resolution) across a variety of network architectures, DQSS consistently outperformed conventional quantization approaches. This is particularly evident in the PTQ setting, where DQSS retained, and occasionally exceeded, the accuracy of the FP32 counterparts of the quantized models. Applying DQSS in QAT further confirmed its effectiveness, yielding notable improvements over leading QAT methods, particularly on challenging architectures such as MobileNet-V2.

Ablation Studies and Observations

Ablation studies provide insight into how DQSS operates, illustrating which quantization strategies are selected for activations and weights across different layers. This layer-wise tailoring is key to its success, allowing DQSS to exploit the strengths of diverse quantization algorithms according to the specific demands of each layer. Furthermore, these performance improvements come together with the computational savings of the efficient convolution mechanism, underscoring the practicality of the framework's design.
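As a hypothetical illustration of how such per-layer choices could be read off after the search, the helper below takes the argmax of each mixed layer's architecture logits. It assumes the MixedQuantConv2d sketch introduced earlier, and the candidate names are placeholders rather than the paper's actual pool of algorithms.

```python
def chosen_strategies(model, names=("quantizer-A", "quantizer-B", "quantizer-C")):
    """Map each mixed layer to the candidate favored by its architecture logits."""
    picks = {}
    for layer_name, module in model.named_modules():
        if isinstance(module, MixedQuantConv2d):
            picks[layer_name] = names[module.alpha.argmax().item()]
    return picks
```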

Future Directions

The groundwork laid by DQSS opens several avenues for future exploration. Incorporating a broader array of quantization algorithms into DQSS's search space could further augment its ability to fine-tune quantization strategies to the idiosyncrasies of each layer. Additionally, extending its application to a wider spectrum of tasks beyond the realms of image classification and super-resolution would offer a more comprehensive understanding of its versatility and limitations.

Conclusion

DQSS represents a significant advance in the domain of network quantization. By moving beyond the constraints of uniform quantization strategies, it introduces a sophisticated framework capable of optimizing layer-wise quantization in a principled and automated manner. The demonstrated efficacy of DQSS across diverse tasks and architectures not only underscores its immediate utility but also sets the stage for its evolution into an indispensable tool in the optimization of DNNs for resource-constrained environments.

Authors (3)
  1. Lianqiang Li (3 papers)
  2. Chenqian Yan (9 papers)
  3. Yefei Chen (3 papers)