Retraining-free Model Quantization via One-Shot Weight-Coupling Learning (2401.01543v2)

Published 3 Jan 2024 in cs.CV

Abstract: Quantization is important for compressing over-parameterized deep neural models and deploying them on resource-limited devices. Fixed-precision quantization suffers from performance drops due to its limited numerical representation ability. Conversely, mixed-precision quantization (MPQ) compresses the model more effectively by allocating heterogeneous bit-widths to layers. MPQ is typically organized as a two-stage searching-retraining process. In this paper, we devise a one-shot training-searching paradigm for mixed-precision model compression. Specifically, in the first stage, all potential bit-width configurations are coupled and thus optimized simultaneously within a set of shared weights. However, our observations reveal a previously unseen and severe bit-width interference phenomenon among highly coupled weights during optimization, leading to considerable performance degradation under a high compression ratio. To tackle this problem, we first design a bit-width scheduler that dynamically freezes the most turbulent bit-widths of layers during training, so that the remaining bit-widths converge properly. Then, taking inspiration from information theory, we present an information distortion mitigation technique that aligns the behavior of the poorly performing bit-widths with the well-performing ones. In the second stage, an inference-only greedy search scheme is devised to evaluate the goodness of configurations without introducing any additional training costs. Extensive experiments on three representative models and three datasets demonstrate the effectiveness of the proposed method. Code is available at https://github.com/1hunters/retraining-free-quantization.
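
To make the two-stage pipeline described in the abstract concrete, below is a minimal Python sketch of the general idea: stage one optimizes a single set of shared weights while randomly sampling a bit-width per layer (with a `frozen_bits` set standing in for the paper's bit-width scheduler), and stage two greedily lowers bit-widths using inference-only evaluation. All names here (`fake_quantize`, `BIT_CHOICES`, `eval_fn`, the average-bit-width budget) are illustrative assumptions, not taken from the authors' released code; see the linked repository for the actual implementation.

```python
import random
import torch

# Hypothetical candidate bit-widths per layer (not the paper's exact search space).
BIT_CHOICES = [2, 3, 4, 8]

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through estimator.
    Illustrative only; the paper uses its own quantizer."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).round().clamp(-qmax, qmax) * scale
    return w + (q - w).detach()  # straight-through estimator for gradients

def train_step(model, layers, batch, loss_fn, optimizer, frozen_bits):
    """Stage 1 (weight coupling): sample one bit-width per layer so every
    configuration is optimized within the same shared weights. `frozen_bits`
    stands in for the bit-width scheduler, which temporarily removes the most
    turbulent candidates from sampling."""
    x, y = batch
    for i, layer in enumerate(layers):
        allowed = [b for b in BIT_CHOICES if b not in frozen_bits.get(i, set())]
        bits = random.choice(allowed or BIT_CHOICES)
        # Assumes a custom module whose forward() reads `weight_q` instead of `weight`.
        layer.weight_q = fake_quantize(layer.weight, bits)
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def greedy_search(num_layers, eval_fn, budget):
    """Stage 2 (inference-only search): repeatedly lower one layer's bit-width,
    keeping the change that hurts validation accuracy the least, until the
    average bit-width meets `budget`. `eval_fn(config)` is assumed to run
    forward passes only, so no retraining is involved."""
    config = {i: max(BIT_CHOICES) for i in range(num_layers)}
    while sum(config.values()) / len(config) > budget:
        best = None
        for i in config:
            lower = [b for b in BIT_CHOICES if b < config[i]]
            if not lower:
                continue
            trial = dict(config)
            trial[i] = max(lower)
            acc = eval_fn(trial)
            if best is None or acc > best[1]:
                best = (trial, acc)
        if best is None:  # every layer is already at the minimum bit-width
            break
        config = best[0]
    return config
```

A real constraint would typically weight each layer's bit-width by its parameter count or BitOPs rather than using a plain average, but the structure of the search is the same.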

Authors (8)
  1. Chen Tang (94 papers)
  2. Yuan Meng (61 papers)
  3. Jiacheng Jiang (8 papers)
  4. Shuzhao Xie (13 papers)
  5. Rongwei Lu (7 papers)
  6. Xinzhu Ma (30 papers)
  7. Zhi Wang (261 papers)
  8. Wenwu Zhu (104 papers)
Citations (6)
