SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization (2309.10975v1)

Published 20 Sep 2023 in cs.LG and stat.ML

Abstract: Quantization is a widely used compression method that effectively reduces redundancies in over-parameterized neural networks. However, existing quantization techniques for deep neural networks often lack a comprehensive error analysis due to the presence of non-convex loss functions and nonlinear activations. In this paper, we propose a fast stochastic algorithm for quantizing the weights of fully trained neural networks. Our approach leverages a greedy path-following mechanism in combination with a stochastic quantizer. Its computational complexity scales only linearly with the number of weights in the network, thereby enabling the efficient quantization of large networks. Importantly, we establish, for the first time, full-network error bounds, under an infinite alphabet condition and minimal assumptions on the weights and input data. As an application of this result, we prove that when quantizing a multi-layer network having Gaussian weights, the relative square quantization error exhibits a linear decay as the degree of over-parametrization increases. Furthermore, we demonstrate that it is possible to achieve error bounds equivalent to those obtained in the infinite alphabet case, using on the order of a mere $\log\log N$ bits per weight, where $N$ represents the largest number of neurons in a layer.
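The abstract describes the algorithm only at a high level: a greedy path-following pass over the weights combined with a stochastic quantizer, with cost linear in the number of weights. The sketch below illustrates that general recipe in NumPy under stated assumptions; the residual-corrected update rule, the fixed-step (effectively infinite) alphabet, and the names stochastic_round and spfq_like_quantize are illustrative choices, not the authors' exact procedure or a reproduction of their guarantees.

```python
import numpy as np

def stochastic_round(x, step):
    """Round scalar x onto the grid {k * step} stochastically:
    round up with probability equal to the fractional part, so E[q] = x."""
    lo = np.floor(x / step) * step
    p_up = (x - lo) / step
    return lo + step * (np.random.rand() < p_up)

def spfq_like_quantize(W, X, step=0.05):
    """Greedy path-following quantization of one layer's weights W
    (shape: n_in x n_out) given calibration activations X (shape: m x n_in).

    Each neuron (column of W) is quantized one weight at a time while a
    running residual u tracks the accumulated error X @ w - X @ q; each new
    weight is corrected for that residual before being stochastically
    rounded. The cost is linear in the number of weights.
    """
    n_in, n_out = W.shape
    Q = np.zeros_like(W, dtype=float)
    col_norms = np.sum(X ** 2, axis=0)   # ||X_t||^2 for each input feature
    for j in range(n_out):               # quantize each neuron independently
        u = np.zeros(X.shape[0])         # residual of the partial products
        for t in range(n_in):
            if col_norms[t] == 0:
                q = stochastic_round(W[t, j], step)
            else:
                # residual-corrected target for the t-th weight
                target = W[t, j] + X[:, t] @ u / col_norms[t]
                q = stochastic_round(target, step)
            u += (W[t, j] - q) * X[:, t]  # update the running error
            Q[t, j] = q
    return Q

if __name__ == "__main__":
    # Small synthetic check of the relative quantization error on one layer.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 128))               # calibration activations
    W = rng.standard_normal((128, 64)) / np.sqrt(128)  # Gaussian-style weights
    Q = spfq_like_quantize(W, X, step=0.05)
    rel_err = np.linalg.norm(X @ W - X @ Q) / np.linalg.norm(X @ W)
    print(f"relative quantization error: {rel_err:.4f}")
```

In a full-network setting, layers would typically be quantized sequentially, with the outputs of the already-quantized earlier layers serving as the activations X for the next layer; the paper's full-network error bounds concern how the resulting residuals behave as they propagate across layers.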

Authors (2)
  1. Jinjie Zhang (10 papers)
  2. Rayan Saab (35 papers)
