
QGen: On the Ability to Generalize in Quantization Aware Training (2404.11769v2)

Published 17 Apr 2024 in cs.LG and cs.CV

Abstract: Quantization lowers memory usage, computational requirements, and latency by using fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a characteristic that has received little attention despite its implications for model performance. In particular, we first develop a theoretical model for quantization in neural networks and demonstrate how quantization functions as a form of regularization. Second, motivated by recent work connecting the sharpness of the loss landscape and generalization, we derive an approximate bound on the generalization of quantized models conditioned on the amount of quantization noise. We then validate our hypothesis by experimenting with over 2000 convolutional and transformer-based models trained on the CIFAR-10, CIFAR-100, and ImageNet datasets.
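
The following is a minimal, illustrative sketch of quantization-aware training with a straight-through estimator, which makes the abstract's "quantization as regularization" view concrete: rounding weights to a low-bit grid perturbs them by at most half a quantization step, and training through that perturbation behaves like injecting bounded noise. The bit-width, per-tensor scaling, layer choice, and PyTorch framing are assumptions for illustration, not the authors' exact formulation.

    # Minimal sketch of quantization-aware training (QAT) with a
    # straight-through estimator. Illustrative only: the bit-width,
    # clipping, and scaling scheme are assumptions, not the paper's setup.
    import torch
    import torch.nn as nn

    class FakeQuantize(torch.autograd.Function):
        # Uniform b-bit quantization in the forward pass; identity
        # gradient (straight-through estimator) in the backward pass.
        @staticmethod
        def forward(ctx, w, num_bits=4):
            qmax = 2 ** (num_bits - 1) - 1
            scale = w.abs().max().clamp(min=1e-8) / qmax
            # Rounding injects bounded quantization noise (at most
            # scale / 2 per weight), the perturbation discussed above.
            return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output, None

    class QuantLinear(nn.Linear):
        # Linear layer whose weights are fake-quantized during training.
        def __init__(self, in_features, out_features, num_bits=4):
            super().__init__(in_features, out_features)
            self.num_bits = num_bits

        def forward(self, x):
            w_q = FakeQuantize.apply(self.weight, self.num_bits)
            return nn.functional.linear(x, w_q, self.bias)

    if __name__ == "__main__":
        # Toy training step: gradients flow through the quantized weights
        # via the STE, so the optimizer sees a noise-perturbed loss surface.
        layer = QuantLinear(16, 10, num_bits=4)
        opt = torch.optim.SGD(layer.parameters(), lr=0.1)
        x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))
        loss = nn.functional.cross_entropy(layer(x), y)
        loss.backward()
        opt.step()
        print(f"loss = {loss.item():.4f}")

In this sketch, lowering num_bits increases the magnitude of the rounding noise, which is the quantity the paper's approximate generalization bound is conditioned on.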

