Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization (2309.13575v3)

Published 24 Sep 2023 in cs.LG and cs.AI

Abstract: Weight-sharing quantization has emerged as a technique to reduce energy expenditure during inference in large neural networks by constraining their weights to a limited set of values. However, existing methods for weight-sharing quantization often treat weights based on value alone, neglecting the unique role that weight position plays. This paper proposes a probabilistic framework based on Bayesian neural networks (BNNs) and a variational relaxation to identify which weights can be moved to which cluster centre, and to what degree, based on their individual position-specific learned uncertainty distributions. We introduce a new initialisation setting and a regularisation term which allow BNNs to be trained on complex dataset-model combinations. By leveraging the flexibility of weight values captured through a probability distribution, we enhance noise resilience and downstream compressibility. Our iterative clustering procedure demonstrates superior compressibility and higher accuracy compared to state-of-the-art methods on both ResNet models and more complex transformer-based architectures. In particular, our method outperforms the state-of-the-art quantization method by 1.6% top-1 accuracy on ImageNet using DeiT-Tiny, with its 5 million+ weights now represented by only 296 unique values.
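
The core assignment idea can be illustrated with a minimal sketch: each weight carries a learned Gaussian posterior (a mean and a standard deviation), and a weight is snapped to a shared cluster centre that its posterior can reach cheaply, so high-uncertainty weights tolerate larger moves. The function name, the uncertainty-scaled distance criterion, and the toy values below are illustrative assumptions for exposition, not the paper's exact variational-relaxation procedure.

import torch

def assign_weights_to_clusters(mu, sigma, centres):
    """Uncertainty-aware weight sharing (illustrative sketch).

    mu, sigma : 1-D tensors of per-weight posterior mean and std.
    centres   : 1-D tensor of candidate shared weight values.
    Returns the quantised (shared) value and centre index for each weight.
    """
    # Distance from each weight mean to each centre, scaled by that weight's
    # learned uncertainty: high-variance weights tolerate larger moves.
    dist = (mu.unsqueeze(1) - centres.unsqueeze(0)).abs() / sigma.unsqueeze(1)
    assignment = dist.argmin(dim=1)   # index of the cheapest centre per weight
    quantised = centres[assignment]   # shared value assigned to each weight
    return quantised, assignment

# Toy usage: 6 weights, 3 candidate shared values.
mu = torch.tensor([0.02, 0.48, -0.51, 0.01, 0.95, -0.49])
sigma = torch.tensor([0.30, 0.05, 0.04, 0.25, 0.10, 0.02])
centres = torch.tensor([-0.5, 0.0, 0.5])
quantised, idx = assign_weights_to_clusters(mu, sigma, centres)

In the paper's actual procedure, assignments of this kind are interleaved with further training of the weight posteriors (the iterative clustering described in the abstract), rather than applied in a single pass.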
