Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization (2309.13575v3)
Abstract: Weight-sharing quantization has emerged as a technique to reduce energy expenditure during inference in large neural networks by constraining their weights to a limited set of values. However, existing methods for weight-sharing quantization often make assumptions about the treatment of weights based on value alone, neglecting the unique role weight position plays. This paper proposes a probabilistic framework based on Bayesian neural networks (BNNs) and a variational relaxation to identify which weights can be moved to which cluster centre, and to what degree, based on their individual position-specific learned uncertainty distributions. We introduce a new initialisation setting and a regularisation term which allow for the training of BNNs under complex dataset-model combinations. By leveraging the flexibility of weight values captured through a probability distribution, we enhance noise resilience and downstream compressibility. Our iterative clustering procedure demonstrates superior compressibility and higher accuracy compared to state-of-the-art methods on both ResNet models and the more complex transformer-based architectures. In particular, our method outperforms the state-of-the-art quantization method in top-1 accuracy by 1.6% on ImageNet using DeiT-Tiny, with its 5 million+ weights now represented by only 296 unique values.
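To make the core idea concrete, below is a minimal sketch (not the authors' exact procedure) of how a position-specific learned uncertainty could decide which weights move to which shared cluster centre: each weight is assigned to the centre that is closest in units of its own standard deviation, so high-uncertainty weights may be moved further from their mean than low-uncertainty ones. The names (`mu`, `sigma`, `centres`) and the normalised-distance rule are illustrative assumptions, not the paper's definitive algorithm.

```python
# Hypothetical sketch of uncertainty-aware weight-sharing assignment.
# Assumes a mean-field variational BNN providing per-weight means (mu) and
# standard deviations (sigma); the distance rule is an illustrative choice.
import numpy as np


def uncertainty_aware_assign(mu, sigma, centres):
    """Assign each weight to the shared value that is nearest in units of
    its own learned standard deviation.

    mu:      (n,) array of variational weight means
    sigma:   (n,) array of position-specific standard deviations
    centres: (k,) array of shared quantization values (cluster centres)
    """
    # |mu_i - c_j| / sigma_i : distance normalised by each weight's uncertainty
    dist = np.abs(mu[:, None] - centres[None, :]) / sigma[:, None]
    assignment = dist.argmin(axis=1)      # index of chosen centre per weight
    quantized = centres[assignment]       # weights replaced by shared values
    return assignment, quantized


# Toy usage: 8 weights quantized to 3 shared values
rng = np.random.default_rng(0)
mu = rng.normal(0.0, 0.1, size=8)
sigma = np.abs(rng.normal(0.02, 0.01, size=8)) + 1e-4
centres = np.array([-0.1, 0.0, 0.1])
idx, q = uncertainty_aware_assign(mu, sigma, centres)
print(idx, q)
```

In the paper, such assignments are interleaved with further BNN training in an iterative clustering procedure; the sketch above only illustrates the assignment step under the stated assumptions.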