Learning Discrete Weights and Activations Using the Local Reparameterization Trick (2307.01683v1)
Abstract: In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference. A common solution is binarization. By binarizing the network weights and activations, one can significantly reduce computational complexity by substituting the computationally expensive floating-point operations with faster bitwise operations. This leads to more efficient neural network inference that can be deployed on low-resource devices. In this work, we extend previous approaches that trained networks with discrete weights using the local reparameterization trick to also allow for discrete activations. The original approach optimized a distribution over the discrete weights and used the central limit theorem to approximate the pre-activation with a continuous Gaussian distribution. Here we show that this probabilistic modeling also allows effective training of networks with discrete activations. This further reduces the runtime and memory footprint at inference time and achieves state-of-the-art results for networks with binary activations.
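To make the idea in the abstract concrete, the following PyTorch-style sketch shows one way a layer could hold a learned distribution over binary weights, use the central limit theorem to treat the pre-activation as Gaussian, and sample it via the local reparameterization trick, with a probabilistic binary activation on top. The class name `StochasticBinaryLinear`, the sigmoid parameterization of the weight probabilities, and the straight-through handling of the hard activation are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticBinaryLinear(nn.Module):
    """Sketch: linear layer with a learned distribution over binary
    weights w in {-1, +1}. Instead of sampling weights directly, the
    pre-activation is sampled from the Gaussian implied by the CLT
    (local reparameterization)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Real-valued parameters theta; P(w_ij = +1) = sigmoid(theta_ij).
        self.theta = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x, binary_activation=True):
        p = torch.sigmoid(self.theta)      # P(w = +1)
        w_mean = 2.0 * p - 1.0             # E[w] for w in {-1, +1}
        w_var = 1.0 - w_mean ** 2          # Var[w] = 1 - E[w]^2

        # CLT: z = sum_j w_j x_j is approximately Gaussian.
        z_mean = F.linear(x, w_mean)
        z_var = F.linear(x ** 2, w_var)
        z_std = torch.sqrt(z_var + 1e-8)

        if self.training:
            # Local reparameterization: sample the pre-activation itself,
            # keeping gradients w.r.t. theta through z_mean and z_std.
            z = z_mean + z_std * torch.randn_like(z_mean)
        else:
            z = z_mean

        if binary_activation:
            # Probabilistic view of a binary activation a = sign(z):
            # P(a = +1) = P(z > 0) = Phi(z_mean / z_std), which is
            # differentiable in theta. Here a straight-through combination
            # is used purely as an illustration (an assumption, not the
            # paper's exact scheme).
            prob_pos = torch.distributions.Normal(0.0, 1.0).cdf(z_mean / z_std)
            a_soft = 2.0 * prob_pos - 1.0   # E[a] in [-1, 1]
            a_hard = torch.sign(z)
            # Forward pass uses the hard sign, backward uses the soft expectation.
            return a_hard.detach() + a_soft - a_soft.detach()
        return z
```

At inference time one would presumably threshold or sample the weight probabilities once to obtain fixed binary weights, so that the multiply-accumulate reduces to the bitwise operations the abstract refers to; the sketch above only covers the training-time probabilistic modeling.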