Training Bayesian Neural Networks with Sparse Subspace Variational Inference (2402.11025v1)
Abstract: Bayesian neural networks (BNNs) offer uncertainty quantification but come with the downside of substantially increased training and inference costs. Sparse BNNs have been investigated for efficient inference, typically by either gradually introducing sparsity during training or by post-training compression of dense BNNs. The challenge of cutting down the massive training cost remains, however, particularly given the requirement to learn the uncertainty. To address this challenge, we introduce Sparse Subspace Variational Inference (SSVI), the first fully sparse BNN framework that maintains a consistently highly sparse Bayesian model throughout the training and inference phases. Starting from a randomly initialized low-dimensional sparse subspace, our approach alternately optimizes the sparse subspace basis selection and its associated parameters. Because basis selection is a non-differentiable problem, we approximate the optimal solution with a removal-and-addition strategy, guided by novel criteria based on weight distribution statistics. Our extensive experiments show that SSVI sets new benchmarks in crafting sparse BNNs, achieving, for instance, 10-20x compression in model size with under 3% performance drop and up to 20x FLOPs reduction during training compared with dense VI training. Remarkably, SSVI also demonstrates enhanced robustness to hyperparameters, reducing the need for intricate tuning in VI and occasionally even surpassing VI-trained dense BNNs on both accuracy and uncertainty metrics.
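To make the alternating scheme described in the abstract concrete, here is a minimal PyTorch sketch of how a sparse mean-field variational layer with a removal-and-addition support update could look. This is not the authors' implementation: the class name `SparseMeanFieldLayer`, the method `update_support`, and the specific removal rule (signal-to-noise ratio |mu|/sigma) and addition rule (a caller-supplied `grow_score`, e.g. a gradient-magnitude statistic) are illustrative assumptions, since the abstract only states that the criteria are based on weight-distribution statistics.

```python
import torch

class SparseMeanFieldLayer(torch.nn.Module):
    """Mean-field Gaussian linear layer whose weights live in a sparse subspace
    defined by a binary mask; only masked-in entries carry variational parameters."""

    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.zeros(out_features, in_features))
        self.log_sigma = torch.nn.Parameter(torch.full((out_features, in_features), -5.0))
        # Random initial sparse support: the low-dimensional subspace basis.
        self.register_buffer("mask", (torch.rand(out_features, in_features) > sparsity).float())

    def forward(self, x):
        sigma = torch.exp(self.log_sigma)
        # Reparameterization trick, restricted to the active (masked-in) weights.
        w = (self.mu + sigma * torch.randn_like(sigma)) * self.mask
        return torch.nn.functional.linear(x, w)

    @torch.no_grad()
    def update_support(self, grow_score, swap_frac=0.1):
        """Removal-and-addition step: deactivate the active weights with the lowest
        |mu|/sigma signal-to-noise ratio and activate an equal number of inactive
        weights with the highest grow_score, keeping the overall sparsity fixed."""
        active = self.mask.bool()
        n_swap = max(1, int(swap_frac * int(active.sum())))

        # Removal criterion (illustrative): lowest signal-to-noise active weights.
        snr = (self.mu.abs() / torch.exp(self.log_sigma)).masked_fill(~active, float("inf"))
        drop_idx = torch.topk(snr.view(-1), n_swap, largest=False).indices

        # Addition criterion (illustrative): highest-scoring inactive weights.
        scores = grow_score.masked_fill(active, float("-inf"))
        grow_idx = torch.topk(scores.view(-1), n_swap, largest=True).indices

        mask_flat = self.mask.view(-1)
        mask_flat[drop_idx] = 0.0
        mask_flat[grow_idx] = 1.0
        # Re-initialize newly activated weights to an uninformative variational state.
        self.mu.view(-1)[grow_idx] = 0.0
        self.log_sigma.view(-1)[grow_idx] = -5.0
```

Between such support updates, the variational parameters of the active weights would be trained with standard reparameterized ELBO gradients, so the model stays highly sparse throughout both training and inference, in line with the fully sparse training described above.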