Efficient Model Compression for Bayesian Neural Networks (2411.00273v1)
Abstract: Model compression has drawn much attention within the deep learning community recently. Compressing a dense neural network offers many advantages, including lower computation cost, deployability to devices with limited storage and memory, and resistance to adversarial attacks. This may be achieved via weight pruning or by fully discarding certain input features. Here we demonstrate a novel strategy to emulate principles of Bayesian model selection in a deep learning setup. Given a fully connected Bayesian neural network with spike-and-slab priors trained via a variational algorithm, we obtain the posterior inclusion probability for every node, a quantity that is typically lost. We employ these probabilities for pruning and feature selection on a range of simulated and real-world benchmark datasets and find evidence of better generalizability of the pruned model in all our experiments.
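The pruning step described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes a one-hidden-layer network whose variational posterior supplies one inclusion probability per hidden node (for example, the mean of a Bernoulli spike indicator), and it simply zeroes the incoming and outgoing weights of nodes whose probability falls below a threshold. The function name `prune_by_inclusion_prob` and the 0.5 cutoff are hypothetical choices for illustration only.

```python
import numpy as np

def prune_by_inclusion_prob(weights, incl_prob, threshold=0.5):
    """Zero out hidden nodes whose posterior inclusion probability is below `threshold`.

    weights   : [W1 of shape (d_in, h), W2 of shape (h, d_out)] for a
                one-hidden-layer network (layout is illustrative).
    incl_prob : length-h array with one inclusion probability per hidden
                node (e.g. a variational spike probability).
    """
    keep = incl_prob >= threshold          # boolean mask of retained nodes
    W1, W2 = (w.copy() for w in weights)
    W1[:, ~keep] = 0.0                     # drop incoming weights of pruned nodes
    W2[~keep, :] = 0.0                     # drop outgoing weights of pruned nodes
    return [W1, W2], keep

# Example: 4 inputs, 8 hidden nodes, 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 1))]
incl_prob = rng.uniform(size=8)            # stand-in for posterior inclusion probabilities
(pruned_W1, pruned_W2), keep = prune_by_inclusion_prob(weights, incl_prob)
print(f"kept {keep.sum()} of {keep.size} hidden nodes")
```

The same masking idea extends to feature selection by applying the mask to input nodes rather than hidden nodes.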