Model and Feature Diversity for Bayesian Neural Networks in Mutual Learning (2407.02721v1)
Abstract: Bayesian Neural Networks (BNNs) place probability distributions over model parameters, enabling uncertainty quantification in predictions. However, they often underperform deterministic neural networks. Mutual learning can effectively enhance the performance of peer BNNs. In this paper, we propose a novel approach to improving BNN performance through deep mutual learning. The proposed approach increases diversity in both network parameter distributions and feature distributions, encouraging peer networks to acquire distinct features that capture different characteristics of the input, which enhances the effectiveness of mutual learning. Experimental results demonstrate significant improvements in classification accuracy, negative log-likelihood, and expected calibration error compared to traditional mutual learning for BNNs.
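To make the idea concrete, below is a minimal PyTorch sketch, not the authors' implementation, of mutual learning between two peer BNNs with mean-field Gaussian posteriors. The specific diversity losses chosen here (negative cosine similarity between penultimate features for feature diversity, squared 2-Wasserstein distance between the diagonal Gaussian weight posteriors for parameter diversity), the joint update, and all names and weights (`lam_feat`, `lam_param`, `beta`) are illustrative assumptions; the paper's actual objectives may differ.

```python
# Hypothetical sketch: two peer BNNs trained by deep mutual learning with
# added feature- and parameter-diversity terms. Assumptions are noted inline.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Mean-field Gaussian linear layer using the reparameterization trick."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(0.05 * torch.randn(d_out, d_in))
        self.rho = nn.Parameter(torch.full((d_out, d_in), -5.0))

    @property
    def sigma(self):
        return F.softplus(self.rho)  # positive std-dev parameterization

    def kl_to_standard_normal(self):
        # KL(q || N(0, I)) for a diagonal Gaussian posterior q.
        s2 = self.sigma ** 2
        return 0.5 * (s2 + self.mu ** 2 - 1.0 - torch.log(s2)).sum()

    def forward(self, x):
        w = self.mu + self.sigma * torch.randn_like(self.mu)  # one MC sample
        return F.linear(x, w)


class PeerBNN(nn.Module):
    def __init__(self, d_in=784, d_hid=256, n_cls=10):
        super().__init__()
        self.feat = BayesianLinear(d_in, d_hid)
        self.head = BayesianLinear(d_hid, n_cls)

    def forward(self, x):
        z = F.relu(self.feat(x))  # penultimate features used for diversity
        return self.head(z), z


def w2_diag_gaussians(a: BayesianLinear, b: BayesianLinear):
    """Squared 2-Wasserstein distance between two diagonal Gaussians."""
    return ((a.mu - b.mu) ** 2).sum() + ((a.sigma - b.sigma) ** 2).sum()


def mutual_step(net_a, net_b, opt, x, y, lam_feat=0.1, lam_param=1e-3, beta=1e-4):
    logits_a, z_a = net_a(x)
    logits_b, z_b = net_b(x)

    # Deep mutual learning: each peer matches the other's (detached) predictions.
    kl_a = F.kl_div(F.log_softmax(logits_a, -1),
                    F.softmax(logits_b.detach(), -1), reduction="batchmean")
    kl_b = F.kl_div(F.log_softmax(logits_b, -1),
                    F.softmax(logits_a.detach(), -1), reduction="batchmean")

    # Diversity terms: penalize feature similarity, reward posterior distance.
    feat_sim = F.cosine_similarity(z_a, z_b, dim=-1).mean()
    param_dist = w2_diag_gaussians(net_a.feat, net_b.feat)

    # ELBO-style KL(q || prior) regularizers keep the posteriors Bayesian
    # and prevent the maximized posterior distance from growing unboundedly.
    prior = sum(layer.kl_to_standard_normal()
                for layer in (net_a.feat, net_a.head, net_b.feat, net_b.head))

    loss = (F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
            + kl_a + kl_b
            + lam_feat * feat_sim - lam_param * param_dist
            + beta * prior)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Smoke test on random data (stand-in for CIFAR-style inputs flattened to 784).
net_a, net_b = PeerBNN(), PeerBNN()
opt = torch.optim.Adam(list(net_a.parameters()) + list(net_b.parameters()), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
print(mutual_step(net_a, net_b, opt, x, y))
```

The single joint update shown here is a simplification: classic deep mutual learning alternates updates between peers, and a faithful implementation would follow the training schedule and loss definitions given in the paper itself.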