Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models (2312.15297v1)
Abstract: Deep Neural Networks (DNNs) are powerful tools for various computer vision tasks, yet they often struggle with reliable uncertainty quantification - a critical requirement for real-world applications. Bayesian Neural Networks (BNNs) are equipped for uncertainty estimation but cannot scale to large DNNs, as they are highly unstable to train. To address this challenge, we introduce the Adaptable Bayesian Neural Network (ABNN), a simple and scalable strategy to seamlessly transform DNNs into BNNs in a post-hoc manner with minimal computational and training overhead. ABNN preserves the main predictive properties of DNNs while enhancing their uncertainty quantification abilities through simple BNN adaptation layers (attached to normalization layers) and a few fine-tuning steps on pre-trained models. We conduct extensive experiments across multiple datasets for image classification and semantic segmentation tasks, and our results demonstrate that ABNN achieves state-of-the-art performance without the computational budget typically associated with ensemble methods.
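To make the idea of a "BNN adaptation layer attached to a normalization layer" concrete, here is a minimal NumPy sketch. It assumes one plausible parameterization - a standard layer normalization whose learned scale is perturbed by Gaussian noise at each forward pass, so repeated stochastic passes act as an implicit ensemble - rather than the paper's actual implementation; the class name, noise scale, and perturbation form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class BayesianNormAdapter:
    """Sketch of an ABNN-style adaptation layer (assumed form, not the
    paper's code): layer normalization whose affine scale is perturbed
    by Gaussian noise on each stochastic forward pass."""

    def __init__(self, dim, noise_std=0.1, eps=1e-5):
        self.gamma = np.ones(dim)    # pre-trained scale (identity init here)
        self.beta = np.zeros(dim)    # pre-trained shift
        self.noise_std = noise_std   # assumed perturbation scale
        self.eps = eps

    def __call__(self, x, sample=True):
        # Standard layer normalization over the last axis.
        mu = x.mean(-1, keepdims=True)
        var = x.var(-1, keepdims=True)
        h = (x - mu) / np.sqrt(var + self.eps)
        gamma = self.gamma
        if sample:
            # Gaussian perturbation turns the deterministic affine
            # transform into a stochastic (Bayesian-style) one.
            gamma = gamma * (1.0 + rng.normal(0.0, self.noise_std,
                                              size=gamma.shape))
        return h * gamma + self.beta

layer = BayesianNormAdapter(8)
x = rng.normal(size=(4, 8))
# Several stochastic forward passes form an implicit ensemble; the
# per-element spread serves as a cheap uncertainty estimate.
samples = np.stack([layer(x) for _ in range(16)])
uncertainty = samples.std(axis=0)
```

Because only the small adaptation parameters are stochastic and everything else is inherited from the pre-trained backbone, a few fine-tuning steps suffice - which is the source of the minimal-overhead claim in the abstract.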