Posterior Uncertainty Quantification in Neural Networks using Data Augmentation (2403.12729v1)
Abstract: In this paper, we approach the problem of uncertainty quantification in deep learning through a predictive framework, which captures uncertainty in model parameters by specifying our assumptions about the predictive distribution of unseen future data. Under this view, we show that deep ensembling (Lakshminarayanan et al., 2017) is a fundamentally misspecified model class, since it assumes that future data are supported on existing observations only -- a situation rarely encountered in practice. To address this limitation, we propose MixupMP, a method that constructs a more realistic predictive distribution using popular data augmentation techniques. MixupMP operates as a drop-in replacement for deep ensembles, where each ensemble member is trained on a random simulation from this predictive distribution. Grounded in the recently proposed framework of Martingale posteriors (Fong et al., 2023), MixupMP returns samples from an implicitly defined Bayesian posterior. Our empirical analysis shows that MixupMP achieves superior predictive performance and uncertainty quantification on various image classification datasets, when compared with existing Bayesian and non-Bayesian approaches.
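The abstract describes MixupMP as a drop-in replacement for deep ensembles in which each member is fit to its own random draw from a mixup-based predictive distribution. Below is a minimal illustrative sketch of that idea in PyTorch, not the authors' implementation: the `make_model`, `loader`, and hyperparameter names are placeholders, and the exact way MixupMP simulates future data may differ from plain per-batch mixup shown here.

```python
import torch
import torch.nn.functional as F

def mixup_batch(x, y, num_classes, alpha=1.0):
    """Draw a mixup sample: convex combinations of random pairs of
    observed inputs and their one-hot labels (Zhang et al., 2018)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_onehot = F.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix

def train_mixup_ensemble(make_model, loader, num_classes,
                         n_members=5, epochs=10, alpha=1.0, lr=1e-3):
    """Each ensemble member is trained on its own stream of mixup draws,
    i.e. its own simulation from the assumed predictive distribution."""
    members = []
    for _ in range(n_members):
        model = make_model()
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                x_mix, y_mix = mixup_batch(x, y, num_classes, alpha)
                log_probs = F.log_softmax(model(x_mix), dim=-1)
                loss = -(y_mix * log_probs).sum(dim=-1).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
        members.append(model)
    return members

def ensemble_predictive(members, x):
    """Average member softmax outputs to form the ensemble predictive."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=-1) for m in members])
    return probs.mean(dim=0)
```

As in standard deep ensembles, uncertainty is read off from the spread and average of the member predictions; the difference sketched here is only that each member sees augmented rather than purely observed data.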
- Deep ensembles work, but are they necessary? In Advances in Neural Information Processing Systems.
- Limit theorems for a class of identically distributed random variables. The Annals of Probability, 32(3):2029–2052.
- Weight uncertainty in neural network. In International Conference on Machine Learning.
- What is the effect of importance weighting in deep learning? In International Conference on Machine Learning.
- Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning.
- Implicit bias of gradient descent for wide two-layer neural networks trained with the logistic loss. In Conference on Learning Theory.
- Laplace redux: Effortless Bayesian deep learning. In Advances in Neural Information Processing Systems.
- De Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjectives. Annales de l’institut Henri Poincaré, 7(1):1–68.
- Doob, J. L. (1949). Application of the theory of martingales. Le calcul des probabilités et ses applications, pages 23–27.
- Efron, B. (1992). Bootstrap methods: another look at the jackknife. In Breakthroughs in statistics: Methodology and distribution, pages 569–593. Springer.
- Towards safe autonomous driving: Capture uncertainty in the deep neural network for lidar 3d vehicle detection. In International Conference on Intelligent Transportation Systems.
- A survey of data augmentation approaches for NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP.
- A systematic comparison of Bayesian deep learning robustness in diabetic retinopathy tasks. In NeurIPS Workshop on Bayesian Deep Learning.
- Martingale posterior distributions. Journal of the Royal Statistical Society, Series B, 85(5):1357–1391.
- Scalable nonparametric sampling from multimodal posteriors with the posterior bootstrap. In International Conference on Machine Learning.
- Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning.
- Neural processes. arXiv preprint arXiv:1807.01622.
- Graves, A. (2011). Practical variational inference for neural networks. In Advances in Neural Information Processing Systems, volume 24.
- Noise contrastive priors for functional uncertainty. In Uncertainty in Artificial Intelligence.
- Analysis of dropout learning regarded as ensemble learning. In International Conference on Artificial Neural Networks.
- Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition.
- Benchmarking neural network robustness to common corruptions and surface variations. arXiv preprint arXiv:1807.01697.
- Probabilistic backpropagation for scalable learning of Bayesian neural networks. In International Conference on Machine Learning.
- What are Bayesian neural network posteriors really like? In International Conference on Machine Learning.
- Directional convergence and alignment in deep learning. In Advances in Neural Information Processing Systems.
- A simple resampling method by perturbing the minimand. Biometrika, 88(2):381–390.
- Learning multiple layers of features from tiny images. https://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf.
- Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems.
- LeCun, Y. (1998). The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
- Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
- Martingale posterior neural processes. In International Conference on Learning Representations.
- Training confidence-calibrated classifiers for detecting out-of-distribution samples. In International Conference on Learning Representations.
- Nonparametric learning from Bayesian models with randomized objective functions. In Advances in Neural Information Processing Systems.
- Gradient descent maximizes the margin of homogeneous neural networks. In International Conference on Learning Representations.
- MacKay, D. J. (1992). A practical Bayesian framework for backpropagation networks. Neural computation, 4(3):448–472.
- MixSpeech: Data augmentation for low-resource automatic speech recognition. In International Conference on Acoustics, Speech and Signal Processing.
- Convergence of gradient descent on separable data. In International Conference on Artificial Intelligence and Statistics.
- Obtaining well calibrated probabilities using Bayesian binning. In AAAI Conference on Artificial Intelligence, volume 29.
- Neal, R. M. (2012). Bayesian learning for neural networks. Springer Science & Business Media.
- Approximate Bayesian inference by the weighted likelihood bootstrap (with discussion). Journal of the Royal Statistical Society, Series B, 56:1–48.
- Weighted Bayesian bootstrap for scalable posterior distributions. Canadian Journal of Statistics, 49(2):421–437.
- Why are bootstrapped deep ensembles not better? In "I Can't Believe It's Not Better!" NeurIPS 2020 Workshop.
- Randomized prior functions for deep reinforcement learning. In Advances in Neural Information Processing Systems.
- Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems.
- Can you trust your model’s uncertainty? evaluating predictive uncertainty under dataset shift. In Advances in Neural Information Processing Systems.
- PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems.
- Uncertainty quantification and deep ensembles. In Advances in Neural Information Processing Systems.
- Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, pages 130–134.
- Tractable function-space variational inference in Bayesian neural networks. In Advances in Neural Information Processing Systems.
- Neural bootstrapper. In Advances in Neural Information Processing Systems.
- A survey on image data augmentation for deep learning. Journal of Big Data, 6(1):1–48.
- Understanding measures of uncertainty for adversarial example detection. In Uncertainty in Artificial Intelligence.
- Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958.
- On mixup training: Improved calibration and predictive uncertainty for deep neural networks. In Advances in Neural Information Processing Systems.
- Is importance weighting incompatible with interpolating classifiers? In International Conference on Learning Representations.
- Regularization matters: Generalization and optimization of neural nets vs their induced kernel. In Advances in Neural Information Processing Systems.
- Combining ensembles and data augmentation can harm your calibration. In International Conference on Learning Representations.
- Bayesian deep learning and a probabilistic perspective of generalization. In Advances in Neural Information Processing Systems.
- Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
- Understanding the role of importance weighting for deep learning. In International Conference on Learning Representations.
- Wide residual networks. arXiv preprint arXiv:1605.07146.
- mixup: Beyond empirical risk minimization. In International Conference on Learning Representations.
- Cyclical stochastic gradient MCMC for Bayesian deep learning. In International Conference on Learning Representations.