Bootstrap Your Own Variance (arXiv:2312.03213v1)
Abstract: Understanding model uncertainty is important for many applications. We propose Bootstrap Your Own Variance (BYOV), combining Bootstrap Your Own Latent (BYOL), a negative-free Self-Supervised Learning (SSL) algorithm, with Bayes by Backprop (BBB), a Bayesian method for estimating model posteriors. We find that the learned predictive standard deviation (std) of BYOV vs. a supervised BBB model is well captured by a Gaussian distribution, providing preliminary evidence that the learned parameter posterior is useful for label-free uncertainty estimation. BYOV improves upon the deterministic BYOL baseline (+2.83% test ECE, +1.03% test Brier) and presents better calibration and reliability when tested with various augmentations (e.g., +2.4% test ECE, +1.2% test Brier for salt-and-pepper noise).
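The Bayes by Backprop component at the heart of BYOV is straightforward to sketch. Below is a minimal, hypothetical PyTorch illustration (not the authors' code) of a linear layer with a factorized Gaussian weight posterior, trained via the reparameterization trick, together with the Monte Carlo predictive standard deviation the abstract refers to; all names, shapes, and hyperparameters are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Posterior mean, and rho such that sigma = softplus(rho) > 0,
        # following Blundell et al. (2015), "Weight uncertainty in neural networks".
        self.w_mu = nn.Parameter(0.05 * torch.randn(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sigma = F.softplus(self.w_rho)
        # Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, I),
        # so gradients flow to mu and rho through the sample.
        w = self.w_mu + sigma * torch.randn_like(sigma)
        return F.linear(x, w, self.bias)

    def kl_to_standard_normal(self) -> torch.Tensor:
        # Closed-form KL(q(w) || N(0, I)); added to the training loss,
        # this is the complexity term of the ELBO.
        sigma = F.softplus(self.w_rho)
        return (0.5 * (sigma**2 + self.w_mu**2 - 1.0) - torch.log(sigma)).sum()


@torch.no_grad()
def predictive_std(model: nn.Module, x: torch.Tensor, n_samples: int = 32) -> torch.Tensor:
    """Per-class std of softmax outputs across posterior samples, i.e. the
    kind of learned predictive std the abstract evaluates."""
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
    return probs.std(dim=0)


if __name__ == "__main__":
    # Hypothetical classifier head on CIFAR-sized inputs.
    model = nn.Sequential(nn.Flatten(), BayesLinear(3 * 32 * 32, 10))
    x = torch.randn(4, 3, 32, 32)
    print(predictive_std(model, x).shape)  # torch.Size([4, 10])
```

Presumably BYOV trains stochastic layers of this kind inside the BYOL pipeline; the sketch deliberately omits all BYOL-specific machinery (target network, EMA updates, augmentations).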
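For reference, the two metrics quoted above have standard definitions: Expected Calibration Error (ECE) bins predictions by confidence and averages the per-bin gap between accuracy and confidence, while the Brier score is the mean squared error between predicted probabilities and one-hot labels. A small NumPy sketch (illustrative, not the paper's evaluation code):

```python
import numpy as np


def ece(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """Expected Calibration Error with equal-width confidence bins."""
    conf = probs.max(axis=1)                 # confidence of the predicted class
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its occupancy.
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(total)


def brier(probs: np.ndarray, labels: np.ndarray) -> float:
    """Multi-class Brier score: MSE between probabilities and one-hot labels."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(((probs - onehot) ** 2).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(100, 10))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    labels = rng.integers(0, 10, size=100)
    print(f"ECE={ece(probs, labels):.3f}  Brier={brier(probs, labels):.3f}")
```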
References:
- A general framework for updating belief distributions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5):1103–1130, 2016.
- Weight uncertainty in neural networks. In International Conference on Machine Learning, pp. 1613–1622, 2015.
- Variational memory addressing in generative models. In Advances in Neural Information Processing Systems, pp. 3920–3929, 2017.
- Jochen Bröcker. Reliability, sufficiency, and the decomposition of proper scores. Quarterly Journal of the Royal Meteorological Society, 135(643):1512–1519, 2009.
- Benchmark for uncertainty & robustness in self-supervised learning. arXiv preprint arXiv:2212.12411, 2022.
- How to scale your EMA. arXiv preprint arXiv:2307.13813, 2023.
- Variational inference and model selection with generalized evidence bounds. In International Conference on Machine Learning, pp. 893–902, 2018.
- Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016.
- PaLM: Scaling language modeling with pathways. Journal of Machine Learning Research, 24(240):1–113, 2023.
- Laplace redux – effortless Bayesian deep learning. In Advances in Neural Information Processing Systems, volume 34, pp. 20089–20103, 2021.
- The Helmholtz machine. Neural Computation, 7(5):889–904, 1995.
- ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
- Importance weighting and variational inference. In Advances in Neural Information Processing Systems, volume 31, 2018.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Bayesian neural network priors revisited. In International Conference on Learning Representations, 2022.
- The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635, 2018.
- Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, volume 48, pp. 1050–1059, 2016.
- Lightweight probabilistic deep networks. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3369–3378, 2018.
- Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.
- Self-supervised adversarial robustness for the low-label, high-data regime. In International Conference on Learning Representations, 2021.
- Alex Graves. Practical variational inference for neural networks. In Advances in Neural Information Processing Systems, volume 24, 2011.
- Bootstrap your own latent – a new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33:21271–21284, 2020.
- On calibration of modern neural networks. In International Conference on Machine Learning, pp. 1321–1330, 2017.
- Using self-supervised learning can improve model robustness and uncertainty. In Advances in Neural Information Processing Systems, volume 32, 2019.
- beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2016.
- Stochastic variational inference. Journal of Machine Learning Research, 14:1303–1347, 2013.
- Stephen C Hora. Aleatory and epistemic uncertainty in probability elicitation with an example from hazardous waste management. Reliability Engineering & System Safety, 54(2-3):217–223, 1996.
- Regularization matters: A nonparametric perspective on overparametrized neural network. In International Conference on Artificial Intelligence and Statistics, pp. 829–837, 2021.
- What are Bayesian neural network posteriors really like? In International Conference on Machine Learning, pp. 4629–4640, 2021.
- Generalized variational inference: Three arguments for deriving new posteriors. arXiv preprint arXiv:1904.02063, 2019.
- Bayesian model selection, the marginal likelihood, and generalization. In International Conference on Machine Learning, pp. 14223–14247, 2022.
- David J. C. MacKay. Bayesian interpolation. Neural Computation, 4(3):415–447, 1992.
- A simple baseline for Bayesian uncertainty in deep learning. Advances in Neural Information Processing Systems, 32, 2019.
- The Monte Carlo method. Journal of the American Statistical Association, 44(247):335–341, 1949.
- Thomas P. Minka. Expectation propagation for approximate Bayesian inference. In Uncertainty in Artificial Intelligence, pp. 362–369, 2001.
- Monte Carlo gradient estimation in machine learning. Journal of Machine Learning Research, 21(132):1–62, 2020.
- Radford M. Neal. MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, 2011.
- OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- Practical deep learning with Bayesian principles. In Advances in Neural Information Processing Systems, volume 32, 2019.
- Can you trust your model’s uncertainty? evaluating predictive uncertainty under dataset shift. Advances in Neural Information Processing Systems, 32, 2019.
- Kanerva++: Extending the Kanerva machine with differentiable, locally block allocated latent memory. In International Conference on Learning Representations, 2021.
- BYOL works even without batch statistics. arXiv preprint arXiv:2010.10241, 2020.
- Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling. In International Conference on Learning Representations, 2018.
- High-resolution image synthesis with latent diffusion models. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 10674–10685, 2022.
- Do Bayesian neural networks need to be fully stochastic? In International Conference on Artificial Intelligence and Statistics, pp. 7694–7722, 2023.
- Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, 2012.
- Ladder variational autoencoders. In Advances in Neural Information Processing Systems, pp. 3738–3746, 2016.
- VAE with a VampPrior. In International Conference on Artificial Intelligence and Statistics, volume 84, pp. 1214–1223, 2018.
- Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning, pp. 10347–10357, 2021.
- Understanding priors in Bayesian neural networks at the unit level. In International Conference on Machine Learning, pp. 6458–6467, 2019.
- Natural-parameter networks: A class of probabilistic neural networks. In Advances in Neural Information Processing Systems, volume 29, 2016.
- Flipout: Efficient pseudo-independent weight perturbations on mini-batches. In International Conference on Learning Representations, 2018.
- How good is the Bayes posterior in deep neural networks really? In International Conference on Machine Learning, 2020.
- The Kanerva machine: A generative distributed memory. In International Conference on Learning Representations, 2018.
Authors: Polina Turishcheva, Jason Ramapuram, Sinead Williamson, Dan Busbridge, Eeshan Dhekane, Russ Webb