Bootstrap Your Own Variance (2312.03213v1)

Published 6 Dec 2023 in cs.LG and stat.ML

Abstract: Understanding model uncertainty is important for many applications. We propose Bootstrap Your Own Variance (BYOV), combining Bootstrap Your Own Latent (BYOL), a negative-free Self-Supervised Learning (SSL) algorithm, with Bayes by Backprop (BBB), a Bayesian method for estimating model posteriors. We find that the learned predictive standard deviation of BYOV vs. a supervised BBB model is well captured by a Gaussian distribution, providing preliminary evidence that the learned parameter posterior is useful for label-free uncertainty estimation. BYOV improves upon the deterministic BYOL baseline (+2.83% test ECE, +1.03% test Brier) and shows better calibration and reliability when tested with various augmentations (e.g., +2.4% test ECE, +1.2% test Brier for Salt & Pepper noise).
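
The abstract combines two standard ingredients: BYOL's online/target networks with an exponential-moving-average (EMA) target, and Bayes by Backprop's factorized Gaussian posterior over weights trained with the reparameterization trick. The sketch below is a minimal PyTorch illustration of those two pieces in isolation, not the authors' implementation; the names `BayesLinear` and `ema_update`, the prior scale, and the EMA rate are hypothetical choices for illustration only.

```python
# Illustrative sketch only (assumed names and hyperparameters, not the paper's code):
# a Bayes-by-Backprop style linear layer with a Gaussian weight posterior, plus a
# BYOL-style EMA update for a target network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over weights and biases."""

    def __init__(self, in_features, out_features, prior_std=0.1):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -5.0))
        self.prior_std = prior_std

    def forward(self, x):
        # Reparameterized sample: w = mu + softplus(rho) * eps, eps ~ N(0, 1)
        w_std = F.softplus(self.w_rho)
        b_std = F.softplus(self.b_rho)
        w = self.w_mu + w_std * torch.randn_like(w_std)
        b = self.b_mu + b_std * torch.randn_like(b_std)
        return F.linear(x, w, b)

    def kl(self):
        # KL( N(mu, std^2) || N(0, prior_std^2) ), summed over all parameters.
        def _kl(mu, std):
            return (torch.log(self.prior_std / std)
                    + (std ** 2 + mu ** 2) / (2 * self.prior_std ** 2) - 0.5).sum()

        return _kl(self.w_mu, F.softplus(self.w_rho)) + _kl(self.b_mu, F.softplus(self.b_rho))


@torch.no_grad()
def ema_update(target, online, tau=0.996):
    """BYOL-style exponential moving average of online parameters into the target."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(tau).add_(p_o, alpha=1.0 - tau)
```

In a BYOV-style setup one would presumably optimize the online network with the BYOL prediction loss plus a KL-weighted regularizer over the variational layers, and obtain a predictive standard deviation by drawing several weight samples at evaluation time; the specific loss weighting, architecture, and augmentations are details of the paper and are not reproduced here.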

Authors (6)
  1. Polina Turishcheva
  2. Jason Ramapuram
  3. Sinead Williamson
  4. Dan Busbridge
  5. Eeshan Dhekane
  6. Russ Webb