On the Convergence of Locally Adaptive and Scalable Diffusion-Based Sampling Methods for Deep Bayesian Neural Network Posteriors (2403.08609v2)

Published 13 Mar 2024 in cs.LG and stat.ML

Abstract: Achieving robust uncertainty quantification for deep neural networks is an important requirement in many real-world applications of deep learning, such as medical imaging, where it is necessary to assess the reliability of a neural network's predictions. Bayesian neural networks are a promising approach for modeling uncertainties in deep neural networks. Unfortunately, generating samples from the posterior distribution of neural networks is a major challenge. One significant advance in that direction would be the incorporation of adaptive step sizes, similar to those of modern neural network optimizers, into Markov chain Monte Carlo (MCMC) sampling algorithms without significantly increasing computational demand. Over the past years, several papers have introduced sampling algorithms claiming to achieve this property. But do they indeed converge to the correct distribution? In this paper, we demonstrate that these methods can exhibit a substantial bias in the distribution they sample, even in the limit of vanishing step sizes and at full batch size.
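
The family of samplers the abstract alludes to couples a stochastic-gradient Langevin update with an optimizer-style diagonal preconditioner. The following is a minimal sketch, assuming an RMSProp-style preconditioner in the spirit of preconditioned SGLD (Li et al., 2016); the function and parameter names are illustrative, not taken from the paper. It deliberately omits the divergence correction term Gamma(theta) from the "complete recipe" of Ma et al. (2015), which is the kind of simplification that can leave a bias in the sampled distribution even as the step size vanishes.

```python
import numpy as np

def psgld_step(theta, grad_log_post, v, rng,
               step_size=1e-4, beta=0.99, eps=1e-8):
    """One adaptive (preconditioned) SGLD step. Hypothetical sketch."""
    # Gradient of the log posterior at the current parameters.
    g = grad_log_post(theta)
    # RMSProp-style running estimate of the squared gradient.
    v = beta * v + (1.0 - beta) * g * g
    # Diagonal preconditioner, analogous to an adaptive optimizer's
    # per-coordinate step scaling.
    G = 1.0 / (np.sqrt(v) + eps)
    # Langevin update: preconditioned drift plus Gaussian noise whose
    # covariance (step_size * G) is matched to the rescaled drift.
    noise = rng.normal(size=theta.shape)
    theta_new = theta + 0.5 * step_size * G * g \
        + np.sqrt(step_size * G) * noise
    # NOTE: the correction term Gamma(theta) = div(G(theta)) required
    # for exact invariance (Ma et al., 2015) is dropped here, as in many
    # scalable variants; this omission is one source of the bias the
    # paper analyzes.
    return theta_new, v

# Toy usage: sample from a standard normal, where grad log p(theta) = -theta.
rng = np.random.default_rng(0)
theta, v = np.zeros(2), np.ones(2)
for _ in range(10_000):
    theta, v = psgld_step(theta, lambda th: -th, v, rng)
```

Because the preconditioner G depends on theta through the gradient history, its state-dependence contributes a drift that the update above never compensates for, which is why shrinking the step size alone does not recover the exact posterior.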
