Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks (2407.20891v5)

Published 30 Jul 2024 in cs.LG, cs.AI, and cs.CV

Abstract: The computational complexity of Bayesian learning is impeding its adoption in practical, large-scale tasks. Despite demonstrated merits, such as improved robustness and resilience to unseen or out-of-distribution inputs compared with non-Bayesian counterparts, the practical use of Bayesian neural networks (BNNs) has faded to near insignificance. In this study, we introduce a framework to mitigate the computational burden of BNNs. Our approach follows the principle of Bayesian techniques based on deep ensembles, but significantly reduces their cost via multiple low-rank perturbations of the parameters of a pre-trained neural network. Both vanilla ensembles and more sophisticated schemes, such as Bayesian learning with Stein Variational Gradient Descent (SVGD), previously deemed impractical for large models, can be seamlessly implemented within the proposed framework, called Bayesian Low-Rank LeArning (Bella). In a nutshell, i) Bella achieves a dramatic reduction in the number of trainable parameters required to approximate a Bayesian posterior, and ii) it not only maintains, but in some instances surpasses, the performance of conventional Bayesian learning methods and non-Bayesian baselines. Our results on large-scale tasks such as ImageNet, CAMELYON17, DomainNet, and VQA with CLIP and LLaVA demonstrate the effectiveness and versatility of Bella in building highly scalable and practical Bayesian deep models for real-world applications.
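The core idea described in the abstract is to approximate a Bayesian posterior with an ensemble of low-rank perturbations attached to a frozen pre-trained network, instead of training several full copies of the weights. The sketch below illustrates that idea in PyTorch; it is a minimal illustration under stated assumptions, not the authors' implementation. The class and argument names (BellaLinear, rank, n_members) are hypothetical, and the initialization scheme and the ensemble/SVGD training loop are omitted.

```python
# Minimal sketch (not the paper's code): a frozen pre-trained linear layer plus
# K trainable low-rank perturbations, one per ensemble member / posterior particle.
import torch
import torch.nn as nn


class BellaLinear(nn.Module):
    """Frozen pre-trained weight W0 with per-member low-rank updates B_k @ A_k."""

    def __init__(self, pretrained: nn.Linear, rank: int = 4, n_members: int = 5):
        super().__init__()
        self.weight = pretrained.weight.detach()   # frozen backbone weight (out_f, in_f)
        self.bias = pretrained.bias.detach() if pretrained.bias is not None else None
        out_f, in_f = self.weight.shape
        # One (A_k, B_k) pair per member; only these small factors are trainable.
        self.A = nn.Parameter(torch.randn(n_members, rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_members, out_f, rank))

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        delta = self.B[member] @ self.A[member]     # low-rank update, rank << min(out_f, in_f)
        return nn.functional.linear(x, self.weight + delta, self.bias)


# Usage: average member predictions to approximate the posterior predictive.
layer = BellaLinear(nn.Linear(512, 10), rank=4, n_members=5)
x = torch.randn(2, 512)
probs = torch.stack([layer(x, k).softmax(-1) for k in range(5)]).mean(0)
```

Each member trains roughly rank * (in_features + out_features) parameters instead of in_features * out_features, which is where the claimed reduction in trainable parameters relative to full-weight deep ensembles would come from.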
