Overview of BayesDLL: Bayesian Deep Learning Library
The paper presents BayesDLL, a Bayesian deep learning library for PyTorch that makes it straightforward to apply Bayesian Neural Networks (BNNs) to large-scale architectures such as Vision Transformers (ViTs). The library's primary strength lies in its integration of diverse approximate Bayesian inference algorithms, including Variational Inference, MC-Dropout, Stochastic-Gradient MCMC (SG-MCMC), and Laplace Approximation. It is designed to handle complex architectures without extensive code modification, making it notably accessible for practitioners working with substantial deep learning models.
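To give a flavor of what "without extensive code modification" means in practice, the snippet below shows the general MC-Dropout recipe in plain PyTorch. This is a minimal illustrative sketch, not BayesDLL's actual API; the small MLP is a stand-in for any existing network that already contains dropout layers.

```python
import torch
import torch.nn as nn

# A stand-in for any existing architecture that already contains
# Dropout layers; the network definition is not modified at all.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(256, 10),
)
model.eval()

# MC-Dropout: keep Dropout stochastic at test time while the rest
# of the network (e.g. normalization layers) stays in eval mode.
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.train()

x = torch.randn(8, 784)
with torch.no_grad():
    # Each forward pass samples a new dropout mask, i.e. a different
    # network drawn from the approximate posterior.
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(30)])

pred_mean = probs.mean(dim=0)  # predictive distribution
pred_var = probs.var(dim=0)    # disagreement across samples = uncertainty
```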
Key Features and Contributions
- Scalability to Large Networks: BayesDLL is distinguished from existing Bayesian neural network libraries by its ability to handle very large-scale networks such as ViTs. This enables Bayesian modeling of architectures whose scale and complexity make them conventionally challenging to treat.
- Plug-and-Play Approach: The library requires virtually no modification of existing network architecture definitions. This is crucial for leveraging publicly available pre-trained model weights, which can then serve as prior means (see the SGLD sketch after this list); it matters particularly for foundation models like ViTs.
- Variety of Inference Methods:
- Variational Inference (VI): Fits a Gaussian variational posterior over the weights, providing a tractable framework for approximating the parameter distribution.
- MC-Dropout: Treats stochastic dropout at test time as posterior sampling (see the sketch above), offering flexibility in incorporating prior knowledge while requiring essentially no model re-definition.
- SG-MCMC: Uses stochastic-gradient Langevin dynamics for posterior sampling, offering considerable flexibility without restrictive assumptions on the form of the posterior (a minimal SGLD sketch follows this list).
- Laplace Approximation: Employs a second-order Taylor expansion to approximate the posterior around the MAP estimate, although its accuracy depends on the quality of the Hessian approximation.
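To make the SG-MCMC idea concrete, here is a minimal stochastic-gradient Langevin dynamics (SGLD) loop in plain PyTorch, with the Gaussian prior centered at a copy of the (pre-trained) weights in the spirit of the prior-mean mechanism described above. The toy model, synthetic data, step size, and prior scale are illustrative assumptions, not BayesDLL's API; in practice `model` would be, say, a pre-trained ViT whose downloaded weights supply `prior_mean`.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins for a pre-trained network and a real dataset.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
loader = [(torch.randn(32, 20), torch.randint(0, 3, (32,))) for _ in range(100)]

# Freeze a copy of the current (pre-trained) weights as the prior mean.
prior_mean = [p.detach().clone() for p in model.parameters()]
prior_std = 1.0        # per-weight prior N(prior_mean, prior_std^2)
lr = 1e-4              # SGLD step size
n_data = 32 * len(loader)

samples = []
for step, (x, y) in enumerate(loader):
    model.zero_grad()
    # Mini-batch estimate of the negative log-posterior:
    # N * mean-batch-NLL minus the log-prior.
    nll = F.cross_entropy(model(x), y) * n_data
    log_prior = sum(
        -0.5 * ((p - m) / prior_std).pow(2).sum()
        for p, m in zip(model.parameters(), prior_mean)
    )
    (nll - log_prior).backward()

    with torch.no_grad():
        for p in model.parameters():
            # SGLD update: half-step-size gradient descent plus Gaussian
            # noise whose variance equals the step size.
            p -= 0.5 * lr * p.grad
            p += torch.randn_like(p) * (lr ** 0.5)

    if step % 10 == 0:  # thin the chain; keep occasional posterior samples
        samples.append(copy.deepcopy(model.state_dict()))
```

Averaging predictions over the saved `samples` then gives the Bayesian predictive distribution, analogous to averaging dropout masks in the MC-Dropout sketch.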
Experimental Validation
Experiments conducted with the library demonstrate its applicability to a range of neural network architectures and tasks, from image classification on MNIST to larger-scale datasets such as Flowers and Pets. Its efficacy relative to non-Bayesian training (i.e., vanilla SGD) was assessed, with the Bayesian models providing superior or comparable results along with improved uncertainty calibration in many cases.
- Prediction Performance: Bayesian methods in BayesDLL, notably Variational Inference and MC-Dropout, achieved competitive or improved test error rates compared with their non-Bayesian counterparts. Large networks such as ResNet-101 and the Vision Transformer benefited from Bayesian adaptation in particular when anchored with pre-trained weights.
- Uncertainty Quantification: The library reports uncertainty and calibration metrics such as Expected Calibration Error (ECE), Maximum Calibration Error (MCE), and Negative Log-Likelihood (NLL), underscoring the benefit of Bayesian approaches in capturing model uncertainty effectively (a minimal ECE computation is sketched below).
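For reference, ECE can be computed from held-out predictions in a few lines. The sketch below is the standard binned estimator, not library-specific code; the bin count and the random stand-in predictions are assumptions.

```python
import torch

def expected_calibration_error(probs: torch.Tensor,
                               labels: torch.Tensor,
                               n_bins: int = 15) -> torch.Tensor:
    """Binned ECE: |accuracy - confidence| per confidence bin,
    weighted by the fraction of samples falling in each bin."""
    conf, pred = probs.max(dim=-1)          # top-1 confidence and class
    correct = pred.eq(labels).float()
    bins = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = (correct[mask].mean() - conf[mask].mean()).abs()
            ece += mask.float().mean() * gap
    return ece

# Example with random stand-in predictions (replace with model outputs).
probs = torch.softmax(torch.randn(1000, 10), dim=-1)
labels = torch.randint(0, 10, (1000,))
print(expected_calibration_error(probs, labels))
```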
Implications and Future Research
The introduction of BayesDLL marks an important step toward making Bayesian deep learning practical for large-scale network deployment. It suggests the potential for extending Bayesian techniques to other large architectures such as LLaMA and RoBERTa with minimal adaptation effort. Future developments might focus on further optimizing computational efficiency and on exploring more advanced Hessian approximation strategies to improve the reliability of the Laplace Approximation (a diagonal sketch follows below).
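For intuition on why the Hessian approximation matters, here is a minimal diagonal Laplace sketch that uses the empirical Fisher (squared per-example gradients) as a cheap Hessian surrogate. The model, data, and prior precision are illustrative assumptions, not the library's implementation; richer curvature estimates would replace the diagonal accumulation below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins for a MAP-trained classifier and its data.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
data = [(torch.randn(20), torch.randint(0, 3, (1,))) for _ in range(200)]

params = list(model.parameters())
prior_precision = 1.0  # precision of the Gaussian prior on each weight

# Empirical Fisher: accumulate squared per-example gradients as a
# diagonal surrogate for the Hessian of the negative log-likelihood.
fisher_diag = [torch.zeros_like(p) for p in params]
for x, y in data:
    loss = F.cross_entropy(model(x.unsqueeze(0)), y)
    grads = torch.autograd.grad(loss, params)
    for f, g in zip(fisher_diag, grads):
        f += g.detach() ** 2

# Laplace posterior: N(theta_MAP, diag(1 / (Fisher + prior precision))).
theta_map = [p.detach().clone() for p in params]
post_std = [(f + prior_precision).rsqrt() for f in fisher_diag]

# Draw one posterior weight sample and load it into the network.
with torch.no_grad():
    for p, m, s in zip(params, theta_map, post_std):
        p.copy_(m + s * torch.randn_like(m))
```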
BayesDLL sets a methodological foundation for deploying Bayesian approaches at scale, with better-calibrated uncertainty as the principal payoff. As the community explores broader applications, this tool could significantly influence practice in AI, particularly where model reliability and interpretability are crucial.