- The paper introduces a subnetwork inference framework that applies Bayesian techniques to a selected subset of neural network weights, reducing computational demands while preserving uncertainty calibration.
- The authors implement a linearized Laplace approximation to derive a full-covariance Gaussian posterior, achieving improved performance on benchmarks like rotated MNIST and corrupted CIFAR10.
- Empirical evaluations demonstrate that the method offers superior accuracy and robustness compared to techniques such as MC Dropout and diagonal Laplace inference.
Bayesian Deep Learning via Subnetwork Inference
The paper under review presents a novel approach to Bayesian deep learning, addressing the computational challenges associated with scaling Bayesian inference to large neural networks. The authors introduce the concept of subnetwork inference, which focuses on inferring only a subset of network weights to construct an expressive and computationally tractable posterior approximation.
Key Contributions
- Subnetwork Inference Framework: The paper proposes a framework that applies Bayesian inference to a small subset of neural network (NN) weights while fixing the remaining weights at their point estimates. This allows the use of expressive posterior approximations that would be intractable over the full network.
- Linearized Laplace Approximation: The framework is instantiated as subnetwork linearized Laplace: a Maximum a Posteriori (MAP) estimate is first obtained for all weights, and the linearized Laplace approximation is then used to derive a full-covariance Gaussian posterior over the chosen subnetwork alone (sketched in the equations after this list).
- Subnetwork Selection Strategy: The subnetwork is selected to preserve predictive uncertainty by minimizing the Wasserstein distance between the subnetwork posterior and the full-network posterior. This ensures that the reduced subnetwork captures the essential uncertainty structure of the full model (see the selection sketch after this list).
- Empirical Evaluation: The approach is validated empirically across a range of benchmarks. Results indicate that subnetwork inference can outperform widely used Bayesian deep learning methods, including deep ensembles, by providing better-calibrated uncertainty estimates.
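Concretely, the subnetwork Laplace posterior and the linearized predictive take the following form. The notation here is adapted shorthand rather than the paper's exact symbols: $\widehat{\mathbf{w}}$ is the MAP estimate, $\mathbf{J}_S(\mathbf{x})$ the network Jacobian restricted to the subnetwork weights $\mathbf{w}_S$, $\boldsymbol{\Lambda}_n$ the per-datum log-likelihood Hessian, and $\lambda$ the prior precision:

```latex
% Full-covariance Gaussian over the subnetwork weights w_S, with all
% remaining weights held fixed at their MAP values:
p(\mathbf{w}_S \mid \mathcal{D}) \approx
  \mathcal{N}\big(\mathbf{w}_S;\ \widehat{\mathbf{w}}_S,\ \mathbf{H}_S^{-1}\big),
\qquad
\mathbf{H}_S = \sum_{n=1}^{N}
  \mathbf{J}_S(\mathbf{x}_n)^{\top} \boldsymbol{\Lambda}_n \mathbf{J}_S(\mathbf{x}_n)
  + \lambda \mathbf{I}

% Linearizing the network around the MAP estimate makes the predictive
% Gaussian, with the MAP prediction as its mean:
f(\mathbf{x}, \mathbf{w}) \approx
  f(\mathbf{x}, \widehat{\mathbf{w}})
  + \mathbf{J}_S(\mathbf{x})\,(\mathbf{w}_S - \widehat{\mathbf{w}}_S)
\;\Rightarrow\;
\mathcal{N}\big(f(\mathbf{x}, \widehat{\mathbf{w}}),\
  \mathbf{J}_S(\mathbf{x})\,\mathbf{H}_S^{-1}\,\mathbf{J}_S(\mathbf{x})^{\top}\big)
```

Because $\mathbf{H}_S$ is only as large as the subnetwork, a full (non-diagonal) covariance remains tractable even when the full network has millions of weights.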
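The Wasserstein criterion also has a simple practical form: under the factorized (diagonal) proxy posterior used for selection, the distance is minimized by keeping the weights with the largest marginal variances. A minimal NumPy sketch of this rule, with function name and toy data mine:

```python
import numpy as np

def select_subnetwork(marginal_variances: np.ndarray, k: int) -> np.ndarray:
    """Pick the k weights with the largest approximate marginal variances.

    Under a factorized Gaussian proxy for the full posterior, the squared
    2-Wasserstein distance to a posterior that keeps only a subset S
    stochastic reduces to the summed variances of the discarded weights,
    so it is minimized by retaining the highest-variance weights.
    """
    idx = np.argpartition(marginal_variances, -k)[-k:]  # k largest, O(D)
    return np.sort(idx)  # indices of the chosen subnetwork weights

# toy usage: pretend diagonal-Laplace variances for a 10-weight model
rng = np.random.default_rng(0)
variances = rng.exponential(scale=1.0, size=10)
print(select_subnetwork(variances, k=3))
```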
Numerical Results and Claims
The paper reports that subnetwork inference can match or surpass full-network inference in accuracy while drastically reducing computational requirements. The results show substantial improvements in uncertainty calibration and robustness to distribution shift, particularly on benchmarks such as rotated MNIST and corrupted CIFAR10, where the proposed method outperforms baselines such as MC Dropout and diagonal Laplace inference.
Practical and Theoretical Implications
Practical Implications: The method provides a scalable route to Bayesian inference in neural networks, letting practitioners benefit from expressive posteriors without prohibitive computational cost. Because inference is applied post hoc on top of a standard MAP-trained model, the approach integrates readily with existing models and training pipelines, facilitating broader adoption in applied settings; a minimal end-to-end sketch of the workflow follows.
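The sketch below walks through the four steps on a toy linear-Gaussian model, where the linearization step is exact. It assumes unit observation noise and an isotropic prior, and the data and variable names are illustrative stand-ins, not the paper's reference implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, K = 10, 50, 3                       # weights, data points, subnet size
X = rng.normal(size=(N, D))               # toy design matrix ("network" is linear)
y = X @ rng.normal(size=D) + rng.normal(size=N)

lam = 1.0                                  # isotropic prior precision

# 1) MAP estimate over *all* weights (here the ridge solution)
w_map = np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# 2) subnetwork selection: keep the K weights with the largest
#    diagonal-Laplace variances (the Wasserstein-motivated rule)
h_diag = (X**2).sum(axis=0) + lam          # diagonal of the Hessian + prior
S = np.sort(np.argpartition(1.0 / h_diag, -K)[-K:])

# 3) full-covariance Laplace posterior over the subnetwork only;
#    the remaining D - K weights stay fixed at w_map
H_S = X[:, S].T @ X[:, S] + lam * np.eye(K)
Sigma_S = np.linalg.inv(H_S)

# 4) linearized predictive at a test point: mean from the full MAP
#    model, variance from the subnetwork posterior (+ noise)
x_star = rng.normal(size=D)
mean = x_star @ w_map
var = x_star[S] @ Sigma_S @ x_star[S] + 1.0
print(f"predictive mean {mean:.3f}, std {var**0.5:.3f}")
```

On a real network, the Hessians in steps 2 and 3 would be generalized Gauss-Newton approximations built from subnetwork Jacobians; restricting step 3 to K weights with K far smaller than D is exactly what keeps the K-by-K full covariance tractable.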
Theoretical Implications: The research challenges the assumption that comprehensive Bayesian inference requires all weights to be stochastic. It suggests that with appropriate subnetwork selection, it is possible to maintain the expressive power necessary for effective uncertainty quantification. This opens avenues for further research into other subnetwork selection criteria and alternative inference techniques.
Future Directions
The paper suggests several avenues for future investigations:
- Methodological Enhancements: Exploring different subnetwork selection strategies and integrating other Bayesian inference methods could yield improved performance and broader applicability.
- Scalability: While subnetwork inference reduces the burden significantly, further optimizations are needed for extremely large models, such as those used in state-of-the-art transformer architectures.
- Domain-specific Adaptation: Tailoring subnetwork inference strategies to domain-specific characteristics could provide additional gains in performance and applicability.
In summary, the subnetwork inference framework offers a promising approach to tackling the challenges of Bayesian deep learning in large-scale models. By focusing on a strategic subset of weights, it enables the use of sophisticated Bayesian methods that were previously considered infeasible for practical applications due to computational constraints.