What Are Bayesian Neural Network Posteriors Really Like? (2104.14421v1)

Published 29 Apr 2021 in cs.LG and stat.ML

Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. We show that (1) BNNs can achieve significant performance gains over standard training and deep ensembles; (2) a single long HMC chain can provide a comparable representation of the posterior to multiple shorter chains; (3) in contrast to recent studies, we find posterior tempering is not needed for near-optimal performance, with little evidence for a "cold posterior" effect, which we show is largely an artifact of data augmentation; (4) BMA performance is robust to the choice of prior scale, and relatively similar for diagonal Gaussian, mixture of Gaussian, and logistic priors; (5) Bayesian neural networks show surprisingly poor generalization under domain shift; (6) while cheaper alternatives such as deep ensembles and SGMCMC methods can provide good generalization, they provide distinct predictive distributions from HMC. Notably, deep ensemble predictive distributions are similarly close to HMC as standard SGLD, and closer than standard variational inference.

Citations (334)

Summary

  • The paper reveals that full-batch HMC provides a precise estimation of BNN posteriors, leading to improved predictive performance over conventional methods.
  • The paper demonstrates that a single extended HMC chain effectively represents the posterior, challenging the need for multiple shorter chains.
  • The paper finds minimal evidence for the cold posterior effect and notes BNN challenges in generalizing under domain shifts.

Overview of "What Are Bayesian Neural Network Posteriors Really Like?"

The paper "What Are Bayesian Neural Network Posteriors Really Like?" provides an empirical investigation into the characteristics of Bayesian neural network (BNN) posteriors. By leveraging full-batch Hamiltonian Monte Carlo (HMC), the paper offers insights into how BNN posteriors behave under configurations that allow for a more precise estimation of the true posterior.

The authors argue that previous methods for approximating BNN posteriors, such as stochastic gradient MCMC (SGMCMC) and variational inference, may provide biased or simplified representations due to computational constraints. In contrast, they utilize HMC, a method known for generating samples asymptotically distributed as the true posterior, albeit at a significant computational cost.
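
The paper's actual experiments run full-batch HMC over modern deep architectures on large accelerator pods; as a minimal illustration of the underlying algorithm rather than the authors' implementation, the sketch below applies full-batch HMC with a leapfrog integrator and a Gaussian prior to a small logistic-regression model. The function and parameter names (`log_posterior_and_grad`, `step_size`, `n_leapfrog`) and all settings are illustrative assumptions.

```python
# Minimal full-batch HMC sketch for Bayesian logistic regression with a Gaussian prior.
# Illustrative only: the paper samples posteriors of modern deep networks, not this toy model.
import numpy as np

rng = np.random.default_rng(0)

def log_posterior_and_grad(w, X, y, prior_scale=1.0):
    """Unnormalized log posterior and its gradient, computed on the FULL dataset."""
    logits = X @ w
    p = 1.0 / (1.0 + np.exp(-logits))                      # Bernoulli probabilities
    log_lik = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    log_prior = -0.5 * np.sum(w ** 2) / prior_scale ** 2   # isotropic Gaussian prior
    grad = X.T @ (y - p) - w / prior_scale ** 2
    return log_lik + log_prior, grad

def hmc_step(w, X, y, step_size=1e-2, n_leapfrog=50):
    """One HMC transition: resample momentum, leapfrog-integrate, Metropolis accept/reject."""
    logp0, grad = log_posterior_and_grad(w, X, y)
    p0 = rng.normal(size=w.shape)                          # fresh Gaussian momentum
    w_new, p = w.copy(), p0 + 0.5 * step_size * grad       # initial half step for momentum
    for _ in range(n_leapfrog):
        w_new = w_new + step_size * p                      # full step for position
        logp_new, grad = log_posterior_and_grad(w_new, X, y)
        p = p + step_size * grad                           # full step for momentum
    p = p - 0.5 * step_size * grad                         # correct the extra half step
    h_old = -logp0 + 0.5 * np.sum(p0 ** 2)                 # Hamiltonian before the proposal
    h_new = -logp_new + 0.5 * np.sum(p ** 2)               # Hamiltonian after the proposal
    accept = np.log(rng.uniform()) < h_old - h_new
    return (w_new, logp_new) if accept else (w, logp0)

# Tiny synthetic dataset just to exercise the sampler.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

w, samples = np.zeros(5), []
for i in range(1000):
    w, _ = hmc_step(w, X, y)
    if i >= 200:                                           # discard burn-in
        samples.append(w.copy())
samples = np.array(samples)                                # posterior samples for a BMA
```

Each proposal uses gradients of the log posterior computed on the entire dataset, which is precisely what distinguishes this setting from the mini-batch SGMCMC and variational methods the paper compares against.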

Key Findings

  1. Performance Gains with BNNs: The paper demonstrates that BNNs can achieve notable performance improvements over conventional training methods and even deep ensembles. This underscores the potential for BNNs to provide more accurate predictive distributions when appropriately sampled.
  2. Long Chain Representation: A single extended chain of HMC iterations achieves a representation of the posterior that is comparable to multiple shorter chains. This finding suggests that HMC's convergent behavior is robust, at least under the studied conditions.
  3. Absence of the Cold Posterior Effect: Contrary to some recent studies, the authors find minimal evidence for the so-called "cold posterior" effect. They attribute previous observations of this effect primarily to artifacts introduced by data augmentation, rather than an inherent need for posterior tempering (the tempered posterior is written out for reference just after this list).
  4. Robustness to Prior Specification: The paper finds that the Bayesian Model Average (BMA) performance is relatively unaffected by variations in prior scale, with results being reasonably consistent across diagonal Gaussian, mixture of Gaussian, and logistic priors.
  5. Challenges with Domain Shift: Although BNNs perform well for out-of-distribution (OOD) detection in some cases, they exhibit poor generalization under domain shifts. This is a significant observation, indicating potential limitations of BNNs in environments where the input data distribution changes.
  6. Comparison with Practical Approximations: Cheaper methods, such as deep ensembles and SGMCMC, produce predictive distributions that are distinct from those of HMC. Despite this difference, deep ensembles align more closely with HMC than standard variational inference, which is often used in practice.
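
For background on finding 3: posterior tempering rescales the posterior by a temperature $T$, and the "cold posterior" effect refers to reports that $T < 1$ improves predictive performance. A common form of the tempered posterior, stated here as general background rather than a result specific to this paper, is

$$ p_T(\theta \mid \mathcal{D}) \;\propto\; \big(p(\mathcal{D} \mid \theta)\, p(\theta)\big)^{1/T}, $$

where $T = 1$ recovers the standard Bayes posterior; some works instead temper only the likelihood, $p(\mathcal{D} \mid \theta)^{1/T}\, p(\theta)$. The paper's conclusion is that $T = 1$ is already near-optimal once data augmentation is accounted for.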

Implications and Speculations

The findings of this paper have several implications for both the theory and practice of Bayesian deep learning:

  • From a theoretical perspective, the results demonstrate the complexity of true BNN posteriors, challenging the assumption that simpler methods adequately capture these distributions. They advocate for methods that can account for the complex, multi-modal landscape of neural network posteriors.
  • Practically, the insights gained from deploying HMC at scale stress the importance of computational investments in obtaining precise posterior samples. This is invaluable for calibrating more efficient yet approximate methods against a reliable reference.
  • The observations on posterior tempering and prior robustness suggest that efforts to design better priors might focus more on model architecture and function-space properties than on overly intricate parameter-space prior assumptions.
  • The noted vulnerability of BNNs to domain shifts highlights an area for further research. Developing strategies to enhance BNNs' resilience to such shifts could broaden their applicability in dynamic real-world environments.
  • The alignment of deep ensemble predictive distributions with those of HMC suggests a potential reinterpretation of deep ensembles within the Bayesian paradigm, indicating they might offer a pathway to approximating Bayesian predictions more effectively without full HMC computation; a minimal sketch of such a predictive comparison follows this list.
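
The comparisons in finding 6 and in the last bullet above hinge on measuring how close two predictive distributions are. As a rough sketch (the exact metrics and evaluation code here are assumptions, not taken from the paper), the snippet below forms a Bayesian model average from per-sample softmax outputs and compares a reference predictive distribution (e.g., from HMC samples) to an approximate one (e.g., from a deep ensemble) via top-1 agreement and mean total variation.

```python
import numpy as np

def bma_predictions(prob_list):
    """Bayesian model average: mean of per-sample predictive distributions.
    prob_list: iterable of arrays with shape (n_examples, n_classes)."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)

def top1_agreement(p_ref, p_approx):
    """Fraction of test points where the two predictive distributions pick the same class."""
    return np.mean(np.argmax(p_ref, axis=1) == np.argmax(p_approx, axis=1))

def mean_total_variation(p_ref, p_approx):
    """Average total-variation distance between per-example class distributions."""
    return np.mean(0.5 * np.sum(np.abs(p_ref - p_approx), axis=1))

# Hypothetical usage: hmc_probs and ensemble_probs would be lists of (n_examples, n_classes)
# softmax outputs, one entry per HMC sample / ensemble member respectively.
# p_hmc = bma_predictions(hmc_probs)
# p_ens = bma_predictions(ensemble_probs)
# print(top1_agreement(p_hmc, p_ens), mean_total_variation(p_hmc, p_ens))
```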

The paper serves as both a benchmark and a call to action for future Bayesian deep learning research. With the clear elucidation of BNN posterior properties, it propels the field toward a more nuanced understanding of approximate inference methods and the true behavior of BNNs under different configurations. Further advances could include hybrid approaches that blend practical efficiency with improved posterior fidelity, leading the way towards robust AI models in uncertain domains.
