
Bayesian Deep Ensembles via the Neural Tangent Kernel (2007.05864v2)

Published 11 Jul 2020 in stat.ML and cs.LG

Abstract: We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member, that enables a posterior interpretation in the infinite width limit. When ensembled together, our trained NNs give an approximation to a posterior predictive distribution, and we prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit. Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks.

Authors (3)
  1. Bobby He (10 papers)
  2. Balaji Lakshminarayanan (62 papers)
  3. Yee Whye Teh (162 papers)
Citations (108)

Summary

Bayesian Deep Ensembles via the Neural Tangent Kernel

The paper "Bayesian Deep Ensembles via the Neural Tangent Kernel" presents a novel approach to bridging the gap between deep ensembles and Gaussian processes (GPs) by utilizing the Neural Tangent Kernel (NTK). This work contributes to ongoing discussions in the field of Bayesian deep learning, where understanding and improving uncertainty quantification and out-of-distribution (OOD) robustness remain crucial.

Main Contributions

The authors propose a modification to standard deep ensemble training that enables a posterior interpretation in the infinite width limit. A standard deep ensemble lacks a clear Bayesian rationale, which has made it difficult to reconcile its strong empirical performance with theory. By adding a random, untrainable component to each ensemble member, the paper links the ensemble outputs to a posterior predictive distribution, akin to that of a GP.
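
To make the recipe concrete, below is a minimal PyTorch sketch of the general idea: each ensemble member trains the sum of a trainable network and a fixed, randomly initialised, untrainable function, and the trained members are averaged at prediction time. The helper names (`make_member`, `make_net`), the choice of the untrainable function as a frozen, independently initialised network, and the per-member target noise are illustrative assumptions; the paper's actual construction chooses the added function specifically so that the infinite width limit matches the NTKGP posterior.

```python
import torch
import torch.nn as nn

def make_member(make_net, x_train, y_train, noise_std=0.1, epochs=500, lr=1e-2):
    """Train one ensemble member of the form f_theta(x) + delta(x), where delta is a
    fixed, randomly initialised, untrainable function (here simply a frozen, independently
    initialised network; the paper builds delta so the wide limit matches the NTKGP posterior)."""
    f = make_net()                        # trainable member
    delta = make_net()                    # untrainable randomised function
    for p in delta.parameters():
        p.requires_grad_(False)           # delta is never updated

    # Per-member noisy targets ("sample-then-optimise" style; an assumption of this sketch).
    y_tilde = y_train + noise_std * torch.randn_like(y_train)

    opt = torch.optim.SGD(f.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        pred = f(x_train) + delta(x_train)         # additive untrainable component
        loss = ((pred - y_tilde) ** 2).mean()      # squared error loss, as in the paper
        loss.backward()
        opt.step()
    return lambda x: f(x) + delta(x)

# Usage: the ensemble mean and variance approximate a posterior predictive distribution.
make_net = lambda: nn.Sequential(nn.Linear(1, 128), nn.ReLU(), nn.Linear(128, 1))
x = torch.linspace(-1.0, 1.0, 64).unsqueeze(-1)
y = torch.sin(3.0 * x) + 0.1 * torch.randn_like(x)
members = [make_member(make_net, x, y) for _ in range(5)]
with torch.no_grad():
    preds = torch.stack([m(x) for m in members])   # [n_members, N, 1]
    mean, var = preds.mean(0), preds.var(0)
```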

Technical Overview

  1. Neural Tangent Kernel (NTK): The NTK provides a framework for analyzing wide neural networks, connecting them with kernels seen in GP literature. While previous work highlighted the equivalence between infinitely wide NNs and GPs with respect to their priors, the paper demonstrates that attaining a posterior interpretation necessitates further modifications.
  2. Modification for Bayesian Interpretation: The proposed method adds a randomized, untrainable function to each ensemble member. This shifts each member's initial function so that, once trained, the ensemble emulates the NTKGP posterior in the infinite width limit (see the expression after this list). This training paradigm is shown to produce more conservative predictions than standard ensembles.
  3. Strong Numerical Results: Empirical evaluations show that NTKGP-trained ensembles can outperform standard deep ensembles, notably under distributional shift. The experiments cover heteroscedastic regression tasks and classification problems with OOD data, demonstrating the practical value of conservative uncertainty estimates.
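
For regression, the reference object is the GP posterior whose covariance function is the NTK; in the infinite width limit the modified ensemble's mean and variance approach the standard GP conditioning formulas. The sketch below uses the usual noisy-regression setup (training inputs X, targets y, observation noise σ²), which is an assumption about notation rather than the paper's exact statement.

```latex
% NTK of a wide network, averaged over the random initialisation \theta_0:
\Theta(x, x') \;=\; \mathbb{E}_{\theta_0}\!\left[\nabla_\theta f(x;\theta_0)^{\top}\,\nabla_\theta f(x';\theta_0)\right]

% GP posterior predictive with covariance \Theta and observation noise \sigma^2,
% which the NTKGP ensemble emulates in the infinite width limit:
\mu_*(x) \;=\; \Theta(x, X)\,\big(\Theta(X, X) + \sigma^2 I\big)^{-1} y
\Sigma_*(x, x') \;=\; \Theta(x, x') \;-\; \Theta(x, X)\,\big(\Theta(X, X) + \sigma^2 I\big)^{-1}\,\Theta(X, x')
```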

Implications

The implications of this work are multi-faceted:

  • Practical Application: In settings characterized by dataset shift and uncertain environments, the proposed approach offers a theoretically grounded method for improved uncertainty quantification. The ability to control how conservative predictions are through NTK properties is valuable in applications such as autonomous systems and medical diagnostics.
  • Theoretical Insight: The connection established between NTK and ensemble methods advances the theoretical robustness of deep learning models. By emulating GP posteriors, the work enriches our understanding of how to expand kernel methods' strengths into deep learning contexts.

Future Directions

Looking forward, several potential areas for further exploration emerge:

  • Extensions to Different Architectures: Since NTK properties depend on choices such as the activation function and network depth, exploring how these factors can be tuned or selected to encode specific prior beliefs is a natural next step.
  • Active and Reinforcement Learning: The improved uncertainty quantification could inform exploration and state-action modeling in reinforcement learning, and could be integrated with active learning strategies for data acquisition.

In conclusion, the paper develops the theoretical groundwork needed to give the empirical success of deep ensembles an interpretable Bayesian reading, with promising initial results. How far these insights carry into practical deployment and further theory remains an open question.
