Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision (1906.01620v3)

Published 4 Jun 2019 in cs.LG, cs.CV, and stat.ML

Abstract: While deep neural networks have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, for example in automotive applications. In Bayesian deep learning, predictive uncertainty is commonly decomposed into the distinct types of aleatoric and epistemic uncertainty. The former can be estimated by letting a neural network output the parameters of a certain probability distribution. Epistemic uncertainty estimation is a more challenging problem, and while different scalable methods recently have emerged, no extensive comparison has been performed in a real-world setting. We therefore accept this task and propose a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods in deep learning. Our proposed framework is specifically designed to test the robustness required in real-world computer vision applications. We also apply this framework to provide the first properly extensive and conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout. Our comparison demonstrates that ensembling consistently provides more reliable and practically useful uncertainty estimates. Code is available at https://github.com/fregu856/evaluating_bdl.

An Expert Overview of "Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision"

The paper "Evaluating Scalable Bayesian Deep Learning Methods for Robust Computer Vision" by Gustafsson, Danelljan, and Schon undertakes a comprehensive evaluation of Bayesian approaches to model uncertainty in deep neural networks (DNNs) specific to computer vision tasks. The focus of the paper lies on assessing scalable methods for estimating epistemic uncertainty—an endeavor necessitated by the critical requirement for reliable uncertainty estimates in safety-critical real-world applications such as autonomous driving.

Key Focus and Methodology

The authors provide a thorough comparison of two prominent scalable methods for epistemic uncertainty estimation in DNNs: ensembling and MC-dropout. Unlike previous work that has focused on small-scale experiments or restricted settings, the paper implements a robust evaluation framework covering both regression and classification tasks. This framework is applied to depth completion and street-scene semantic segmentation to test performance under real-world conditions.
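
To make the contrast concrete, the following is a minimal sketch of how the two methods produce a predictive distribution at test time: ensembling averages the outputs of several independently trained networks, while MC-dropout averages stochastic forward passes of a single network with dropout kept active. This is not the authors' implementation; the helper names and the generic PyTorch classifier interface are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def ensemble_predict(models, x):
    """Average the softmax outputs of M independently trained networks.
    The spread across members reflects epistemic uncertainty."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])  # (M, B, C)
    return probs.mean(dim=0), probs.var(dim=0)

def mc_dropout_predict(model, x, num_samples=16):
    """Average over stochastic forward passes with dropout kept active at test time."""
    model.train()  # enables dropout; assumes the model has no BatchNorm, or it is frozen
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(num_samples)])
    return probs.mean(dim=0), probs.var(dim=0)
```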

A notable methodological choice involves training networks exclusively on synthetic datasets like Virtual KITTI and Synscapes, followed by evaluation on real-world datasets such as KITTI depth completion and Cityscapes. This simulates domain shifts consistent with practical automotive applications where out-of-distribution scenarios are inevitable.

Results and Comparative Analysis

The evaluation demonstrates that ensembling consistently outperforms MC-dropout across multiple metrics, including AUSE (Area Under the Sparsification Error curve) and AUCE (Area Under the Calibration Error curve). In regression tasks, the networks output the mean and variance of a Gaussian distribution, while in classification tasks they output the parameters of a Categorical distribution, enabling a consistent evaluation of both aleatoric and epistemic uncertainty.
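
For regression, the predictive variance of the resulting mixture of Gaussians over ensemble members (or MC-dropout samples) splits into an aleatoric term and an epistemic term. The snippet below is a sketch of that standard decomposition; the function name and tensor layout are assumptions, not taken from the paper's code.

```python
import torch

def gaussian_ensemble_uncertainty(means, log_vars):
    """Combine per-member Gaussian predictions (each of shape (M, ...)) into a
    predictive mean and a variance split into aleatoric and epistemic parts."""
    mean = means.mean(dim=0)                       # predictive mean
    aleatoric = log_vars.exp().mean(dim=0)         # average of predicted variances
    epistemic = means.var(dim=0, unbiased=False)   # spread of means across members
    return mean, aleatoric, epistemic
```

The same decomposition applies to MC-dropout by stacking the outputs of repeated stochastic forward passes along the first dimension.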

Calibration results, captured through metrics such as the Expected Calibration Error (ECE), further underscore ensembling's robustness: calibration improves as the number of ensemble members grows. In contrast, MC-dropout's performance stagnates or deteriorates as the number of forward passes increases, suggesting limitations in how well it captures uncertainty over the model parameters.
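
As a reference point, ECE bins predictions by confidence and measures the weighted gap between average confidence and accuracy within each bin. Below is a minimal NumPy sketch of this standard formulation; the exact binning scheme used in the paper may differ.

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=15):
    """ECE: weighted average |accuracy - confidence| over equal-width confidence bins.
    confidences: max softmax probability per prediction; correct: boolean array."""
    bin_edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()
            conf = confidences[mask].mean()
            ece += mask.mean() * abs(acc - conf)
    return ece
```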

Implications and Future Directions

The findings imply that ensembling offers a pragmatic balance of accuracy and uncertainty estimation in DNNs for real-world computer vision applications. These results are especially relevant where predictive reliability is paramount, such as for autonomous vehicles navigating diverse environmental conditions.

Practically, the paper suggests that ensembling's more systematic exploration of the model parameter space, obtained by training multiple networks from different random initializations, allows it to better approximate the posterior distribution over DNN parameters. This makes ensembling a promising route for future developments in scalable uncertainty estimation. In terms of deployment, however, the computational overhead of running multiple ensemble members remains a non-trivial challenge. Future research might focus on reducing this cost through techniques such as model pruning or shared architecture components.

Thus, this comprehensive comparison not only settles the relative merits of current epistemic uncertainty estimation approaches in DNNs but also lays a foundation for developing enhanced scalable methods aimed at tackling the burgeoning demand for robust AI in critical applications. The paper, by providing a publicly available code repository, encourages further exploration and validation by the community. This openness to collaborative scrutiny is likely to propel advancements in Bayesian deep learning systems tailored for increasingly complex vision tasks.

Authors (3)
  1. Fredrik K. Gustafsson (17 papers)
  2. Martin Danelljan (96 papers)
  3. Thomas B. Schön (132 papers)
Citations (285)