Distance-Aware Deep Neural Networks for Improved Uncertainty Estimation
The paper "A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness" presents a method for improving uncertainty quantification in deep neural networks (DNNs). Its starting point is that the standard probabilistic approaches, Bayesian neural networks (BNNs) and deep ensembles, estimate predictive uncertainty well but are often too resource-intensive for real-time applications, since they require multiple forward passes or multiple model copies. The paper instead refines a single deterministic model so that uncertainty can be estimated in one forward pass.
Main Contributions
The authors introduce the notion of distance awareness as crucial for high-quality uncertainty estimation in DNNs. This concept requires the model to be cognizant of the distance between a test input and the training data. To incorporate this into a DNN, the authors propose the Spectral-normalized Neural Gaussian Process (SNGP), which enhances distance-awareness within modern DNNs through two major modifications:
- Spectral Normalization - Applied to the hidden-layer weights so that the feature extractor satisfies a bi-Lipschitz condition, thereby approximately preserving input distances in representation space.
- Gaussian Process (GP) Output Layer - Replaces the final dense layer; its posterior variance grows with a test point's distance from the training data, providing a scalable, distance-sensitive uncertainty estimate.
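The two modifications can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: spectral normalization is approximated by power iteration on a single weight matrix, and the GP layer by a random-Fourier-feature approximation to an RBF kernel (the GP posterior-variance computation used by SNGP is omitted for brevity).

```python
import numpy as np

def spectral_normalize(W, c=0.95, n_iter=20):
    """Rescale W so its largest singular value is at most c.

    The spectral norm is estimated by power iteration, as is common
    in spectral-normalization implementations.
    """
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = float(u @ W @ v)          # estimated spectral norm
    return W * min(1.0, c / sigma)    # only shrink, never expand

def rff_gp_layer(h, n_features=1024, seed=0):
    """Random-Fourier-feature approximation to an RBF-kernel GP layer.

    Maps hidden features h to phi(h); logits are then a linear map of
    phi(h), and predictive variance can be derived from phi(h) as well.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(h.shape[-1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(h @ W + b)
```

In a full SNGP model, every hidden weight matrix would be normalized after each training step, and the random-feature layer would feed a linear output head whose covariance is tracked for uncertainty estimation.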
These modifications render SNGP effective for improving prediction accuracy, calibration, and out-of-domain (OOD) detection across a variety of vision and language benchmarks using architectures like Wide-ResNet and BERT.
Numerical Results and Evaluation
The authors report that SNGP consistently outperforms other single-model approaches to uncertainty estimation, improving predictive quality and calibration on standard benchmarks such as CIFAR-10, CIFAR-100, and ImageNet. It also markedly improves OOD detection, making it competitive with ensemble-based methods. Notably, the paper highlights SNGP's scalability and its complementary benefits when combined with ensembling techniques such as MC Dropout and deep ensembles, indicating its potential as a practical building block in probabilistic deep learning frameworks.
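Calibration on these benchmarks is typically reported as expected calibration error (ECE): the confidence-weighted gap between a model's stated confidence and its empirical accuracy. As an illustration of the metric itself (a standard definition, not code from the paper), a minimal NumPy version:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: bin predictions by confidence, then take the weighted mean
    of |accuracy - mean confidence| over the non-empty bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

A perfectly calibrated model has ECE near zero; an overconfident one (high confidence, low accuracy) scores high.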
Theoretical Implications
By framing uncertainty estimation as a minimax learning problem, the paper underscores distance awareness as a necessary condition for achieving optimal solutions. The SNGP model formalizes this concept and is supported by rigorous mathematical arguments demonstrating that distance preservation within the DNN's representation space is vital for reliable uncertainty quantification.
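The distance-awareness condition can be stated informally as follows (notation slightly adapted from the paper; $d$ is a meaningful metric on the input space and $\mathcal{X}_{\mathrm{train}}$ denotes the training data):

```latex
u(x) \;=\; v\big( d(x, \mathcal{X}_{\mathrm{train}}) \big),
\qquad v \text{ monotonically increasing},
```

where $u(x)$ is a scalar summary of the predictive uncertainty of $p(y \mid x)$, e.g. its entropy. In words: the further a test input lies from the training data, the less confident the model should be.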
Practical and Theoretical Implications
From a practical perspective, SNGP’s approach simplifies uncertainty estimation in real-time applications by maintaining a single, deterministic network without the high computational overhead associated with ensemble-based methods. This makes it particularly relevant for on-device applications and situations with limited computational resources.
Theoretically, the paper highlights the importance of preserving input distances in the hidden layer representations to prevent feature collapse, addressing a common pitfall in existing deep learning models that leads to overconfidence in predictions. This contributes to a broader understanding of how architectural modifications can impart robust uncertainty estimation capabilities in deep learning models.
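The distance-preservation property referenced here is the bi-Lipschitz condition on the hidden mapping $h$, which spectral normalization is designed to enforce:

```latex
L_1 \,\lVert x_1 - x_2 \rVert_X
\;\le\; \lVert h(x_1) - h(x_2) \rVert_H
\;\le\; L_2 \,\lVert x_1 - x_2 \rVert_X .
```

The lower bound is what prevents feature collapse (distinct inputs cannot be mapped to the same representation), while the upper bound keeps the representation smooth and robust to small input perturbations.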
Future Directions
While SNGP addresses key aspects of deep uncertainty, future work could further optimize the GP output layer for more efficient uncertainty estimation without compromising scalability, and investigate alternative or additional spectral regularization techniques to improve model robustness. The interplay between distance awareness and other probabilistic and representational strategies also remains a fertile avenue for exploration.
In summary, the proposed SNGP framework is a significant step forward in improving the uncertainty properties of single-model DNNs. By emphasizing distance-awareness, the paper offers a scalable and resource-efficient solution that aligns with practical application needs while contributing valuable theoretical insights into better uncertainty quantification methods.