- The paper presents GradNorm, a novel OOD detection method that scores inputs by the norm of gradients backpropagated from the KL divergence between the softmax output and a uniform distribution, distinguishing in-distribution from out-of-distribution data.
- In extensive empirical evaluation on a large-scale ImageNet benchmark, it reduces the average FPR95 by up to 16.33% relative to established methods such as MSP, ODIN, and Mahalanobis.
- Its efficient implementation using only the last neural network layer and temperature scaling offers a scalable, post hoc solution for enhancing AI model reliability.
An Examination of GradNorm for Detecting Out-of-Distribution Inputs via Gradient Norms
Detecting out-of-distribution (OOD) data is paramount for ensuring the real-world reliability of machine learning models. Huang, Geng, and Li challenge the conventional reliance on the output or feature space in OOD detection, which largely overlooks the information carried by the gradient space. Their paper introduces GradNorm, a simple yet novel method that scores each input by the norm of the gradients backpropagated from the KL divergence between the model's softmax output and a uniform probability distribution.
The key insight driving GradNorm is that the vector norm of these gradients is consistently higher for in-distribution (ID) data than for OOD data. Through extensive empirical evaluation, the authors show that GradNorm reduces the false positive rate at 95% true positive rate (FPR95) by up to 16.33% on the ImageNet benchmark compared to the best existing methods, demonstrating its efficacy and robustness.
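The score admits a compact sketch. The version below is a minimal numpy illustration, not the authors' code: it assumes a linear last layer (logits z = W h + b) and uses the analytic gradient of the uniform-target KL loss with respect to W; the function and variable names are mine.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def gradnorm_score(h, W, b, T=1.0):
    """Sketch of a GradNorm-style OOD score: L1 norm of the gradient of
    the KL divergence between the softmax output and a uniform target,
    taken w.r.t. the last-layer weights W only. Higher -> more ID-like."""
    z = W @ h + b                       # logits from penultimate features h
    p = softmax(z / T)                  # temperature-scaled softmax
    u = np.full_like(p, 1.0 / p.size)   # uniform target distribution
    # Analytic gradient of the uniform-target loss w.r.t. W:
    #   d/dW = outer(p - u, h) / T
    grad_W = np.outer(p - u, h) / T
    return np.abs(grad_W).sum()         # L1 norm of the gradient
```

When the softmax output is exactly uniform (maximal uncertainty), p - u vanishes and the score is zero; confident, peaked outputs on large-magnitude features yield large scores, matching the paper's observation that ID inputs produce larger gradient norms.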
Evaluation and Results
The paper evaluates GradNorm on a large-scale ImageNet benchmark spanning four diverse OOD datasets: iNaturalist, SUN, Places, and Textures. The results show that the proposed method outperforms mainstream approaches such as MSP, ODIN, Mahalanobis, and energy-based scores. Among post hoc methods, which operate on pre-trained models without auxiliary training data, GradNorm substantially lowers the average FPR95, setting a new state of the art.
Empirical and Theoretical Analysis
GradNorm's recommended implementation restricts gradient computation to the network's last layer, improving computational efficiency without compromising detection quality. Theoretical analysis supports the empirical findings by showing that GradNorm captures joint information from both the feature space and the output space, improving the separability of ID and OOD data.
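The feature/output coupling can be made concrete with a small numpy check (a sketch under my own naming, assuming a bias-free linear last layer): the L1 norm of the last-layer gradient factorizes exactly into an output-space term times a feature-space term.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# For logits z = W @ h, the gradient of the uniform-target loss w.r.t. W
# is outer(p - u, h), and its L1 norm factorizes:
#   ||outer(p - u, h)||_1 = ||p - u||_1 * ||h||_1
# coupling output-space uncertainty (||p - u||_1) with feature-space
# magnitude (||h||_1) in a single scalar score.
rng = np.random.default_rng(0)
h = np.abs(rng.standard_normal(16))   # penultimate features (nonnegative after ReLU)
W = rng.standard_normal((10, 16))
p = softmax(W @ h)
u = np.full(10, 1.0 / 10)             # uniform target over 10 classes
grad_norm = np.abs(np.outer(p - u, h)).sum()
factorized = np.abs(p - u).sum() * np.abs(h).sum()
print(np.isclose(grad_norm, factorized))  # True
```

This identity also explains the efficiency of the last-layer restriction: the score never requires materializing a full backward pass through the network.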
The paper also examines the roles of temperature scaling, the choice of Lp-norm, and which network parameters the gradients are taken over, showing how each affects detection performance. Importantly, by computing the KL divergence against a uniform target rather than a one-hot target, GradNorm aggregates uncertainty across all categories, thereby improving OOD detectability.
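These ablation knobs can be gathered into one parameterized sketch. The function below is hypothetical (my own naming and signature, assuming a bias-free linear last layer), exposing temperature, norm order, and the target distribution as arguments.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_score(h, W, T=1.0, p_norm=1, target="uniform"):
    """Hypothetical gradient-norm score exposing the ablation knobs:
    temperature T, Lp-norm order p_norm, and the target distribution
    ('uniform' vs. a one-hot target at the predicted class)."""
    probs = softmax((W @ h) / T)
    if target == "uniform":
        q = np.full_like(probs, 1.0 / probs.size)
    else:  # one-hot at the predicted class
        q = np.zeros_like(probs)
        q[np.argmax(probs)] = 1.0
    # Gradient of the cross-entropy against target q, w.r.t. W:
    grad_W = np.outer(probs - q, h) / T
    return np.linalg.norm(grad_W.ravel(), ord=p_norm)
```

With a one-hot target at the predicted class, confident predictions drive probs - q toward zero regardless of whether the input is ID or OOD; the uniform target instead keeps every class's uncertainty in play, which is the paper's stated reason for preferring it.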
Discussion on Related Work
The approach diverges significantly from prior work such as ODIN and methods that train against pseudo-OOD samples (e.g., Lee et al.), gaining simplicity and computational efficiency by not relying on explicit exposure to designated OOD data. By focusing on gradient magnitudes, the work opens a new direction in OOD detection, using the gradient space to reveal how a model behaves on unfamiliar inputs.
Implications and Future Directions
In the context of AI deployment across critical applications, from healthcare to autonomous systems, the practical implications of such robust OOD detection methodologies cannot be overstated. GradNorm provides a scalable and efficient foundation on which future research may build, exploring gradient space and its potential to drive forward more advanced forms of uncertainty quantification.
As deep learning models scale and diversify, the significance of methodologies like GradNorm will likely resonate more profoundly, inviting further investigation into leveraging gradient-driven approaches for other aspects of AI and deep learning.