- The paper's main contribution is a Mahalanobis distance-based confidence score within a Gaussian Discriminant Analysis framework for robust detection of out-of-distribution samples and adversarial attacks.
- The method leverages feature ensembles and input pre-processing, integrating seamlessly with pre-trained softmax classifiers without retraining.
- Empirical results show large gains in true negative rate across datasets and attack types, improving model reliability in safety-critical applications.
A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
The paper by Kimin Lee et al. proposes a method to tackle a pressing issue in deploying robust machine learning models: the detection of out-of-distribution (OOD) samples and adversarial attacks. This is critical as deep neural networks (DNNs), despite their high accuracy, often produce overconfident predictions for such anomalous inputs, posing significant risks in applications like autonomous driving and secure authentication.
Key Contributions and Methodology
The authors introduce a confidence score based on the Mahalanobis distance, defined within a Gaussian Discriminant Analysis (GDA) framework. The score flags abnormal samples using features from multiple layers of a pre-trained softmax neural classifier. The method stands out for its simplicity and for being applicable to any pre-trained classifier without requiring retraining. The primary steps are:
- Generative Classifier under GDA: The method pairs a discriminative classifier with a generative one. Assuming class-conditional Gaussian distributions with a tied covariance matrix, the authors estimate class means and a shared covariance in the deep feature space, which induces a Mahalanobis distance metric. The resulting generative classifier is closely connected to the softmax classifier, so it can be attached without compromising classification accuracy.
- Confidence Score using Mahalanobis Distance: The confidence score is the negative Mahalanobis distance between a test sample's features and the closest class-conditional Gaussian. Unlike the conventional softmax score, it does not suffer from overconfidence on inputs far from the training distribution (a minimal sketch of fitting the Gaussians and computing this score follows this list).
- Input Pre-processing and Feature Ensemble: To strengthen detection, the authors add a small perturbation to each input that increases the proposed confidence score, and they aggregate scores computed at multiple layers of the network, weighting them with a logistic regression detector trained on a small validation set. These calibrations further improve the robustness and accuracy of detection (see the second sketch below).
- Application in Class-Incremental Learning: Beyond detection, the framework supports class-incremental learning: a new class is accommodated simply by computing its class mean and updating the shared covariance, with no retraining of the underlying network (see the final sketch below).
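The core of the method can be sketched in a few lines. The snippet below is a minimal illustration rather than the authors' code: it fits class means and a tied covariance to penultimate-layer features and scores a sample by its negative Mahalanobis distance to the closest class mean; the function names and the use of a pseudo-inverse are assumptions for the sketch.

```python
import numpy as np

def fit_class_gaussians(features, labels, num_classes):
    """Fit per-class means and a single tied covariance to deep features.

    features: (N, D) penultimate-layer features of the training set
    labels:   (N,) integer class labels
    """
    means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
    centered = features - means[labels]             # subtract each sample's own class mean
    cov = centered.T @ centered / len(features)     # covariance shared across all classes
    precision = np.linalg.pinv(cov)                 # (pseudo-)inverse defines the Mahalanobis metric
    return means, precision

def mahalanobis_confidence(feat, means, precision):
    """Confidence score: negative squared Mahalanobis distance to the closest class mean."""
    diffs = feat - means                                        # (num_classes, D)
    dists = np.einsum('cd,de,ce->c', diffs, precision, diffs)   # squared distance per class
    return -dists.min()
```

A test sample is flagged as out-of-distribution or adversarial when this score falls below a threshold chosen on validation data.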
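Input pre-processing requires gradients of the score with respect to the input, so a PyTorch-style sketch is more natural here. `feature_at` is a hypothetical hook returning the features of one layer, `means` and `precision` are assumed to be torch tensors, and the epsilon value is illustrative; the layer-weighting step follows the paper's use of logistic regression on a small validation set.

```python
import torch
from sklearn.linear_model import LogisticRegression

def preprocess_input(x, model, means, precision, eps=0.002):
    """Perturb x slightly in the direction that increases the Mahalanobis confidence score."""
    x = x.clone().requires_grad_(True)
    feat = model.feature_at(x)                      # hypothetical hook: (B, D) layer features
    diffs = feat.unsqueeze(1) - means               # (B, C, D) against all class means
    dists = torch.einsum('bcd,de,bce->bc', diffs, precision, diffs)
    score = -dists.min(dim=1).values.sum()          # negative distance to the closest class mean
    score.backward()
    return (x + eps * x.grad.sign()).detach()       # nudge the input toward higher confidence

def fit_layer_weights(layer_scores, in_dist_labels):
    """Learn how to weight per-layer scores.

    layer_scores:   (N, L) confidence scores from L layers on a validation set
    in_dist_labels: 1 for in-distribution samples, 0 for OOD or adversarial samples
    """
    return LogisticRegression().fit(layer_scores, in_dist_labels)
```

At test time, the perturbed input is scored at every layer and the weighted sum of layer scores is thresholded.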
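For class-incremental learning, only the Gaussian parameters need updating. The sketch below, again with illustrative names, computes the mean of the new class and folds its scatter into the shared covariance as a sample-count-weighted average, in the spirit of the paper's update rule.

```python
import numpy as np

def add_new_class(new_features, means, cov, counts):
    """Add one class to the generative classifier without retraining the network.

    new_features: (M, D) deep features of the new class
    means:        (C, D) existing class means
    cov:          (D, D) shared covariance
    counts:       (C,) number of samples per existing class
    """
    mu_new = new_features.mean(axis=0)
    centered = new_features - mu_new
    cov_new = centered.T @ centered / len(new_features)
    n_old, n_new = counts.sum(), len(new_features)
    cov = (n_old * cov + n_new * cov_new) / (n_old + n_new)   # weighted shared covariance
    means = np.vstack([means, mu_new])
    counts = np.append(counts, n_new)
    return means, cov, counts
```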
Experimental Validation
The efficacy of the proposed method is validated empirically with DenseNet and ResNet models across several datasets, including CIFAR-10, CIFAR-100, SVHN, TinyImageNet, and LSUN. The key findings are as follows:
- OOD Detection: The proposed method consistently outperformed existing approaches like ODIN. Notably, on CIFAR-100 with LSUN as OOD, the true negative rate improved significantly from 45.6% to 90.9%.
- Adversarial Sample Detection: Against adversarial attacks generated by various algorithms (e.g., FGSM, BIM, DeepFool, and CW), the method demonstrated superior performance. For example, on ResNet with CIFAR-10 under CW attacks, the true negative rate saw a marked increase from 82.9% to 95.8%.
- Robustness: The method remained effective in scenarios involving noisy labels and small training sets. It also performed well when its hyperparameters were tuned using only in-distribution data and FGSM adversarial examples, i.e., without access to the test-time OOD datasets or attack types.
- Class-Incremental Learning: The method facilitated efficient addition of new classes to a pre-trained classifier, outperforming alternatives such as Euclidean distance-based classifiers and retrained softmax classifiers.
Implications and Future Work
The implications of this research are manifold. Practically, it enhances the reliability of real-world machine learning applications by providing a robust mechanism for detecting out-of-distribution and adversarial samples. Theoretically, it bridges the gap between discriminative and generative modeling, presenting a unified framework that leverages the strengths of both.
Future research could explore the integration of this framework with other forms of generative models or its application in different domains such as speech recognition or natural language processing. Additionally, the framework's potential to synergize with active learning, ensemble learning, and few-shot learning merits further investigation.
In conclusion, the proposed Mahalanobis distance-based confidence score within a GDA framework represents a significant step toward deploying more secure and reliable machine learning systems, ensuring that models can effectively recognize and handle anomalous inputs.