Deep Anomaly Detection with Outlier Exposure
The paper "Deep Anomaly Detection with Outlier Exposure" by Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich discusses the development and empirical evaluation of a novel method for improving anomaly detection in machine learning systems. The authors propose leveraging auxiliary datasets of outliers, termed Outlier Exposure (OE), to train anomaly detectors to identify and manage out-of-distribution (OOD) examples effectively.
Introduction and Motivation
Anomaly detection is crucial for deploying robust machine learning systems, particularly when models encounter data not represented in their training distribution. The paper highlights that deep neural networks tend to produce high-confidence predictions even on anomalous inputs, underscoring the need for reliable anomaly detection mechanisms.
Methodology
The core idea introduced by the authors is OE, which improves anomaly detection by exposing the model to a large, diverse set of out-of-distribution samples during training. This contrasts with traditional methods that either rely exclusively on in-distribution data or employ synthetic anomalies. Through OE, the model learns heuristics that discriminate in-distribution from out-of-distribution data, and these heuristics generalize to unseen anomaly distributions.
Formally, let $\mathcal{D}_{\text{in}}$ denote the in-distribution dataset and $\mathcal{D}_{\text{out}}^{\text{OE}}$ the auxiliary outlier dataset used for training. The training objective is modified to

$$\mathbb{E}_{(x,y)\sim\mathcal{D}_{\text{in}}}\left[\mathcal{L}(f(x),y)\right] + \lambda\,\mathbb{E}_{x'\sim\mathcal{D}_{\text{out}}^{\text{OE}}}\left[\mathcal{L}_{\text{OE}}(f(x'),f(x),y)\right],$$

where $\mathcal{L}$ is the original task loss and $\mathcal{L}_{\text{OE}}$ is the OE-specific loss that encourages low confidence on outlier examples $x'$. For a softmax classifier scored with the maximum softmax probability, the paper sets $\mathcal{L}_{\text{OE}}$ to the cross-entropy between $f(x')$ and the uniform distribution over the classes.
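To make the objective concrete, here is a minimal PyTorch sketch for a softmax classifier. The function and argument names are ours, and the λ = 0.5 default follows the value the paper reports for its vision experiments; `model` is assumed to be any classifier returning logits.

```python
import torch.nn.functional as F

def oe_loss(model, x_in, y_in, x_oe, lam=0.5):
    """Training loss with Outlier Exposure (a sketch).

    In-distribution term: standard cross-entropy on labeled data.
    OE term: cross-entropy between the model's posterior on outliers
    and the uniform distribution, pushing confidence down on anomalies.
    """
    loss_in = F.cross_entropy(model(x_in), y_in)

    logits_oe = model(x_oe)
    # Cross-entropy to the uniform distribution over K classes reduces
    # to the negative mean of the log-softmax outputs.
    loss_oe = -F.log_softmax(logits_oe, dim=1).mean()

    return loss_in + lam * loss_oe
```

Because OE only adds a term to the usual loss, it can also be applied when fine-tuning an already-trained network rather than training from scratch, a setting the paper evaluates as well.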
Experimental Evaluation
The authors conduct extensive experiments in both computer vision and natural language processing to validate the effectiveness of OE. The evaluation uses standard in-distribution and OOD datasets: CIFAR-10, CIFAR-100, SVHN, Tiny ImageNet, and Places365 for vision, and 20 Newsgroups, TREC, and SST for text. The experiments cover multiple OOD detectors, including the maximum softmax probability (MSP) and density-estimation methods such as PixelCNN++.
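For reference, the MSP detector scores an input by its negative maximum softmax probability, so low-confidence inputs receive high anomaly scores. A minimal sketch (the function name is ours):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_score(model, x):
    """Maximum softmax probability (MSP) anomaly score.

    Returns higher values for inputs the classifier is less confident
    about, flagging them as likely OOD.
    """
    probs = F.softmax(model(x), dim=1)
    return -probs.max(dim=1).values
```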
Key Findings:
- Improvement in Detection Performance: Applying OE consistently improves OOD detectors, as measured by FPR95 (False Positive Rate at 95% True Positive Rate), AUROC (Area Under the Receiver Operating Characteristic Curve), and AUPR (Area Under the Precision-Recall Curve); a sketch of how these metrics are computed from anomaly scores follows this list. For instance, FPR95 on the SVHN dataset dropped from 6.3% to 0.1% with OE.
- Dominance over Synthetic Alternatives: Models exposed to real, diverse outlier data through OE outperform models trained with synthetic outliers, underscoring the value of realistic auxiliary data.
- Flexibility Across Domains: OE improves calibration and detection performance in both vision and NLP tasks, indicating broad applicability. In particular, NLP experiments on 20 Newsgroups and TREC showed substantial gains in anomaly detection metrics.
- Calibration Enhancements: OE also improves model calibration in realistic settings where OOD data appears at test time. The authors adopt RMS and MAD calibration error to quantify this, observing significant reductions after applying OE; a sketch of RMS calibration error also follows this list.
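As noted above, the detection metrics can be computed directly from anomaly scores. The sketch below uses scikit-learn and treats OOD examples as the positive class, matching the paper's convention; the function and variable names are ours.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def detection_metrics(scores_in, scores_out):
    """FPR95, AUROC, and AUPR from anomaly scores (higher = more anomalous)."""
    labels = np.concatenate([np.zeros(len(scores_in)), np.ones(len(scores_out))])
    scores = np.concatenate([scores_in, scores_out])

    auroc = roc_auc_score(labels, scores)
    aupr = average_precision_score(labels, scores)

    # FPR95: choose the threshold at which 95% of OOD examples are
    # detected (TPR = 0.95), then measure the fraction of in-distribution
    # examples falsely flagged at that threshold.
    threshold = np.percentile(scores_out, 5)
    fpr95 = float(np.mean(np.asarray(scores_in) >= threshold))
    return fpr95, auroc, aupr
```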
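The calibration sketch below computes RMS calibration error with equal-width confidence bins. The binning scheme is our assumption, as the paper's exact procedure may differ, and the names are ours.

```python
import numpy as np

def rms_calibration_error(confidences, correct, n_bins=15):
    """Root-mean-square gap between confidence and accuracy across
    confidence bins, weighted by bin frequency (a sketch)."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)  # equal-width bins (assumed)
    sq_err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = confidences[mask].mean() - correct[mask].mean()
        sq_err += (mask.sum() / len(confidences)) * gap ** 2
    return float(np.sqrt(sq_err))
```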
Implications and Future Directions
The research has both practical and theoretical implications:
- Practical Applications: OE is computationally efficient and can be incorporated into existing deployment pipelines simply by adding an outlier dataset to training. This is particularly useful in operational environments that require robust anomaly detection.
- Generalization Across OOD Distributions: The paper offers a basis for understanding how anomaly detection techniques generalize to unseen OOD distributions. Future work could probe the limits of this generalization, for example by varying the type of auxiliary dataset or studying the effects of overlap between the training outliers and the unseen OOD examples.
- Advancements in Calibration: Improving calibration in the presence of OOD data remains an active area of research. OE provides an empirical foundation for developing new calibration techniques that account for both in-distribution and OOD data.
In conclusion, Outlier Exposure emerges as a simple yet powerful method to consistently enhance the performance of OOD detectors across various domains. The findings underscore the importance of leveraging real, diverse datasets of outliers, which enables models to generalize better and handle anomalies more effectively. The research opens avenues for further exploration into anomaly detection and model calibration, setting a new benchmark for practical and scalable solutions in machine learning deployment.