Analysis of "Modeling the Distribution of Normal Data in Pre-Trained Deep Features for Anomaly Detection"
This paper presents a novel approach to image anomaly detection (AD) leveraging pre-trained deep neural networks, challenging the prevailing paradigm of training models from scratch. The authors focus on utilizing feature representations learned from large-scale natural image datasets to characterize normality, achieving superior performance over traditional methods by employing these deep features in a transfer learning context.
Methodology
The technique models the distribution of normal data with a multivariate Gaussian (MVG) distribution fitted to deep feature representations extracted from classification networks pre-trained on ImageNet. The Mahalanobis distance to this fitted distribution then serves as the anomaly score separating normal from anomalous data. The authors probe the discriminative nature of deep features by analyzing their variance with Principal Component Analysis (PCA), finding that the principal components with the least variance in the normal data are the most important for discrimination. This supports their hypothesis that such discriminative features are non-trivial to learn when training solely on normal data.
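The fit-and-score procedure can be sketched in a few lines of NumPy. The feature extractor is omitted here: `normal_feats` stands in for deep features already pulled from a pre-trained network, and the small covariance regularization term is an assumption added for numerical stability, not a detail taken from the paper.

```python
import numpy as np

def fit_mvg(features):
    """Fit a multivariate Gaussian to normal-data features (N x D matrix)."""
    mean = features.mean(axis=0)
    # Slight regularization keeps the covariance invertible when N is small
    # relative to D (an assumption, not the paper's exact procedure).
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis_score(x, mean, cov_inv):
    """Anomaly score: Mahalanobis distance of x from the fitted Gaussian."""
    diff = x - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

# Toy usage: 200 "normal" feature vectors in 8 dimensions.
rng = np.random.default_rng(0)
normal_feats = rng.normal(size=(200, 8))
mean, cov_inv = fit_mvg(normal_feats)

in_dist = mahalanobis_score(rng.normal(size=8), mean, cov_inv)
out_dist = mahalanobis_score(np.full(8, 6.0), mean, cov_inv)
# A point far from the normal cluster receives a much larger anomaly score.
```

In practice the features would come from a deep layer of the pre-trained network rather than a random generator; the scoring logic is unchanged.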
Results
The approach delivers state-of-the-art results on the MVTec AD dataset, achieving the highest average Area Under the Receiver Operating Characteristic curve (AUROC) across all 15 classes and outperforming previous methods. Comparing features extracted at various depths of the network, the deepest feature representations yield the best performance, consistent with established hypotheses about the level of abstraction needed to model normality. Furthermore, dimensionality reduction via PCA and negated PCA (NPCA, which retains the lowest-variance components) shows that discarding the components with the highest variance in normal data maintains performance, supporting the authors' claim about where feature discriminability resides.
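The NPCA idea above, keeping the lowest-variance directions of the normal data rather than the highest, can be sketched as follows. The feature matrix is again assumed to be precomputed; the toy data simply gives each dimension a different variance so the selection has an effect.

```python
import numpy as np

def npca_project(features, k):
    """Project features onto the k principal directions with the LEAST
    variance in the normal data (the 'negated PCA' selection)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by descending variance,
    # so the last k rows span the lowest-variance subspace.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[-k:].T

# Toy features: 10 dimensions with per-dimension scales from 5.0 down to 0.5.
rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 10)) * np.linspace(5.0, 0.5, 10)
low_var = npca_project(feats, 3)  # shape (100, 3), low-variance subspace
```

Anomaly scoring would then proceed on `low_var` exactly as in the full-dimensional case.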
Implications
The findings suggest that anomaly detection systems should leverage pre-trained feature spaces rather than train from scratch when only normal instances are available, reaffirming the potential of transfer learning in AD. Interestingly, the assumption of an MVG distribution not only enhances performance but also provides a principled basis for setting working points from a desired false positive rate (FPR). This methodology aligns with the need for scalable and generalizable anomaly detectors in domains such as quality control and medical imaging, where anomalous instances are rare and ill-defined.
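The FPR-based working point follows directly from the MVG assumption: the squared Mahalanobis distance of normal samples is chi-squared distributed with D degrees of freedom, so the threshold for a target FPR is the (1 - FPR) quantile of that distribution. A minimal Monte Carlo sketch with plain NumPy (a closed-form quantile such as `scipy.stats.chi2.ppf` would serve equally well):

```python
import numpy as np

def chi2_threshold(dim, fpr, n_samples=200_000, seed=0):
    """Estimate the (1 - fpr) quantile of a chi-squared distribution with
    `dim` degrees of freedom, i.e. the squared-Mahalanobis threshold at
    which only `fpr` of truly normal samples would be flagged."""
    rng = np.random.default_rng(seed)
    # Sum of dim squared standard normals ~ chi-squared(dim).
    sq_dists = (rng.normal(size=(n_samples, dim)) ** 2).sum(axis=1)
    return float(np.quantile(sq_dists, 1.0 - fpr))

# Working point for 8-dimensional features at a 1% false positive rate.
thr = chi2_threshold(dim=8, fpr=0.01)
```

Samples whose squared Mahalanobis distance exceeds `thr` would be flagged as anomalous, with the expected false positive rate fixed in advance rather than tuned on held-out anomalies.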
Future Directions
The paper opens several avenues for further work. First, it proposes examining more complex data distributions, potentially extending the framework to Gaussian Mixture Models for multi-modal AD scenarios. Additionally, as the gap between source and target domains widens (e.g., medical imagery), methods that incorporate domain adaptation into pre-training could be explored. Another direction is integrating self-normalizing networks into the framework, encouraging the learned features to be more Gaussian-distributed.
In summary, the paper provides a robust theoretical and practical approach to anomaly detection, successfully harnessing the power of pre-trained models, and is likely to shape future anomaly detection frameworks and related domains.