
PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization (2011.08785v1)

Published 17 Nov 2020 in cs.CV

Abstract: We present a new framework for Patch Distribution Modeling, PaDiM, to concurrently detect and localize anomalies in images in a one-class learning setting. PaDiM makes use of a pretrained convolutional neural network (CNN) for patch embedding, and of multivariate Gaussian distributions to get a probabilistic representation of the normal class. It also exploits correlations between the different semantic levels of CNN to better localize anomalies. PaDiM outperforms current state-of-the-art approaches for both anomaly detection and localization on the MVTec AD and STC datasets. To match real-world visual industrial inspection, we extend the evaluation protocol to assess performance of anomaly localization algorithms on non-aligned dataset. The state-of-the-art performance and low complexity of PaDiM make it a good candidate for many industrial applications.

Citations (701)

Summary

  • The paper introduces a CNN-based framework that models patch embeddings with a multivariate Gaussian to achieve effective anomaly detection and localization.
  • It employs a one-class learning paradigm and outperforms state-of-the-art methods on MVTec AD and ShanghaiTech Campus using AUROC and PRO-score metrics.
  • The method demonstrates scalability and robustness in both aligned and non-aligned images while reducing time and memory compared to K-NN approaches.

PaDiM: A Framework for Anomaly Detection and Localization

The paper introduces PaDiM (Patch Distribution Modeling), a framework for detecting and localizing anomalies in images under a one-class learning paradigm, with industrial inspection as the main target. It uses a pretrained convolutional neural network (CNN) to embed image patches and fits multivariate Gaussian distributions to those embeddings to model what normal data looks like; patches that fit the model poorly are flagged as anomalous.

Methodology

The PaDiM framework employs a pretrained CNN to extract patch embeddings from input images. These embeddings capture varying semantic levels due to the hierarchical nature of CNN features. Through detailed experimentation, the authors show that maintaining correlations between these levels significantly enhances anomaly localization.
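One simple way to keep features from several semantic levels together in a single patch embedding is to upsample the coarser activation maps to the finest resolution and concatenate along the channel axis. The sketch below illustrates that idea with NumPy arrays standing in for CNN activations. The function name `concat_levels`, the nearest-neighbor upsampling, and the toy shapes are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def concat_levels(feature_maps):
    """Concatenate activation maps of shape (C_k, H_k, W_k), finest first.

    Coarser maps are upsampled by nearest-neighbor repetition so that every
    spatial position ends up with one vector holding all levels.
    """
    h, w = feature_maps[0].shape[1:]
    upsampled = []
    for fm in feature_maps:
        _, hk, wk = fm.shape
        if h % hk or w % wk:
            raise ValueError("coarser maps must divide the finest resolution")
        # Repeat rows and columns to reach the finest resolution (H, W)
        upsampled.append(fm.repeat(h // hk, axis=1).repeat(w // wk, axis=2))
    return np.concatenate(upsampled, axis=0)  # (sum of C_k, H, W)

# Toy activations standing in for three CNN layers (hypothetical shapes)
rng = np.random.default_rng(0)
levels = [rng.normal(size=(4, 8, 8)),
          rng.normal(size=(8, 4, 4)),
          rng.normal(size=(16, 2, 2))]
embedding = concat_levels(levels)
print(embedding.shape)  # (28, 8, 8)
```

Each of the 8 x 8 positions now carries a 28-dimensional vector, so a covariance estimated over these vectors can capture how the levels vary together.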

Key phases in the framework include:

  1. Embedding Extraction: Patch embeddings are built by concatenating activation vectors from several layers of a pretrained CNN, so each embedding combines fine-grained and more global context. To shrink the resulting vectors, the authors keep a random subset of dimensions, which they report preserves discriminative information better than Principal Component Analysis (PCA).
  2. Learning Normality: Patch embeddings from the training data, assumed to be normal, are used to estimate one multivariate Gaussian distribution (a mean vector and a covariance matrix) for each patch position. Each Gaussian describes what normal patches look like at that position in the image.
  3. Anomaly Inference: At test time, the Mahalanobis distance between a patch embedding and the Gaussian learned for its position gives that patch's anomaly score. The per-patch scores form an anomaly map, and the image-level score is the maximum of that map.
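The three phases can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`select_dims`, `fit_patch_gaussians`, `mahalanobis_map`) are hypothetical, the embeddings are assumed to be already extracted into an array of shape `(N, P, D)` (images, patch positions, channels), the data is synthetic, and the `eps` term added to each covariance is an assumed regularizer that keeps the matrix invertible.

```python
import numpy as np

def select_dims(full_dim, keep, seed=0):
    """Pick a fixed random subset of embedding dimensions."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(full_dim, size=keep, replace=False))

def fit_patch_gaussians(train_emb, eps=0.01):
    """Fit one Gaussian per patch position.

    train_emb: (N, P, D) embeddings of N normal images at P positions.
    Returns the means (P, D) and inverse covariances (P, D, D).
    """
    n, _, d = train_emb.shape
    mean = train_emb.mean(axis=0)
    centered = train_emb - mean
    # Sample covariance at every position, computed in one einsum
    cov = np.einsum("npi,npj->pij", centered, centered) / (n - 1)
    cov += eps * np.eye(d)  # assumed regularizer for invertibility
    return mean, np.linalg.inv(cov)

def mahalanobis_map(test_emb, mean, cov_inv):
    """Mahalanobis distance of each test patch (P, D) to its own Gaussian."""
    delta = test_emb - mean
    sq = np.einsum("pi,pij,pj->p", delta, cov_inv, delta)
    return np.sqrt(np.maximum(sq, 0.0))

# Synthetic demo: 50 "normal" images, 16 patch positions, 32 raw dimensions
rng = np.random.default_rng(0)
dims = select_dims(full_dim=32, keep=8)
train = rng.normal(size=(50, 16, 32))[:, :, dims]
mean, cov_inv = fit_patch_gaussians(train)

test = rng.normal(size=(16, 32))[:, dims]
test[5] += 10.0                      # inject an anomaly at patch position 5
amap = mahalanobis_map(test, mean, cov_inv)
image_score = amap.max()             # image-level score: maximum of the map
print(int(amap.argmax()), float(image_score))
```

Reshaping `amap` to the patch grid (4 x 4 in this toy setup) gives the anomaly map; the shifted patch stands out because its distance to the Gaussian for its position is far larger than that of any other patch.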

Experimental Evaluation

The method is evaluated on two datasets, MVTec AD and ShanghaiTech Campus (STC), and on both aligned and non-aligned images. The authors report that it outperforms the state-of-the-art methods they compare against on both anomaly detection and anomaly localization.

  • Performance Metrics: The evaluation uses AUROC and the PRO-score (per-region overlap). AUROC measures how well anomaly scores separate anomalous from normal samples across all thresholds, while the PRO-score weights each anomalous region equally, so that large defects do not dominate the pixel-wise localization result.
  • Robustness to Non-Aligned Data: PaDiM is also tested under transformations such as random rotations and crops, and the authors report that it is more robust than the compared methods on these non-aligned datasets.
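For readers unfamiliar with AUROC, the sketch below computes it directly from its pairwise definition: the fraction of (anomalous, normal) pairs in which the anomalous sample receives the higher score, with ties counted as half. The function name `auroc` and the example scores are made up for illustration; this is not the evaluation code used in the paper, and the PRO-score is not covered here.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC from pairwise comparisons; labels are 1 for anomalous, 0 for normal."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one anomalous and one normal sample")
    # diff[i, j] > 0 means anomalous sample i outranks normal sample j
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Four image-level scores: two normal images, then two anomalous ones
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise matrix costs memory proportional to the number of anomalous samples times the number of normal ones, so a sort-based implementation would be the better choice for pixel-level evaluation.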

Scalability and Practical Considerations

Beyond its superior performance, PaDiM offers favorable scalability properties. The method's time and memory requirements are significantly lower than those of nearest neighbor-based approaches like SPADE, especially on larger datasets. This efficiency emerges from the lightweight nature of Gaussian parameter storage compared to memory-intensive embedding repositories required by K-NN approaches.
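A back-of-the-envelope count shows why the two storage schemes scale differently. The sketch below is a simplified model rather than a measurement of either method: it counts stored floating-point values only, the function names are hypothetical, and the grid size (56 x 56 patch positions), embedding dimension (100), and training set sizes are illustrative assumptions.

```python
def gaussian_store_floats(positions, dim):
    """One mean vector (dim) and one covariance matrix (dim x dim) per position."""
    return positions * (dim + dim * dim)

def knn_store_floats(n_train, positions, dim):
    """One embedding per position for every training image in the memory bank."""
    return n_train * positions * dim

positions, dim = 56 * 56, 100        # illustrative assumptions
gaussian = gaussian_store_floats(positions, dim)
for n_train in (50, 200, 1000):
    bank = knn_store_floats(n_train, positions, dim)
    print(n_train, gaussian, bank)
```

Under this model the Gaussian store is a fixed 31,673,600 values regardless of how many training images are used, while the embedding bank grows linearly (62,720,000 values at 200 images) and overtakes it once the training set exceeds `dim + 1` images. Inference follows the same pattern: a Mahalanobis distance costs the same per patch whatever the training set size, whereas a nearest-neighbor search has more stored embeddings to scan as the set grows.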

Implications and Future Directions

PaDiM represents a significant contribution to anomaly detection in industrial contexts. Because it works in a one-class learning setting, it needs only normal images for training, which is a practical advantage for automated visual inspection, where anomalies are rare and diverse. Its results on both aligned and non-aligned datasets suggest that it may also apply to less structured environments.

Further research could explore extensions of PaDiM to dynamic or video data, combining temporal coherence with the existing spatial model. Investigating how well it adapts to other domains where anomaly detection matters, such as medical imaging, is another possible direction.

PaDiM's balance of performance, efficiency, and ease of deployment positions it as a compelling choice for numerous real-world applications demanding precise anomaly detection and localization.