SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation (2207.14315v1)

Published 28 Jul 2022 in cs.CV

Abstract: Visual anomaly detection is commonly used in industrial quality inspection. In this paper, we present a new dataset as well as a new self-supervised learning method for ImageNet pre-training to improve anomaly detection and segmentation in 1-class and 2-class 5/10/high-shot training setups. We release the Visual Anomaly (VisA) Dataset consisting of 10,821 high-resolution color images (9,621 normal and 1,200 anomalous samples) covering 12 objects in 3 domains, making it the largest industrial anomaly detection dataset to date. Both image and pixel-level labels are provided. We also propose a new self-supervised framework - SPot-the-difference (SPD) - which can regularize contrastive self-supervised pre-training, such as SimSiam, MoCo and SimCLR, to be more suitable for anomaly detection tasks. Our experiments on VisA and MVTec-AD dataset show that SPD consistently improves these contrastive pre-training baselines and even the supervised pre-training. For example, SPD improves Area Under the Precision-Recall curve (AU-PR) for anomaly segmentation by 5.9% and 6.8% over SimSiam and supervised pre-training respectively in the 2-class high-shot regime. We open-source the project at http://github.com/amazon-research/spot-diff .

Citations (191)

Summary

  • The paper presents SPD, a novel self-supervised framework that uses the SmoothBlend augmentation to enhance local anomaly detection and segmentation.
  • It introduces the VisA dataset, the largest high-resolution collection with pixel-level annotations, advancing both 1-class and 2-class industrial quality inspection.
  • SPD improves performance in both high-shot and low-shot settings, achieving up to 6.8% AU-PR gains over baseline models and showing advantages in low-resource scenarios.

An Analysis of SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation

The paper "SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation" presents a novel approach and dataset for improving visual anomaly detection and segmentation in industrial quality inspection. This work addresses the challenges of defect detection in manufacturing, where anomalies are infrequent and often subtle, necessitating models that generalize across diverse defect types. The authors introduce a self-supervised framework, SPot-the-Difference (SPD), designed to complement and enhance existing contrastive self-supervised learning approaches such as SimSiam, MoCo, and SimCLR, and demonstrate its superior performance in both high-shot and low-shot learning settings.

Introduction of the VisA Dataset

A central contribution of this paper is the introduction of the Visual Anomaly (VisA) Dataset, comprising 10,821 high-resolution color images across 12 objects and three domains, making it the most extensive dataset of its kind. The VisA dataset provides pixel-level annotations, adding granularity essential for anomaly segmentation tasks. The dataset is structured to support both 1-class and 2-class training schemes and specifically addresses the limitations of the MVTec-AD benchmark, offering a more challenging environment for model evaluation. It also establishes a basis for both high- and low-shot learning, reflecting the real-world scarcity of anomalous data in industrial contexts.

SPot-the-Difference (SPD) Framework

The SPD framework leverages a novel self-supervised methodology to augment existing SSL models with enhanced sensitivity to local anomalies, which is crucial given the subtle and minute nature of defects in industrial inspection tasks. SPD introduces a new augmentation technique, SmoothBlend, which creates challenging synthetic spot-the-difference scenarios by embedding local perturbations into an image. This contrasts with the standard global transformations used in previous SSL methods, encouraging models to remain sensitive to the small, local details pertinent to anomaly detection. SPD enhances both feature learning and anomaly sensitivity by minimizing the cosine similarity between representations of a globally augmented view and its locally perturbed counterpart, so the encoder learns to tell the two apart.
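The summary does not spell out the implementation details of SmoothBlend or the spot-the-difference objective; a minimal NumPy sketch of how such a local perturbation and similarity term might look is given below. The function names, the Gaussian feathering, and all parameter values are illustrative assumptions, not the paper's actual recipe.

```python
import numpy as np

def smooth_blend(image, patch_size=16, rng=None):
    """Paste a randomly sourced patch elsewhere in an H x W x C image under a
    smooth (Gaussian-feathered) alpha mask, producing a subtle local change.
    Blending details are illustrative, not the paper's exact method."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    ps = patch_size
    sy, sx = rng.integers(0, h - ps), rng.integers(0, w - ps)  # source corner
    dy, dx = rng.integers(0, h - ps), rng.integers(0, w - ps)  # destination corner
    patch = image[sy:sy + ps, sx:sx + ps].astype(np.float64)
    # Feathered weights: near 1 at the patch centre, fading toward the edges.
    yy, xx = np.mgrid[0:ps, 0:ps]
    c = (ps - 1) / 2.0
    alpha = np.exp(-3.0 * ((yy - c) ** 2 + (xx - c) ** 2) / c ** 2)[..., None]
    out = image.astype(np.float64).copy()
    region = out[dy:dy + ps, dx:dx + ps]
    out[dy:dy + ps, dx:dx + ps] = alpha * patch + (1.0 - alpha) * region
    return out.astype(image.dtype)

def spd_similarity(z_view, z_perturbed):
    """Mean cosine similarity between embeddings (N x D) of a view and its
    SmoothBlend-perturbed copy; minimizing this pushes the pairs apart,
    making the encoder sensitive to small local edits."""
    a = z_view / np.linalg.norm(z_view, axis=-1, keepdims=True)
    b = z_perturbed / np.linalg.norm(z_perturbed, axis=-1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=-1)))
```

In a full pipeline this term would be combined with a standard SimSiam/MoCo/SimCLR objective; the sketch only illustrates the shape of the regularizer, not the complete training loop.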

Experimental Results and Key Findings

The experimental validation presented in the paper shows that models pre-trained with SPD exhibit significant improvements in anomaly detection and segmentation on both the VisA and MVTec-AD datasets. The SPD-enhanced versions of SimSiam, MoCo, and SimCLR consistently outperform their baseline counterparts, with improvements of up to 6.8% in AU-PR, most notably in the high-shot 2-class segmentation regime. The efficacy of SPD is especially pronounced in low-shot learning scenarios, where labeled anomalous data is scarce. Furthermore, SPD shows potential advantages over supervised pre-training in low-resource settings, suggesting the utility of self-supervised techniques when labeled data is limited.
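AU-PR, the headline metric above, is the area under the precision-recall curve. A small self-contained sketch of the standard step-wise average-precision computation (the quantity scikit-learn's `average_precision_score` reports), applied here to per-pixel anomaly scores against ground-truth anomaly labels:

```python
import numpy as np

def average_precision(y_true, y_score):
    """Area under the precision-recall curve via step-wise interpolation:
    AP = sum_n (R_n - R_{n-1}) * P_n, summed over the ranked predictions.
    y_true holds binary anomaly labels, y_score the anomaly scores."""
    order = np.argsort(-np.asarray(y_score, dtype=float))  # rank by descending score
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                                      # true positives at each rank
    precision = tp / np.arange(1, len(y) + 1)
    recall = tp / y.sum()                                  # assumes at least one positive
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)                             # precision weighted by recall step
        prev_r = r
    return ap
```

For example, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` rank as `[1, 0, 1, 0]` and yield AP = 0.5 + (2/3)(0.5) ≈ 0.833. Unlike AU-ROC, this metric is sensitive to the heavy class imbalance typical of anomaly segmentation, which is presumably why the paper reports it.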

Implications and Future Research

The introduction of SPD and the VisA dataset has several implications for the field of computer vision applied to anomaly detection. The authors' approach elucidates the role of local feature sensitivity and enriches the context for employing self-supervised learning in industrial inspection environments. Future developments could explore the adaptation of SPD to other domains where anomaly detection is key, such as medical imaging or security surveillance. Additionally, the versatility of SPD in extending beyond pre-training and into direct applications remains an open area for exploration.

Overall, this paper lays the groundwork for future advancements, broadening the scope of self-supervised learning applications and datasets in anomaly detection, and highlighting the impact of careful dataset collection and innovative pre-training strategies on model performance in specialized tasks.