Anomaly Detection via Reverse Distillation from One-Class Embedding (2201.10703v2)

Published 26 Jan 2022 in cs.CV

Abstract: Knowledge distillation (KD) achieves promising results on the challenging problem of unsupervised anomaly detection (AD). The representation discrepancy of anomalies in the teacher-student (T-S) model provides essential evidence for AD. However, using similar or identical architectures to build the teacher and student models in previous studies hinders the diversity of anomalous representations. To tackle this problem, we propose a novel T-S model consisting of a teacher encoder and a student decoder and introduce a simple yet effective "reverse distillation" paradigm accordingly. Instead of receiving raw images directly, the student network takes the teacher model's one-class embedding as input and aims to restore the teacher's multiscale representations. Inherently, knowledge distillation in this study starts from abstract, high-level representations and moves to low-level features. In addition, we introduce a trainable one-class bottleneck embedding (OCBE) module in our T-S model. The obtained compact embedding effectively preserves essential information on normal patterns, but abandons anomaly perturbations. Extensive experimentation on AD and one-class novelty detection benchmarks shows that our method surpasses SOTA performance, demonstrating our proposed approach's effectiveness and generalizability.

Citations (365)

Summary

  • The paper introduces a reverse distillation paradigm that uses a teacher encoder and student decoder to enhance anomaly detection sensitivity.
  • The proposed OCBE module compresses high-dimensional data into a one-class feature space, preserving normal patterns while discarding anomalies.
  • Evaluations on benchmarks like MVTec demonstrate significant AUROC and PRO improvements, underscoring the method’s superior performance.

Anomaly Detection via Reverse Distillation from One-Class Embedding

This paper presents a novel approach to unsupervised anomaly detection (AD) built on a reverse distillation framework within a teacher-student (T-S) model architecture. The work addresses a limitation of traditional knowledge distillation (KD) methods, which typically use similar or identical architectures for the teacher and student networks and thereby restrict the representation diversity needed for effective anomaly detection.

Key Contributions

  1. Reverse Distillation Paradigm: The paper introduces a reverse distillation paradigm in which the T-S model consists of a teacher encoder and a student decoder. Unlike conventional KD, which transfers knowledge encoder to encoder, this approach propagates information from high-level, abstract representations down to low-level features, amplifying the representation discrepancy when anomalies are encountered (a minimal loss sketch follows this list).
  2. One-Class Bottleneck Embedding (OCBE): The authors introduce a trainable OCBE module that compresses high-dimensional features into a compact one-class feature space. This module helps the model retain essential normal-pattern information while discarding anomaly perturbations (see the bottleneck sketch after this list).
  3. Demonstration of SOTA Performance: The proposed method has been extensively evaluated on AD and one-class novelty detection benchmarks, showing superior performance compared to existing state-of-the-art (SOTA) methods. The integration of the OCBE module is highlighted as a key factor in achieving these results, further emphasizing the method’s efficacy and generalizability.
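
In code, the training objective reduces to a per-location cosine distance between each teacher feature map and the student decoder's reconstruction of it, summed across scales. The following PyTorch sketch is illustrative only; the function name and tensor shapes are assumptions, not the authors' released implementation:

```python
# Minimal sketch of a reverse-distillation loss: the frozen teacher encoder's
# multi-scale features serve as regression targets for the student decoder.
import torch
import torch.nn.functional as F

def reverse_distillation_loss(teacher_feats, student_feats):
    """teacher_feats, student_feats: lists of (B, C_k, H_k, W_k) tensors,
    one pair per scale. Only the student decoder (and the bottleneck)
    receive gradients; teacher features are detached."""
    loss = 0.0
    for f_t, f_s in zip(teacher_feats, student_feats):
        # 1 - cosine similarity along channels, per spatial position
        dist = 1.0 - F.cosine_similarity(f_t.detach(), f_s, dim=1)
        loss = loss + dist.mean()  # average over batch and positions
    return loss
```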

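The OCBE itself can be pictured as multi-scale fusion followed by channel compression: shallower feature maps are downsampled to the deepest scale, concatenated, and projected into one compact embedding. The sketch below is a schematic under assumed channel sizes; the paper's module uses residual-style blocks rather than the single convolutions shown here:

```python
import torch
import torch.nn as nn

class OneClassBottleneck(nn.Module):
    """Schematic OCBE: fuse the encoder's multi-scale features into a single
    compact embedding that the student decoder then expands again. Channel
    counts follow a typical ResNet pyramid and are placeholders."""
    def __init__(self, channels=(256, 512, 1024), out_dim=2048):
        super().__init__()
        # strided convs bring the shallower maps to the deepest spatial size
        self.down1 = nn.Conv2d(channels[0], channels[0], 3, stride=4, padding=1)
        self.down2 = nn.Conv2d(channels[1], channels[1], 3, stride=2, padding=1)
        self.fuse = nn.Conv2d(sum(channels), out_dim, kernel_size=1)

    def forward(self, f1, f2, f3):
        # f1: (B, 256, H, W), f2: (B, 512, H/2, W/2), f3: (B, 1024, H/4, W/4)
        x = torch.cat([self.down1(f1), self.down2(f2), f3], dim=1)
        return self.fuse(x)  # compact one-class embedding fed to the decoder
```
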
Experimental Results

The reverse distillation framework has been tested on the MVTec anomaly detection dataset and additional one-class novelty detection datasets. The results indicate leading performance, with significant improvements in AUROC and PRO scores for anomaly localization. The robustness of the approach is attributed to the structural asymmetry between the teacher encoder and student decoder, which heightens the system's sensitivity to the deviations that anomalies introduce.
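
To make the metrics concrete: a pixel-level anomaly map can be obtained by upsampling each scale's T-S cosine-distance map to the input resolution and summing, with the image-level score taken as the map's maximum. The sketch below is an assumed evaluation pipeline, not the paper's exact post-processing (typical implementations also smooth the map with a Gaussian filter before scoring):

```python
# Illustrative scoring sketch: per-scale cosine distances -> one anomaly map.
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

def anomaly_map(teacher_feats, student_feats, out_size):
    amap = 0.0
    for f_t, f_s in zip(teacher_feats, student_feats):
        d = 1.0 - F.cosine_similarity(f_t, f_s, dim=1)   # (B, H_k, W_k)
        amap = amap + F.interpolate(d.unsqueeze(1), size=out_size,
                                    mode="bilinear", align_corners=False)
    return amap.squeeze(1)                               # (B, H, W)

# Image-level AUROC: one max score per image vs. binary normal/anomaly labels.
# scores = anomaly_map(t_feats, s_feats, (256, 256)).amax(dim=(1, 2))
# auroc = roc_auc_score(labels, scores.cpu().numpy())
```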

Theoretical and Practical Implications

The reverse distillation approach sets a precedent for developing alternative KD structures that accommodate and harness diversity in representations for unsupervised AD tasks. This divergence from conventional KD aligns with the broader philosophy of leveraging neural network architecture diversity to address challenging problems in machine learning, particularly those where labeled anomalous data is scarce.

Practically, the enhanced discriminative power provided by this method can translate into more reliable and precise AD systems across various applications, such as industrial defect detection and medical out-of-distribution detection, where the identification and localization of anomalies are critical.

Future Directions

Building on the findings of this paper, future research could explore further architectural variations within the reverse distillation paradigm, possibly integrating other representation learning techniques to enrich feature extraction. Additionally, investigating the scalability of the proposed system to larger and more diverse datasets could uncover broader applications, particularly in the high-dimensional settings encountered in real-world scenarios.

In summary, the paper introduces a novel direction for knowledge distillation-based anomaly detection, demonstrating the viability and advantages of reverse distillation and one-class compact representation in improving the reliability and accuracy of detecting unknown anomalies. This contribution not only advances the field of anomaly detection but also opens avenues for rethinking and redesigning T-S model architectures to better tackle diverse machine learning challenges.