FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data (2411.16110v1)

Published 25 Nov 2024 in cs.LG and cs.CV

Abstract: While the mainstream research in anomaly detection has mainly followed the one-class classification, practical industrial environments often incur noisy training data due to annotation errors or lack of labels for new or refurbished products. To address these issues, we propose a novel learning-based approach for fully unsupervised anomaly detection with unlabeled and potentially contaminated training data. Our method is motivated by two observations, that i) the pairwise feature distances between the normal samples are on average likely to be smaller than those between the anomaly samples or heterogeneous samples and ii) pairs of features mutually closest to each other are likely to be homogeneous pairs, which hold if the normal data has smaller variance than the anomaly data. Building on the first observation that nearest-neighbor distances can distinguish between confident normal samples and anomalies, we propose a pseudo-labeling strategy using an iteratively reconstructed memory bank (IRMB). The second observation is utilized as a new loss function to promote class-homogeneity between mutually closest pairs thereby reducing the ill-posedness of the task. Experimental results on two public industrial anomaly benchmarks and semantic anomaly examples validate the effectiveness of FUN-AD across different scenarios and anomaly-to-normal ratios. Our code is available at https://github.com/HY-Vision-Lab/FUNAD.

Authors (3)

Jiin Im
Yongho Son
Je Hyeong Hong

Summary

An Analysis of "FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data"

The paper "FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data" proposes a learning-based approach to anomaly detection in industrial settings where the training data is unlabeled and may be contaminated with anomalies. It targets a gap left by conventional one-class classification methods, which assume that the training set contains only normal samples. In practice, annotation errors and missing labels for new or refurbished products often break that assumption.

Overview of the Proposed Methodology

The key contribution is FUN-AD, a fully unsupervised learning framework for anomaly detection when no labels are available and the training set may contain anomalies. The method rests on two observations. First, pairwise feature distances between normal samples are, on average, likely to be smaller than those between anomalous samples or between heterogeneous (normal-anomaly) pairs. Second, pairs of features that are mutually closest to each other are likely to be homogeneous, that is, to belong to the same class. Both observations hold when the normal data has smaller variance than the anomaly data. The paper validates the first observation empirically, which supports using nearest-neighbor distances as a signal for pseudo-labeling.
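The first observation can be checked on toy data. The sketch below is not taken from the paper or its code. It assumes synthetic 8-dimensional Gaussian features with a shared mean, a small standard deviation for the normal class and a large one for anomalies (all values chosen purely for illustration), and compares mean pairwise Euclidean distances for normal-normal, normal-anomaly, and anomaly-anomaly pairs.

```python
import math
import random
from itertools import combinations, product


def sample(rng, n, dim, std):
    """Draw n feature vectors from an isotropic Gaussian centred at the origin."""
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n)]


def mean_dist(pairs):
    """Average Euclidean distance over an iterable of (a, b) feature pairs."""
    dists = [math.dist(a, b) for a, b in pairs]
    return sum(dists) / len(dists)


# Assumed toy setup: normals are tightly clustered, anomalies are spread out.
rng = random.Random(0)
normals = sample(rng, n=200, dim=8, std=0.5)
anomalies = sample(rng, n=20, dim=8, std=3.0)

mean_nn = mean_dist(combinations(normals, 2))      # normal-normal pairs
mean_na = mean_dist(product(normals, anomalies))   # heterogeneous pairs
mean_aa = mean_dist(combinations(anomalies, 2))    # anomaly-anomaly pairs

print(f"normal-normal:   {mean_nn:.2f}")
print(f"normal-anomaly:  {mean_na:.2f}")
print(f"anomaly-anomaly: {mean_aa:.2f}")
```

Under these assumptions the expected squared distances are 2dσ_n², d(σ_n² + σ_a²), and 2dσ_a² respectively (d is the feature dimension), so the ordering follows directly from σ_n < σ_a. Real features from a pretrained network need not be Gaussian, which is why the paper's empirical validation matters.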

The framework uses an iteratively reconstructed memory bank (IRMB) to hold features representative of the normal class, and rebuilds it during training so that the separation between normal and anomalous samples is progressively refined. Patch-level pseudo-labels are assigned through nearest-neighbor search against this memory bank. A mutual smoothness loss then aligns the anomaly scores of mutually closest feature pairs, which promotes class homogeneity within each pair and reduces the number of erroneous pseudo-labels.
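The two mechanisms can be sketched on toy data as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the fixed label counts (n_normal, n_anomaly), the rule for rebuilding the bank from confident normals, the squared-difference form of the smoothness penalty, and all function names are hypothetical choices made here, and the paper's exact formulation may differ.

```python
import math
import random


def nn_dist(x, bank):
    """Distance from x to its nearest neighbour in the bank, excluding x itself."""
    return min(math.dist(x, b) for b in bank if b is not x)


def pseudo_label(features, bank, n_normal, n_anomaly):
    """Label the n_normal smallest NN distances 0, the n_anomaly largest 1, the rest None."""
    scores = [nn_dist(x, bank) for x in features]
    order = sorted(range(len(features)), key=lambda i: scores[i])
    labels = [None] * len(features)
    for i in order[:n_normal]:
        labels[i] = 0
    for i in order[-n_anomaly:]:
        labels[i] = 1
    return labels


def mutual_pairs(features):
    """Index pairs (i, j), i < j, where each feature is the other's nearest neighbour."""
    n = len(features)
    nn = [min((j for j in range(n) if j != i),
              key=lambda j: math.dist(features[i], features[j]))
          for i in range(n)]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]


def smoothness_loss(scores, pairs):
    """Assumed penalty: mean squared gap between anomaly scores of mutual pairs."""
    return sum((scores[i] - scores[j]) ** 2 for i, j in pairs) / len(pairs)


# Assumed toy data: 95 tightly clustered normals followed by 5 spread-out anomalies.
rng = random.Random(0)
normals = [[rng.gauss(0.0, 0.5) for _ in range(8)] for _ in range(95)]
anomalies = [[rng.gauss(0.0, 4.0) for _ in range(8)] for _ in range(5)]
features = normals + anomalies

# Start with every feature in the bank, then rebuild it from confident normals.
bank = features
for _ in range(3):
    labels = pseudo_label(features, bank, n_normal=50, n_anomaly=5)
    bank = [x for x, y in zip(features, labels) if y == 0]

pairs = mutual_pairs(features)
homogeneous = sum((i < 95) == (j < 95) for i, j in pairs)
print(f"bank size after rebuilding: {len(bank)}")
print(f"mutual pairs: {len(pairs)}, of which homogeneous: {homogeneous}")
```

In a trained model the scores passed to smoothness_loss would be the network's predicted anomaly scores, so the penalty pushes mutually closest features toward the same prediction. Here the function is only defined, since the toy example has no learned scorer.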

Empirical Evaluation and Results

FUN-AD is evaluated on two public industrial anomaly benchmarks, MVTec AD and VisA. The paper reports that the method outperforms prior state-of-the-art approaches both with and without contamination in the training data, with AUROC gains in both anomaly detection and anomaly localization. These results support FUN-AD as a robust option when clean training data cannot be guaranteed.
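For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half. The minimal sketch below computes it by direct pairwise comparison. The labels and scores are made-up illustrative values, not results from the paper.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (anomaly, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical image-level labels (1 = anomalous) and anomaly scores.
image_labels = [0, 0, 1, 1]
image_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(image_labels, image_scores))  # 0.75
```

Detection AUROC applies this to one score per image. Localization AUROC is commonly computed the same way over per-pixel scores and ground-truth masks flattened across the test set. For large inputs a rank-based implementation is preferable to this quadratic comparison.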

Implications and Future Directions

The practical implications of this research are particularly significant for industries where updating and labeling datasets are both costly and labor-intensive. By leveraging the structure of feature spaces through fully unsupervised learning, FUN-AD provides a scalable solution adaptable to evolving industrial environments, such as manufacturing processes where product updates or refurbishments are common.

This work also opens several avenues for future exploration. Distance-based pseudo-labeling in feature space merits further theoretical and empirical study. Moreover, while the focus is primarily on industrial datasets, adapting FUN-AD to semantic anomaly detection is a promising direction, as suggested by preliminary results on datasets such as CIFAR-10.

Conclusion

In conclusion, the paper advances anomaly detection in the fully unsupervised setting by addressing the core difficulty of achieving reliable performance with noisy, unlabeled training data. Its iteratively refined pseudo-labeling strategy and its mutual smoothness loss, which promotes class homogeneity between mutually closest pairs, mark clear progress in learning directly from contaminated data. While chiefly aimed at industrial anomaly detection, FUN-AD’s methodology is adaptable to other real-world applications where reliable labels are hard to obtain.
