AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models (2308.15366v4)

Published 29 Aug 2023 in cs.CV

Abstract: Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLMs to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLMs. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments and can directly assess the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

The paper "AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models" introduces a novel approach to industrial anomaly detection built on Large Vision-Language Models (LVLMs). It leverages advancements in LVLMs such as MiniGPT-4 and LLaVA, extending their capabilities beyond general recognition tasks to address the specific challenges of Industrial Anomaly Detection (IAD).

Key Contributions

AnomalyGPT pairs an LVLM with simulated anomaly data, an image decoder, and a prompt learner to detect and localize anomalies in industrial contexts. Unlike conventional IAD methods, which rely on manually set thresholds to flag anomalies, AnomalyGPT exploits the LVLM's native support for natural dialogue and its robust understanding of image content, so the model can state directly whether and where an image is anomalous. This eliminates subjective threshold tuning and enhances practicality in real-world applications.
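
To make the threshold problem concrete, here is a minimal illustration of the conventional score-plus-threshold decision (the scores and threshold values are invented for this sketch, not taken from the paper):

```python
def conventional_iad_decision(anomaly_score: float, threshold: float) -> bool:
    """Conventional IAD: compare a scalar anomaly score against a manually
    tuned threshold, which must be re-calibrated for every object class."""
    return anomaly_score > threshold

# Hypothetical scores for three test images of one product class.
scores = [0.12, 0.47, 0.91]

# The verdict on the second image flips with the threshold choice --
# exactly the calibration burden AnomalyGPT is designed to remove.
print([conventional_iad_decision(s, 0.5) for s in scores])  # [False, False, True]
print([conventional_iad_decision(s, 0.4) for s in scores])  # [False, True, True]
```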

  1. LVLM-Based IAD Framework: AnomalyGPT introduces a framework wherein LVLMs are fine-tuned using a combination of simulated anomaly data and prompt learning. This enables the model to detect and localize anomalies without needing predefined anomaly thresholds. The incorporation of LVLMs in the process allows the system to interpret complex image-text relationships effectively.
  2. Enhanced Few-Shot Learning: The proposed framework supports few-shot learning, allowing it to adapt quickly to new or unseen industrial objects with minimal labeled data. In experimental evaluations, AnomalyGPT achieves state-of-the-art results in few-shot settings with a single normal image, reaching an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset.
  3. Discussion of Limitations and Improvements: While LVLMs are powerful, they lack specific domain knowledge and sensitivity to localized details. AnomalyGPT addresses this by integrating a decoder module for finer semantic analysis and utilizing prompt embeddings to enhance the contextual understanding within the LVLM. This approach mitigates issues related to data scarcity and overfitting, often seen in domain-specific training.

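The simulated-anomaly data generation in contribution 1 can be sketched with a cut-paste-style simulator that produces (image, mask, description) training triples. The patch size, region wording, and copy strategy below are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_anomaly(normal_img: np.ndarray, patch: int = 8):
    """Cut-paste-style simulation: copy a random patch of a normal image
    to another location, and record a pixel mask plus a textual description
    usable as an instruction-tuning answer."""
    h, w = normal_img.shape[:2]
    img = normal_img.copy()
    sy, sx = rng.integers(0, h - patch), rng.integers(0, w - patch)  # source
    dy, dx = rng.integers(0, h - patch), rng.integers(0, w - patch)  # destination
    img[dy:dy + patch, dx:dx + patch] = normal_img[sy:sy + patch, sx:sx + patch]
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[dy:dy + patch, dx:dx + patch] = 1
    # Coarse region phrase for the textual answer paired with the image.
    row = "top" if dy + patch / 2 < h / 2 else "bottom"
    col = "left" if dx + patch / 2 < w / 2 else "right"
    desc = f"Yes, there is an anomaly at the {row} {col} of the image."
    return img, mask, desc

normal = rng.random((32, 32, 3))
img, mask, desc = simulate_anomaly(normal)
print(mask.sum())  # 64: an 8x8 patch is marked anomalous
```

The textual description pairs with the pixel mask, so the model is supervised both to answer in language and to localize.
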
Numerical Results and Discussion

The methodology of AnomalyGPT was tested on two widely recognized datasets: MVTec-AD and VisA, demonstrating superior performance in both unsupervised and few-shot scenarios. The paper reports an image-level AUC of 97.4% and pixel-level AUC of 93.1% in unsupervised settings on the MVTec-AD dataset, underscoring its efficacy and transferability across different industrial contexts.
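
The few-shot mode can be approximated as a memory-bank comparison: each query patch feature is scored by its distance to the nearest patch feature from the stored normal shot(s). The random features and cosine metric below are assumptions for illustration; the paper's actual features come from its image encoder and decoder:

```python
import numpy as np

def fewshot_anomaly_map(query_feats: np.ndarray, normal_feats: np.ndarray) -> np.ndarray:
    """Score each query patch by 1 - max cosine similarity to any patch
    feature stored from the normal reference shots: patches that resemble
    nothing normal get high anomaly scores."""
    q = query_feats / np.linalg.norm(query_feats, axis=-1, keepdims=True)
    n = normal_feats / np.linalg.norm(normal_feats, axis=-1, keepdims=True)
    sim = q @ n.T                 # (num_query_patches, num_normal_patches)
    return 1.0 - sim.max(axis=1)  # low best-match similarity => anomalous

rng = np.random.default_rng(1)
normal_feats = rng.normal(size=(64, 16))  # patch features from one normal shot
query_feats = normal_feats.copy()
query_feats[10] = -query_feats[10]        # perturb one patch to fake a defect
amap = fewshot_anomaly_map(query_feats, normal_feats)
print(int(amap.argmax()))  # 10: the perturbed patch scores highest
```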

Implications and Future Directions

AnomalyGPT's integration of advanced LVLMs into IAD presents significant implications for the evolution of automated quality control in manufacturing. By transcending the limitations of conventional anomaly detection methods, AnomalyGPT offers enhanced adaptability, reduced reliance on extensive labeled datasets, and the ability to engage in meaningful dialogues about image content and anomalies.

From a theoretical perspective, this paper opens avenues for exploring the convergence of vision, language, and anomaly detection. Future developments may involve enhancing the model's ability to learn from fewer samples while maintaining accuracy, further improving its application in dynamic production environments.

Conclusion

AnomalyGPT represents a significant step towards leveraging the capabilities of LVLMs in industrial applications. By addressing specific challenges in IAD with an innovative combination of anomaly simulation, prompt learning, and advanced LVLM features, the methodology sets a new standard for practical, robust, and scalable industrial anomaly detection solutions. This research exemplifies the potential for LVLMs to transcend conventional limitations and underscores the opportunities for further exploration in related domains.

Authors (6)
  1. Zhaopeng Gu
  2. Bingke Zhu
  3. Guibo Zhu
  4. Yingying Chen
  5. Ming Tang
  6. Jinqiao Wang