AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models (2308.15366v4)

Published 29 Aug 2023 in cs.CV

Abstract: Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLMs to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLMs. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustments and can directly assess the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

The paper "AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models" introduces a novel approach to industrial anomaly detection built on Large Vision-Language Models (LVLMs). It leverages advancements in LVLMs such as MiniGPT-4 and LLaVA, extending their capabilities beyond general recognition tasks to address the specific challenges of Industrial Anomaly Detection (IAD).

Key Contributions

AnomalyGPT pairs an LVLM with simulated anomaly data, an image decoder, and a prompt learner to detect and localize anomalies in industrial contexts. Unlike conventional IAD methods, which rely on manually set thresholds to flag anomalies, AnomalyGPT exploits the LVLM's native support for natural dialogue and its robust understanding of image content, so the model can state directly whether and where an image is anomalous. This eliminates subjective threshold tuning and enhances practicality in real-world applications.
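
To make the threshold problem concrete, here is a minimal illustration of the conventional score-plus-threshold decision (the scores and threshold values are invented for this sketch, not taken from the paper):

```python
def conventional_iad_decision(anomaly_score: float, threshold: float) -> bool:
    """Conventional IAD: compare a scalar anomaly score against a manually
    tuned threshold, which must be re-calibrated for every object class."""
    return anomaly_score > threshold

# Hypothetical scores for three test images of one product class.
scores = [0.12, 0.47, 0.91]

# The verdict on the second image flips with the threshold choice --
# exactly the calibration burden AnomalyGPT is designed to remove.
print([conventional_iad_decision(s, 0.5) for s in scores])  # [False, False, True]
print([conventional_iad_decision(s, 0.4) for s in scores])  # [False, True, True]
```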

  1. LVLM-Based IAD Framework: AnomalyGPT introduces a framework wherein LVLMs are fine-tuned using a combination of simulated anomaly data and prompt learning. This enables the model to detect and localize anomalies without needing predefined anomaly thresholds. The incorporation of LVLMs in the process allows the system to interpret complex image-text relationships effectively.
  2. Enhanced Few-Shot Learning: The proposed framework supports few-shot learning, allowing it to adapt quickly to new or unseen industrial objects with minimal labeled data. In experimental evaluations, AnomalyGPT achieves state-of-the-art results in few-shot settings with a single normal image, reaching an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset.
  3. Discussion of Limitations and Improvements: While LVLMs are powerful, they lack specific domain knowledge and sensitivity to localized details. AnomalyGPT addresses this by integrating a decoder module for finer semantic analysis and utilizing prompt embeddings to enhance the contextual understanding within the LVLM. This approach mitigates issues related to data scarcity and overfitting, often seen in domain-specific training.

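The simulated-anomaly data generation in contribution 1 can be sketched with a cut-paste-style simulator that produces (image, mask, description) training triples. The patch size, region wording, and copy strategy below are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_anomaly(normal_img: np.ndarray, patch: int = 8):
    """Cut-paste-style simulation: copy a random patch of a normal image
    to another location, and record a pixel mask plus a textual description
    usable as an instruction-tuning answer."""
    h, w = normal_img.shape[:2]
    img = normal_img.copy()
    sy, sx = rng.integers(0, h - patch), rng.integers(0, w - patch)  # source
    dy, dx = rng.integers(0, h - patch), rng.integers(0, w - patch)  # destination
    img[dy:dy + patch, dx:dx + patch] = normal_img[sy:sy + patch, sx:sx + patch]
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[dy:dy + patch, dx:dx + patch] = 1
    # Coarse region phrase for the textual answer paired with the image.
    row = "top" if dy + patch / 2 < h / 2 else "bottom"
    col = "left" if dx + patch / 2 < w / 2 else "right"
    desc = f"Yes, there is an anomaly at the {row} {col} of the image."
    return img, mask, desc

normal = rng.random((32, 32, 3))
img, mask, desc = simulate_anomaly(normal)
print(mask.sum())  # 64: an 8x8 patch is marked anomalous
```

The textual description pairs with the pixel mask, so the model is supervised both to answer in language and to localize.
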
Numerical Results and Discussion

The methodology of AnomalyGPT was tested on two widely recognized datasets: MVTec-AD and VisA, demonstrating superior performance in both unsupervised and few-shot scenarios. The paper reports an image-level AUC of 97.4% and pixel-level AUC of 93.1% in unsupervised settings on the MVTec-AD dataset, underscoring its efficacy and transferability across different industrial contexts.
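
The few-shot mode can be approximated as a memory-bank comparison: each query patch feature is scored by its distance to the nearest patch feature from the stored normal shot(s). The random features and cosine metric below are assumptions for illustration; the paper's actual features come from its image encoder and decoder:

```python
import numpy as np

def fewshot_anomaly_map(query_feats: np.ndarray, normal_feats: np.ndarray) -> np.ndarray:
    """Score each query patch by 1 - max cosine similarity to any patch
    feature stored from the normal reference shots: patches that resemble
    nothing normal get high anomaly scores."""
    q = query_feats / np.linalg.norm(query_feats, axis=-1, keepdims=True)
    n = normal_feats / np.linalg.norm(normal_feats, axis=-1, keepdims=True)
    sim = q @ n.T                 # (num_query_patches, num_normal_patches)
    return 1.0 - sim.max(axis=1)  # low best-match similarity => anomalous

rng = np.random.default_rng(1)
normal_feats = rng.normal(size=(64, 16))  # patch features from one normal shot
query_feats = normal_feats.copy()
query_feats[10] = -query_feats[10]        # perturb one patch to fake a defect
amap = fewshot_anomaly_map(query_feats, normal_feats)
print(int(amap.argmax()))  # 10: the perturbed patch scores highest
```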

Implications and Future Directions

AnomalyGPT's integration of advanced LVLMs into IAD presents significant implications for the evolution of automated quality control in manufacturing. By transcending the limitations of conventional anomaly detection methods, AnomalyGPT offers enhanced adaptability, reduced reliance on extensive labeled datasets, and the ability to engage in meaningful dialogues about image content and anomalies.

From a theoretical perspective, this paper opens avenues for exploring the convergence of vision, language, and anomaly detection. Future developments may involve enhancing the model's ability to learn from fewer samples while maintaining accuracy, further improving its application in dynamic production environments.

Conclusion

AnomalyGPT represents a significant step towards leveraging the capabilities of LVLMs in industrial applications. By addressing specific challenges in IAD with an innovative combination of anomaly simulation, prompt learning, and advanced LVLM features, the methodology sets a new standard for practical, robust, and scalable industrial anomaly detection solutions. This research exemplifies the potential for LVLMs to transcend conventional limitations and underscores the opportunities for further exploration in related domains.

Authors (6)
  1. Zhaopeng Gu
  2. Bingke Zhu
  3. Guibo Zhu
  4. Yingying Chen
  5. Ming Tang
  6. Jinqiao Wang