AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection (2310.18961v8)

Published 29 Oct 2023 in cs.CV

Abstract: Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, e.g., data privacy, yet it is challenging since the models need to generalize to anomalies across different domains where the appearance of foreground objects, abnormal regions, and background features, such as defects/tumors on different products/organs, can vary significantly. Recently, large pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak since the VLMs focus more on modeling the class semantics of the foreground objects rather than the abnormality/normality in the images. In this paper, we introduce a novel approach, namely AnomalyCLIP, to adapt CLIP for accurate ZSAD across different domains. The key insight of AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This allows our model to focus on the abnormal image regions rather than the object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. Large-scale experiments on 17 real-world anomaly detection datasets show that AnomalyCLIP achieves superior zero-shot performance of detecting and segmenting anomalies in datasets of highly diverse class semantics from various defect inspection and medical imaging domains. Code will be made available at https://github.com/zqhang/AnomalyCLIP.

An Insightful Overview of "AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection"

The paper "AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection" proposes an approach to improve the zero-shot anomaly detection (ZSAD) capabilities of vision-language models (VLMs) such as CLIP. ZSAD matters in applications where training data from the target domain is unavailable, for example because of privacy constraints or because the domain is new. The paper addresses this setting by adapting a large pre-trained VLM: although such models show strong zero-shot recognition, their ZSAD performance is weak because they model the class semantics of foreground objects rather than the normality or abnormality of an image.

Core Contributions and Methodology

The central idea of AnomalyCLIP is object-agnostic prompt learning: rather than writing prompts that name a specific object, the model learns generic text prompts that capture normality and abnormality in an image regardless of its foreground objects. Because the prompts carry no object semantics, the model attends to abnormal image regions instead of object identity, which is what lets it generalize across domains.
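
To make the role of the two prompts concrete, the following is a minimal sketch of how a pair of prompt embeddings could be turned into an anomaly score in a CLIP-style model. It is an illustration under stated assumptions, not the authors' implementation: the function name `abnormal_probability`, the toy 3-dimensional embeddings, and the temperature value of 0.07 are all hypothetical, and the real model would obtain the embeddings from the CLIP text and image encoders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def abnormal_probability(features, normal_prompt, abnormal_prompt, temperature=0.07):
    """Softmax over cosine similarities to the [normal, abnormal] prompt embeddings.

    `features` has shape (..., d): a single image embedding (d,) or a grid of
    patch embeddings (H, W, d). The result has shape (...) and holds the
    probability assigned to the abnormal prompt.
    All names and the temperature are assumptions made for this sketch.
    """
    feats = l2_normalize(np.asarray(features, dtype=float))
    prompts = l2_normalize(np.asarray([normal_prompt, abnormal_prompt], dtype=float))
    logits = feats @ prompts.T / temperature            # (..., 2)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs[..., 1]

# Toy embeddings standing in for encoder outputs (hypothetical values).
normal_prompt = np.array([1.0, 0.0, 0.0])
abnormal_prompt = np.array([0.0, 1.0, 0.0])

# Image-level score: one embedding for the whole image.
image_score = abnormal_probability(np.array([0.2, 0.9, 0.1]), normal_prompt, abnormal_prompt)

# Pixel-level map: a 2x2 grid of patch embeddings, one of which looks abnormal.
patch_features = np.tile(normal_prompt, (2, 2, 1))
patch_features[1, 1] = abnormal_prompt
anomaly_map = abnormal_probability(patch_features, normal_prompt, abnormal_prompt)
```

Note that nothing in the scoring function refers to an object class: the same two prompt embeddings are compared against every image and every patch, which is the sense in which the prompts are object-agnostic.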

To put this into practice, the paper uses learnable prompt templates and aligns the resulting prompts with both image-level and pixel-level features. Training combines global and local context optimization, so the prompts capture image-wide as well as fine-grained anomaly characteristics. In addition, a Diagonally Prominent Attention Map (DPAM) refines the local visual semantics that standard attention represents poorly, which is important for accurate anomaly segmentation.
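
This overview does not spell out how DPAM is computed. As a hedged illustration of what "diagonally prominent" can mean, the sketch below assumes one possible construction: each token's value vector attends to the value vectors of all tokens (instead of queries attending to keys), so a token's strongest attention weight falls on itself. The function names and toy data are hypothetical, and the diagonal is guaranteed to be the row maximum only when the token vectors are unit-normalized, as they are in this example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_similarity_attention(tokens):
    """Attention in which the same vectors act as queries, keys, and values.

    Assumed construction for illustration only. `tokens` has shape (n, d).
    Returns the (n, n) attention map and the (n, d) refined tokens.
    """
    tokens = np.asarray(tokens, dtype=float)
    scale = np.sqrt(tokens.shape[-1])
    weights = softmax(tokens @ tokens.T / scale)
    return weights, weights @ tokens

# Toy patch tokens (hypothetical), normalized to unit length so that every
# token is most similar to itself (Cauchy-Schwarz inequality).
rng = np.random.default_rng(0)
values = rng.normal(size=(6, 8))
values = values / np.linalg.norm(values, axis=1, keepdims=True)

weights, refined = self_similarity_attention(values)

# Each row of the map peaks on its own index, i.e. on the diagonal.
peaks_on_diagonal = np.array_equal(weights.argmax(axis=1), np.arange(len(values)))
```

The design point this illustrates is that a diagonal-heavy map lets each patch keep mostly its own information, which helps when the goal is to localize a small abnormal region rather than to summarize the whole image.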

Experimental Evaluation and Results

The paper supports its claims with large-scale experiments on 17 real-world anomaly detection datasets drawn from industrial defect inspection and medical imaging. Across datasets with highly diverse objects and textures, AnomalyCLIP achieves superior zero-shot performance in both anomaly classification and segmentation. The object-agnostic prompts allow the model to generalize across very different domains without training data from the target domain.

Implications and Future Directions

The implications of AnomalyCLIP are twofold. Practically, it shows that strong zero-shot anomaly detection is achievable without fine-tuning on the target domain. Theoretically, it challenges the conventional practice of tying anomaly detection to object-specific knowledge, and opens avenues for further work in unsupervised and self-supervised learning.

For future work, the authors suggest expanding the range of auxiliary datasets to further improve the model's generalization and robustness. Additionally, integrating AnomalyCLIP with modalities beyond images and text, such as audio, could enable cross-modal anomaly detection systems.

In conclusion, AnomalyCLIP advances zero-shot anomaly detection by changing how prompts are used: they are learned to describe normality and abnormality rather than objects, so that a single model applies across diverse datasets. The work points to a promising direction at the intersection of anomaly detection and large pre-trained models.

Authors (5)
  1. Qihang Zhou
  2. Guansong Pang
  3. Yu Tian
  4. Shibo He
  5. Jiming Chen