
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection (2404.05231v2)

Published 8 Apr 2024 in cs.CV

Abstract: Vision-language models have brought great improvements to few-shot industrial anomaly detection, which usually requires hundreds of prompts designed through prompt engineering. For automated scenarios, we first use conventional prompt learning with the many-class paradigm as a baseline to learn prompts automatically, but find that it does not work well for one-class anomaly detection. To address this problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation, which transposes normal prompts into anomaly prompts by concatenating them with anomaly suffixes, thus constructing a large number of negative samples that guide prompt learning in the one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of an explicit anomaly margin, which explicitly controls the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.

References (61)
  1. Pni: industrial anomaly detection using position and neighborhood information. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6373–6383, 2023.
  2. Vlmo: Unified vision-language pre-training with mixture-of-modality-experts. Advances in Neural Information Processing Systems, 35:32897–32912, 2022.
  3. Efficientad: Accurate visual anomaly detection at millisecond-level latencies. arXiv preprint arXiv:2303.14535, 2023.
  4. Mvtec ad–a comprehensive real-world dataset for unsupervised anomaly detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9592–9600, 2019.
  5. Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4183–4192, 2020.
  6. Beyond dents and scratches: Logical constraints in unsupervised anomaly detection and localization. International Journal of Computer Vision, 130(4):947–969, 2022.
  7. Segment any anomaly without training via hybrid prompt regularization. arXiv preprint arXiv:2305.10724, 2023.
  8. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.
  9. Semantic prompt for few-shot image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23581–23591, 2023a.
  10. A zero-/few-shot anomaly classification and segmentation method for cvpr 2023 vand workshop challenge tracks 1&2: 1st place on zero-shot ad and 4th place on few-shot ad. arXiv preprint arXiv:2305.17382, 2023b.
  11. Sub-image anomaly detection with deep pyramid correspondences. arXiv preprint arXiv:2005.02357, 2020.
  12. Padim: a patch distribution modeling framework for anomaly detection and localization. In International Conference on Pattern Recognition, pages 475–489. Springer, 2021.
  13. Anovl: Adapting vision-language models for unified zero-shot anomaly localization. arXiv preprint arXiv:2308.15939, 2023.
  14. Fastrecon: Few-shot industrial anomaly detection via fast feature reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17481–17490, 2023.
  15. Clip-adapter: Better vision-language models with feature adapters. International Journal of Computer Vision, pages 1–15, 2023.
  16. Remembering normality: Memory-guided knowledge distillation for unsupervised anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16401–16409, 2023.
  17. Template-guided hierarchical feature restoration for anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6447–6458, 2023.
  18. Momentum contrast for unsupervised visual representation learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 9726–9735. Computer Vision Foundation / IEEE, 2020.
  19. Registration based few-shot anomaly detection. In Computer Vision - ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXIV, pages 303–319. Springer, 2022.
  20. Openclip, 2021.
  21. Winclip: Zero-/few-shot anomaly classification and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19606–19616, 2023.
  22. Deep learning-based defect detection of metal parts: evaluating current methods in complex conditions. In 2021 13th International congress on ultra modern telecommunications and control systems and workshops (ICUMT), pages 66–71. IEEE, 2021.
  23. Scaling up visual and vision-language representation learning with noisy text supervision. In International conference on machine learning, pages 4904–4916. PMLR, 2021.
  24. How can we know what language models know? Transactions of the Association for Computational Linguistics, 8:423–438, 2020.
  25. Maple: Multi-modal prompt learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 19113–19122. IEEE, 2023.
  26. En-compactness: Self-distillation embedding & contrastive generation for generalized zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9306–9315, 2022.
  27. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pages 12888–12900. PMLR, 2022a.
  28. Clip-reid: exploiting vision-language model for image re-identification without concrete text labels. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1405–1413, 2023a.
  29. Vs-boost: Boosting visual-semantic association for generalized zero-shot learning. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China, pages 1107–1115. ijcai.org, 2023b.
  30. Exploring visual interpretability for contrastive language-image pre-training. arXiv preprint arXiv:2209.07046, 2022b.
  31. Clip surgery for better explainability with enhancement in open-vocabulary tasks. arXiv preprint arXiv:2304.05653, 2023c.
  32. Deep dual consecutive network for human pose estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 525–534, 2021.
  33. Temporal feature alignment and mutual information maximization for video-based human pose estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11006–11016, 2022.
  34. Simplenet: A simple network for image anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20402–20411, 2023.
  35. Inter-realization channels: Unsupervised anomaly detection beyond one-class classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6285–6295, 2023.
  36. Improving language understanding by generative pre-training. 2018.
  37. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  38. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
  39. Towards total recall in industrial anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14318–14328, 2022.
  40. Same same but differnet: Semi-supervised defect detection with normalizing flows. In IEEE Winter Conference on Applications of Computer Vision, WACV 2021, Waikoloa, HI, USA, January 3-8, 2021, pages 1906–1915. IEEE, 2021.
  41. Deep one-class classification. In International conference on machine learning, pages 4393–4402. PMLR, 2018.
  42. Multiresolution knowledge distillation for anomaly detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14902–14912, 2021.
  43. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114, 2021.
  44. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022.
  45. A hierarchical transformation-discriminating generative model for few shot anomaly detection. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 8475–8484. IEEE, 2021.
  46. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint arXiv:2010.15980, 2020.
  47. Masato Tamura. Random word data augmentation with clip for zero-shot anomaly detection. arXiv preprint arXiv:2308.11119, 2023.
  48. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  49. Student-teacher feature pyramid matching for anomaly detection. In 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22-25, 2021, page 306. BMVA Press, 2021.
  50. Few-shot learning with visual distribution calibration and cross-modal distribution alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23445–23454, 2023.
  51. Cris: Clip-driven referring image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11686–11695, 2022.
  52. Perturbed progressive learning for semisupervised defect segmentation. IEEE Transactions on Neural Networks and Learning Systems, 2023.
  53. Patch svdd: Patch-level svdd for anomaly detection and segmentation. In Proceedings of the Asian conference on computer vision, 2020.
  54. Draem-a discriminatively trained reconstruction embedding for surface anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8330–8339, 2021.
  55. Unsupervised surface anomaly detection with diffusion probabilistic model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6782–6791, 2023a.
  56. Destseg: Segmentation guided denoising student-teacher for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3914–3923, 2023b.
  57. Extract free dense labels from clip. In European Conference on Computer Vision, pages 696–712. Springer, 2022a.
  58. Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16816–16825, 2022b.
  59. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348, 2022c.
  60. Zegclip: Towards adapting clip for zero-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11175–11185, 2023.
  61. Spot-the-difference self-supervised pre-training for anomaly detection and segmentation. In European Conference on Computer Vision, pages 392–408. Springer, 2022.
Authors (7)
  1. Xiaofan Li (52 papers)
  2. Zhizhong Zhang (42 papers)
  3. Xin Tan (63 papers)
  4. Chengwei Chen (7 papers)
  5. Yanyun Qu (39 papers)
  6. Yuan Xie (188 papers)
  7. Lizhuang Ma (145 papers)
Citations (14)

Summary

An Expert Overview of "PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection"

The paper introduces PromptAD, a method built around one-class prompt learning for few-shot anomaly detection using only normal samples. It addresses key challenges in unsupervised industrial anomaly detection, particularly within the framework of one-class classification (OCC). PromptAD leverages the capabilities of vision-language models, specifically CLIP, but innovates with a new paradigm for learning prompts efficiently when only normal data samples are accessible during training.

Key Contributions and Methodologies

The authors dissect why conventional prompt learning falls short in anomaly detection and propose solutions to bridge this gap. The main contributions are structured around two components: semantic concatenation (SC) and explicit anomaly margin (EAM).

  1. Semantic Concatenation (SC): SC addresses the absence of anomaly samples in one-class anomaly detection by transposing normal prompts into anomaly prompts: each normal prompt is concatenated with an anomaly-specific suffix. This enriches the pool of prompts available for training and constructs a large set of pseudo-negative prompts that guide effective prompt learning (a minimal sketch follows this list).
  2. Explicit Anomaly Margin (EAM): Because no real negative samples are available during training, EAM enforces an explicit margin, controlled by a hyper-parameter, between the features of normal prompts and of the transposed anomaly prompts. This lets the model separate normal from anomalous scenarios without direct exposure to anomalous samples (see the loss sketch after the next paragraph).
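
To make semantic concatenation concrete, below is a minimal Python sketch of how anomaly prompts could be assembled from normal prompts and anomaly suffixes. Note that PromptAD actually learns prompts as continuous token embeddings; the plain strings here, along with the template wording, suffix list, and function name, are our illustrative assumptions rather than the paper's exact vocabulary.

```python
# Minimal sketch of semantic concatenation (SC). PromptAD learns prompts as
# continuous embeddings; plain strings and these templates/suffixes are
# illustrative assumptions only.

NORMAL_TEMPLATES = [
    "a photo of a {} for visual inspection",
    "a cropped industrial photo of a {}",
]

# Anomaly suffixes turn normal prompts into pseudo-negative (anomaly) prompts.
ANOMALY_SUFFIXES = [
    "with a scratch",
    "with a crack",
    "with a missing part",
    "with color contamination",
]

def build_prompts(class_name: str) -> tuple[list[str], list[str]]:
    """Return (normal_prompts, anomaly_prompts) for one object class."""
    normal = [t.format(class_name) for t in NORMAL_TEMPLATES]
    # SC: concatenate every normal prompt with every anomaly suffix, yielding
    # len(normal) * len(ANOMALY_SUFFIXES) negative prompts without needing
    # any anomaly images.
    anomaly = [f"{p} {s}" for p in normal for s in ANOMALY_SUFFIXES]
    return normal, anomaly

normal_prompts, anomaly_prompts = build_prompts("metal nut")
print(len(normal_prompts), len(anomaly_prompts))  # 2 8
```

The multiplicative pairing is the key design point: a handful of suffixes converts a small set of normal prompts into a much larger bank of negatives for one-class training.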

The integration of these methods into the CLIP architecture results in a prompt-guided anomaly detection process that significantly improves upon existing strategies, particularly in few-shot settings.
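
As a companion to the SC sketch above, the following is one plausible hinge-style realization of the explicit anomaly margin, plus a toy inference-time score. The function names, the use of cosine distance, the hinge form, and the softmax scoring are assumptions made for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def eam_loss(img_feats, normal_proto, anomaly_feats, margin=0.5):
    """Hinge-style explicit anomaly margin (EAM) -- a sketch, not the paper's
    exact loss. With no anomaly images available, it pushes features of
    normal images at least `margin` closer to the normal prompt prototype
    than to any SC-generated anomaly prompt.

    img_feats:     (B, D) L2-normalized features of normal training images
    normal_proto:  (D,)   aggregated feature of the learned normal prompts
    anomaly_feats: (K, D) features of the SC-generated anomaly prompts
    """
    d_normal = 1.0 - img_feats @ normal_proto        # (B,)  cosine distances
    d_anomaly = 1.0 - img_feats @ anomaly_feats.T    # (B, K)
    # `margin` is the hyper-parameter that explicitly controls the gap
    # between normal and anomaly prompt features.
    return F.relu(d_normal.unsqueeze(1) + margin - d_anomaly).mean()

def anomaly_score(img_feats, normal_proto, anomaly_feats, tau=0.07):
    """Toy inference-time score: softmax over (normal, max-anomaly)
    similarities; higher means more anomalous."""
    sim_n = img_feats @ normal_proto                          # (B,)
    sim_a = (img_feats @ anomaly_feats.T).max(dim=1).values   # (B,)
    logits = torch.stack([sim_n, sim_a], dim=1) / tau
    return logits.softmax(dim=1)[:, 1]

# Toy usage with random normalized features (D = 512 matches CLIP ViT-B/16).
B, K, D = 4, 8, 512
img = F.normalize(torch.randn(B, D), dim=-1)
proto = F.normalize(torch.randn(D), dim=-1)
anom = F.normalize(torch.randn(K, D), dim=-1)
print(eam_loss(img, proto, anom).item(), anomaly_score(img, proto, anom))
```
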

Empirical Evaluation and Results

The paper reports comprehensive experimental results demonstrating PromptAD’s performance on standard anomaly detection benchmarks, including MVTec and VisA. Notably, PromptAD secures the leading position in 11 out of 12 few-shot tasks across both image-level and pixel-level settings.

  • Image-Level Results: PromptAD outperforms existing methods such as WinCLIP+ and RWDA, with substantial gains in few-shot scenarios that underscore its robustness when very few samples are available.
  • Pixel-Level Results: Pixel-level anomaly detection, traditionally more challenging, also shows marked improvements under PromptAD, particularly when SC and EAM are employed.

The paper supplements its findings with rigorous ablation studies, highlighting the significant contributions of SC and EAM to the overall efficacy of PromptAD in anomaly detection tasks.

Implications and Future Directions

The findings underscore the potential for PromptAD to redefine approaches to anomaly detection, especially in industrial scenarios where rapid adaptation to new conditions is paramount, and sample scarcity is prevalent. The capability of PromptAD to learn effective prompts with minimal data represents a noteworthy advancement towards more generalized and automated anomaly detection systems.

Theoretically, this research extends the applicability of prompt learning beyond general classification tasks into more specialized and complex inference domains. Practically, it opens pathways for anomaly detection frameworks that achieve robust performance with minimal labeled data.

Future research might explore the feasibility of integrating PromptAD with other multimodal data and more complex anomaly types, potentially expanding its application across varied domains. Additionally, advancing the theoretical understanding of margin-based learning in zero-shot and few-shot contexts might yield further improvements in related tasks.

In conclusion, this paper makes a substantial contribution to the field of anomaly detection, combining a solid methodological framework with strong empirical results, and offers promising insights for applied computer vision research.