Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning (2403.11083v1)

Published 17 Mar 2024 in cs.CV and cs.CL

Abstract: Anomaly detection is vital in various industrial scenarios, including the identification of unusual patterns in production lines and the detection of manufacturing defects for quality control. Existing techniques tend to be specialized in individual scenarios and lack generalization capacities. In this study, we aim to develop a generic anomaly detection model applicable across multiple scenarios. To achieve this, we customize generic visual-language foundation models that possess extensive knowledge and robust reasoning abilities into anomaly detectors and reasoners. Specifically, we introduce a multi-modal prompting strategy that incorporates domain knowledge from experts as conditions to guide the models. Our approach considers multi-modal prompt types, including task descriptions, class context, normality rules, and reference images. In addition, we unify the input representation of multi-modality into a 2D image format, enabling multi-modal anomaly detection and reasoning. Our preliminary studies demonstrate that combining visual and language prompts as conditions for customizing the models enhances anomaly detection performance. The customized models showcase the ability to detect anomalies across different data modalities such as images and point clouds. Qualitative case studies further highlight the anomaly detection and reasoning capabilities, particularly for multi-object scenes and temporal data. Our code is available at https://github.com/Xiaohao-Xu/Customizable-VLM.
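
The abstract names four prompt types (task descriptions, class context, normality rules, and reference images) that serve as conditions for the model. A minimal sketch of how such conditions might be bundled into a single request is shown below. It is not the authors' implementation: the `AnomalyPrompt` class, the `build_request` function, the prompt wording, and the file names are all hypothetical, and no particular model API is assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnomalyPrompt:
    """Hypothetical container for the four prompt types named in the abstract."""
    task_description: str
    class_context: str
    normality_rules: List[str]
    reference_images: List[str] = field(default_factory=list)  # paths to normal examples


def build_request(prompt: AnomalyPrompt, query_image: str) -> Dict[str, object]:
    """Assemble text and image inputs for a generic vision-language model."""
    lines = [
        f"Task: {prompt.task_description}",
        f"Object class: {prompt.class_context}",
        "Normality rules:",
    ]
    # Number the expert-provided rules so the model can refer to them by index.
    lines += [f"{i}. {rule}" for i, rule in enumerate(prompt.normality_rules, start=1)]
    lines.append(
        f"The first {len(prompt.reference_images)} image(s) are normal references; "
        "the last image is the query. Is the query anomalous? Explain why."
    )
    # Reference images come first, the query image last (an assumed ordering).
    return {"text": "\n".join(lines), "images": [*prompt.reference_images, query_image]}


# Illustrative values only; none of these come from the paper.
example = AnomalyPrompt(
    task_description="Decide whether the query image shows a manufacturing defect.",
    class_context="metal screw",
    normality_rules=["The thread is continuous.", "The head has no scratches."],
    reference_images=["ref_01.png"],
)
request = build_request(example, "query.png")
print(request["text"])
```

The returned dictionary would then be mapped onto whatever input format the chosen vision-language model expects.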

Authors (5)
  1. Xiaohao Xu
  2. Yunkang Cao
  3. Yongqi Chen
  4. Weiming Shen
  5. Xiaonan Huang