
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images (2403.12570v1)

Published 19 Mar 2024 in cs.CV

Abstract: Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains. However, the substantial domain divergence between natural and medical images limits the effectiveness of these methodologies in medical anomaly detection. This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. This multi-level adaptation is guided by multi-level, pixel-wise visual-language feature alignment loss functions, which recalibrate the model's focus from object semantics in natural imagery to anomaly identification in medical images. The adapted features exhibit improved generalization across various medical data types, even in zero-shot scenarios where the model encounters unseen medical modalities and anatomical regions during training. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models, with an average AUC improvement of 6.24% and 7.33% for anomaly classification, 2.03% and 2.37% for anomaly segmentation, under the zero-shot and few-shot settings, respectively. Source code is available at: https://github.com/MediaBrain-SJTU/MVFA-AD

Authors (6)
  1. Chaoqin Huang (15 papers)
  2. Aofan Jiang (7 papers)
  3. Jinghao Feng (2 papers)
  4. Ya Zhang (222 papers)
  5. Xinchao Wang (203 papers)
  6. Yanfeng Wang (211 papers)

Summary

Adaptation of Visual-Language Models for Generalizable Anomaly Detection in Medical Imagery

The paper adapts the Contrastive Language–Image Pre-training (CLIP) model, a visual-language model (VLM), to medical anomaly detection. Its main goal is to overcome the domain divergence between natural and medical images, which limits how well VLMs pre-trained on natural images perform in medical contexts.

The method is a lightweight multi-level adaptation and comparison framework that integrates multiple residual adapters into CLIP’s pre-trained visual encoder. The adapters enhance visual features step by step across different levels of the encoder. Training is guided by multi-level, pixel-wise visual-language feature alignment loss functions, which recalibrate the model’s focus from object semantics in natural imagery to anomaly identification in medical images.
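The two ideas in this paragraph, a residual adapter on top of a frozen feature and a patch-wise comparison against text embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bottleneck shape, the mixing weight `gamma`, the `temperature` value, and the two-prompt ("normal" vs. "abnormal") setup are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_adapter(feature, W_down, W_up, gamma=0.1):
    """Blend a frozen feature with a small bottlenecked correction.

    gamma is a hypothetical mixing weight: the frozen encoder's feature
    dominates and the adapter contributes a learned adjustment.
    """
    adapted = matvec(W_up, relu(matvec(W_down, feature)))
    return [(1.0 - gamma) * f + gamma * a for f, a in zip(feature, adapted)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b + 1e-12)


def anomaly_map(patch_features, text_normal, text_abnormal, temperature=0.07):
    """Per-patch 'abnormal' probability from two text similarities.

    Each patch feature is compared with a 'normal' and an 'abnormal'
    text embedding; a two-way softmax turns the similarities into a score.
    """
    scores = []
    for patch in patch_features:
        s_normal = cosine(patch, text_normal) / temperature
        s_abnormal = cosine(patch, text_abnormal) / temperature
        m = max(s_normal, s_abnormal)  # subtract the max for numerical stability
        e_normal = math.exp(s_normal - m)
        e_abnormal = math.exp(s_abnormal - m)
        scores.append(e_abnormal / (e_normal + e_abnormal))
    return scores
```

In a multi-level setup, one would presumably apply an adapter and compute such a map at several encoder levels, then combine the maps; how the levels are combined is not specified in this summary.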

Key Findings and Numerical Results

The proposed method outperforms current state-of-the-art models on medical anomaly detection benchmarks. The average improvement in area under the curve (AUC) is 6.24% (zero-shot) and 7.33% (few-shot) for anomaly classification, and 2.03% (zero-shot) and 2.37% (few-shot) for anomaly segmentation. These results indicate that the adapted features generalize to medical modalities and anatomical regions not seen during training, even though the underlying CLIP model was pre-trained on natural images.
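For readers less familiar with the metric, the following is a small sketch of how an AUC value can be computed from binary labels and anomaly scores. It uses the pairwise-ranking definition of ROC AUC (the probability that a randomly chosen anomalous sample scores higher than a randomly chosen normal one). The function name and the toy data are illustrative assumptions, not the paper's evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: 1 for anomalous, 0 for normal.
    scores: higher means more anomalous.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The same function applies to both tasks under the usual convention: for anomaly classification the samples are whole images with image-level scores, and for anomaly segmentation they are individual pixels with pixel-level scores.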

Practical and Theoretical Implications

Practically, because the framework generalizes to varied medical data types without exhaustive retraining, it is a promising tool for improving diagnostic accuracy and efficiency. Theoretically, the paper shows that VLMs can be repurposed for a new domain when they are aligned and adapted through residual learning strategies. The shift from semantic identification to anomaly detection reflects a broader trend in machine learning, in which domain-specific challenges are addressed through architectural modifications and alignment strategies.

Speculation on Future Developments in AI

The intersection of visual-language processing and medical imaging offers clear directions for future work. Model architectures could be improved with more sophisticated adapters and loss functions that tailor fine-tuning to specific medical anomalies. Future research may also integrate multimodal data beyond text and imagery, covering broader diagnostic data types, to build more holistic and robust diagnostic models.

In conclusion, the paper adapts VLMs for medical anomaly detection and provides strong empirical evidence of improved performance across varied medical datasets. The approach points toward more effective and efficient diagnostic tools in healthcare, with contributions to both the theory and the practical deployment of AI-powered medical imaging.