Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation (2404.06362v2)
Abstract: The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt-driven segmentation model, excels at segmentation tasks across diverse domains, while CLIP is renowned for its zero-shot recognition capabilities. However, their unified potential has not yet been explored in medical image segmentation. To adapt SAM to medical imaging, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, making adaptation particularly challenging when only a limited number of data samples are available. This work presents an in-depth exploration of integrating SAM and CLIP into a unified framework for medical image segmentation. Specifically, we propose a simple unified framework, SaLIP, for organ segmentation. First, SAM is used for part-based segmentation within the image; CLIP then retrieves the mask corresponding to the region of interest (ROI) from the pool of SAM-generated masks; finally, SAM is prompted with the retrieved ROI to segment the specific organ. SaLIP is thus training- and fine-tuning-free and does not rely on domain expertise or labeled data for prompt engineering. Our method shows substantial gains in zero-shot segmentation, with notable improvements in Dice score across diverse segmentation tasks such as brain (63.46%), lung (50.11%), and fetal head (30.82%) segmentation, compared to unprompted SAM. Code and text prompts are available at: https://github.com/aleemsidra/SaLIP.
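The abstract describes a three-stage cascade: SAM in "segment everything" mode, CLIP-based retrieval of the ROI mask, and a second ROI-prompted SAM pass. The sketch below illustrates that flow under stated assumptions: it uses the segment_anything and OpenAI clip packages, a placeholder checkpoint path, and illustrative text prompts; it is not the authors' exact implementation.

```python
# Minimal sketch of a SaLIP-style cascade (segment -> retrieve -> prompt).
# Assumes `segment_anything` and `clip` are installed, a SAM checkpoint is
# available locally, and `image_rgb` (HxWx3 uint8 numpy array) is loaded.
import numpy as np
import torch
import clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) SAM in "segment everything" mode: part-based masks over the whole image.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth").to(device)
masks = SamAutomaticMaskGenerator(sam).generate(image_rgb)

# 2) CLIP retrieval: score each mask crop against organ vs. background prompts
#    (prompt wording here is a placeholder, not the released text prompts).
clip_model, preprocess = clip.load("ViT-B/32", device=device)
text = clip.tokenize(["a photo of a lung", "a photo of background"]).to(device)
scores = []
for m in masks:
    x, y, w, h = m["bbox"]  # SAM's automatic generator returns boxes as XYWH
    crop = Image.fromarray(image_rgb[y:y + h, x:x + w])
    with torch.no_grad():
        logits_per_image, _ = clip_model(preprocess(crop).unsqueeze(0).to(device), text)
    scores.append(logits_per_image.softmax(dim=-1)[0, 0].item())  # P(organ | crop)
roi = masks[int(np.argmax(scores))]

# 3) Prompted SAM: use the retrieved ROI box (converted to XYXY) as the prompt.
x, y, w, h = roi["bbox"]
predictor = SamPredictor(sam)
predictor.set_image(image_rgb)
organ_mask, _, _ = predictor.predict(box=np.array([x, y, x + w, y + h]), multimask_output=False)
```

In practice the retrieval step would use the organ-specific prompt ensemble released with the paper rather than a single hard-coded sentence; the structure of the cascade stays the same.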