Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding (2403.18271v1)
Abstract: The Segment Anything Model (SAM) has garnered significant attention for its versatile segmentation abilities and intuitive prompt-based interface. However, its application in medical imaging presents challenges, requiring either substantial training costs and extensive medical datasets for full model fine-tuning or high-quality prompts for optimal performance. This paper introduces H-SAM: a prompt-free adaptation of SAM tailored for efficient fine-tuning on medical images via a two-stage hierarchical decoding procedure. In the initial stage, H-SAM employs SAM's original decoder to generate a prior probabilistic mask, which guides a more intricate decoding process in the second stage. Specifically, we propose two key designs: 1) a class-balanced, mask-guided self-attention mechanism that addresses the unbalanced label distribution and enhances the image embedding; 2) a learnable mask cross-attention mechanism that spatially modulates the interplay among different image regions based on the prior mask. Moreover, the inclusion of a hierarchical pixel decoder in H-SAM enhances its proficiency in capturing fine-grained and localized details. This approach enables SAM to effectively integrate learned medical priors, facilitating enhanced adaptation for medical image segmentation with limited samples. H-SAM demonstrates a 4.78% improvement in average Dice compared to existing prompt-free SAM variants for multi-organ segmentation using only 10% of 2D slices. Notably, without using any unlabeled data, H-SAM even outperforms state-of-the-art semi-supervised models relying on extensive unlabeled training data across various medical datasets. Our code is available at https://github.com/Cccccczh404/H-SAM.
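The headline result above is stated in terms of average Dice for multi-organ segmentation. As a reading aid, here is a minimal sketch of how a per-organ Dice score and its average could be computed for one flattened 2D slice. It is not taken from the H-SAM code base: the function names `dice_for_label` and `mean_dice` are hypothetical, and the choice to skip organs that are absent from both prediction and ground truth is an assumption, since evaluation protocols differ on that point.

```python
from typing import List, Optional, Sequence


def dice_for_label(pred: Sequence[int], target: Sequence[int], label: int) -> Optional[float]:
    """Dice = 2|P ∩ T| / (|P| + |T|) for one label.

    Returns None when the label appears in neither mask (assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    pred_count = sum(1 for v in pred if v == label)
    target_count = sum(1 for v in target if v == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_count + target_count == 0:
        return None
    return 2.0 * overlap / (pred_count + target_count)


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: Sequence[int]) -> float:
    """Average Dice over the given foreground labels, skipping absent ones."""
    scores: List[float] = []
    for label in labels:
        score = dice_for_label(pred, target, label)
        if score is not None:
            scores.append(score)
    if not scores:
        raise ValueError("none of the requested labels is present")
    return sum(scores) / len(scores)


# Toy flattened slice: 0 = background, 1 and 2 = two hypothetical organs.
pred = [0, 1, 1, 0, 2, 2, 0, 0]
target = [0, 1, 1, 1, 2, 0, 0, 0]

print(dice_for_label(pred, target, 1))   # organ 1: 2*2 / (2+3)
print(dice_for_label(pred, target, 2))   # organ 2: 2*1 / (2+1)
print(mean_dice(pred, target, [1, 2]))   # average over both organs
```

Background (label 0) is left out of the average here, which is a common but not universal choice for multi-organ benchmarks.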
Authors:
- Zhiheng Cheng
- Qingyue Wei
- Hongru Zhu
- Yan Wang
- Liangqiong Qu
- Wei Shao
- Yuyin Zhou