Computer-Vision Benchmark Segment-Anything Model (SAM) in Medical Images: Accuracy in 12 Datasets

Published 18 Apr 2023 in eess.IV and cs.CV | (2304.09324v3)

Abstract: Background: The segment-anything model (SAM), introduced in April 2023, shows promise as a benchmark model and a universal solution to segment various natural images. It comes without previously-required re-training or fine-tuning specific to each new dataset. Purpose: To test SAM's accuracy in various medical image segmentation tasks and investigate potential factors that may affect its accuracy in medical images. Methods: SAM was tested on 12 public medical image segmentation datasets involving 7,451 subjects. The accuracy was measured by the Dice overlap between the algorithm-segmented and ground-truth masks. SAM was compared with five state-of-the-art algorithms specifically designed for medical image segmentation tasks. Associations of SAM's accuracy with six factors were computed, independently and jointly, including segmentation difficulties as measured by segmentation ability score and by Dice overlap in U-Net, image dimension, size of the target region, image modality, and contrast. Results: The Dice overlaps from SAM were significantly lower than the five medical-image-based algorithms in all 12 medical image segmentation datasets, by a margin of 0.1-0.5 and even 0.6-0.7 Dice. SAM-Semantic was significantly associated with medical image segmentation difficulty and the image modality, and SAM-Point and SAM-Box were significantly associated with image segmentation difficulty, image dimension, target region size, and target-vs-background contrast. All these 3 variations of SAM were more accurate in 2D medical images, larger target region sizes, easier cases with a higher Segmentation Ability score and higher U-Net Dice, and higher foreground-background contrast.

Abstract PDF Upgrade to Chat

Citations (58)

View on Semantic Scholar

Summary

The paper finds that SAM's zero-shot segmentation consistently yields lower Dice scores than specialized medical models.
It utilizes a comprehensive analysis over 12 datasets to assess how factors like dimensionality, contrast, and target size affect segmentation accuracy.
Results indicate SAM performs better on higher contrast 2D images, while struggling with complex 3D and low contrast scenarios, highlighting areas for improvement.

The Segment-Anything Model (SAM) in Medical Image Segmentation

Introduction

The Segment Anything Model (SAM) marks a significant advancement in computer vision, characterized by its ability to generalize across diverse image segmentation tasks without requiring task-specific re-training or fine-tuning. While SAM demonstrates impressive zero-shot performance in natural image segmentation, its accuracy and applicability to medical imaging tasks—characterized by complexity and diversity—remains uncertain. This essay discusses the exploratory study evaluating SAM's efficacy in medical image segmentation across multiple datasets, its comparative performance relative to established medical-image-specific algorithms, and the influencing factors impacting its segmentation accuracy.

Methods

The study employs SAM's zero-shot capability to segment medical images without any retraining, assessing its performance across 12 public datasets comprising 7,451 subjects and spanning a variety of organs, pathologies, dimensions, and imaging modalities. The accuracy of SAM is measured using the Dice coefficient, comparing its performance with five state-of-the-art algorithms tailored for medical image segmentation tasks like U-Net, U-Net++, Attention U-Net, Trans U-Net, and UCTransNet. Diverse other factors including segmentation difficulty, image dimension, modality, target region size, and contrast were analyzed for their potential influence on SAM's accuracy.

Evaluation of SAM's Accuracy

The results indicate SAM's performance lagging behind specialized medical-image segmentation algorithms. In all datasets, SAM exhibited lower Dice scores by margins of 0.1 to 0.7, underscoring the challenges posed by medical images differing from natural images in terms of complexity, dimensionality, and target characteristics.

Figure 1: Accuracy of SAM in 12 medical image segmentation datasets. (a): Dice overlaps of 5 medical-image-specific algorithms and 3 variations of SAM. (b): scatter plots of SAM's Dice (y axis) with U-Net's Dice (x axis) across datasets, for SAM-Semantic (left), SAM-Point (middle), and SAM-Box (right).

Furthermore, SAM's lower accuracy is most pronounced in complex images—those with low contrast, smaller target regions, and 3D dimensions, corroborated by its comparatively better performance on 2D images with higher U-Net Dice values.

Factors Influencing SAM's Performance

The degree to which different factors impact SAM's segmentation accuracy was analyzed using both single-factor and multi-factor analyses. It was observed that SAM's accuracy was significantly influenced by image dimensionality, segmentation ability scores, and target region characteristics.

Figure 2: Single-factor analysis associating 6 potential factors with SAM's accuracies in 12 medical image datasets. In the top four rows, every dot is a subject, and the color of the dot denotes the dataset this subject comes from. p-values are provided and highlighted with a red background if a factor has a significant association with the SAM Dice overlap. In the $6^{th}$ row, Derm--Dermology; Colo--Colonoscopy; US--Ultrasound.

SAM variants demonstrated different sensitivities to these factors: SAM-Semantic showed an association with medical image segmentation difficulty, whereas SAM-Point and SAM-Box were more impacted by target region size, dimensionality, and contrast.

Discussion

The analysis confirms that while SAM represents a step towards unified segmentation across different contexts, its performance in medical imaging is sub-optimal compared to specialized models. The distinctive characteristics of medical images, such as higher dimensionality, lower contrast, and smaller target regions, pose unique challenges that SAM currently does not surmount effectively.

SAM's underperformance highlights the necessity for strategic adaptations or alternative frameworks when applying zero-shot segmentation models to medical images. Promising avenues for future research include the development of medical-image-specific benchmark models or adapting existing models by incorporating medical image datasets into their training regime.

Conclusion

In sum, this study provides a critical evaluation of SAM's capabilities in medical image segmentation and sets the groundwork for future explorations. Enhancements to SAM or the development of specialized models tailored to medical imaging needs are imperative for achieving comparable accuracy to current state-of-the-art algorithms. Such efforts must consider the unique characteristics inherent in medical images—including 3D imaging, small and low-contrast target regions—that challenge current methodologies.

Markdown