
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering (2306.06211v4)

Published 12 June 2023 in cs.CV

Abstract: The Segment Anything Model (SAM), developed by Meta AI Research, represents a significant breakthrough in computer vision, offering a robust framework for image and video segmentation. This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlighting their advancements in granularity and contextual understanding. Our study demonstrates SAM's versatility across a wide range of applications while identifying areas where improvements are needed, particularly in scenarios requiring high granularity and in the absence of explicit prompts. By mapping the evolution and capabilities of SAM models, we offer insights into their strengths and limitations and suggest future research directions, including domain-specific adaptations and enhanced memory and propagation mechanisms. We believe that this survey comprehensively covers the breadth of SAM's applications and challenges, setting the stage for ongoing advancements in segmentation technology.
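Since the survey centers on SAM's prompt-driven interface, a minimal usage sketch may help orient readers. The example below uses Meta AI's open-source segment-anything package to segment an object from a single foreground point prompt; the checkpoint filename, image path, and point coordinates are illustrative assumptions, not details drawn from the paper.

```python
# A minimal sketch of prompt-driven segmentation with the open-source
# segment-anything package (pip install segment-anything).
# Checkpoint path, image path, and point coordinates are placeholders.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained ViT-H SAM checkpoint (filename is an assumption).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Embed the image once; subsequent prompts reuse the embedding.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground point prompt: (x, y) with label 1 = foreground.
point_coords = np.array([[500, 375]])
point_labels = np.array([1])

# With multimask_output=True, SAM returns several candidate masks and
# quality scores, reflecting the ambiguity of a single-point prompt.
masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
best_mask = masks[scores.argmax()]  # boolean HxW mask
```

The same `predict` call also accepts box and mask prompts, which together make up the prompt space the survey examines.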
