
AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation (2308.14936v4)

Published 28 Aug 2023 in cs.CV and cs.AI

Abstract: Segment Anything Model (SAM) is one of the pioneering prompt-based foundation models for image segmentation and has been rapidly adopted for various medical imaging applications. However, in clinical settings, creating effective prompts is notably challenging and time-consuming, requiring the expertise of domain specialists such as physicians. This requirement significantly diminishes SAM's primary advantage, its interactive capability with end users, in medical applications. Moreover, recent studies have indicated that SAM, originally designed for 2D natural images, performs suboptimally on 3D medical image segmentation tasks. This subpar performance is attributed to the domain gaps between natural and medical images and the disparities in spatial arrangements between 2D and 3D images, particularly in multi-organ segmentation applications. To overcome these challenges, we present a novel technique termed AutoProSAM. This method automates 3D multi-organ CT-based segmentation by leveraging SAM's foundational model capabilities without relying on domain experts for prompts. The approach utilizes parameter-efficient adaptation techniques to adapt SAM for 3D medical imagery and incorporates an effective automatic prompt learning paradigm specific to this domain. By eliminating the need for manual prompts, it enhances SAM's capabilities for 3D medical image segmentation and achieves state-of-the-art (SOTA) performance in CT-based multi-organ segmentation tasks. The code is available at https://github.com/ChengyinLee/AutoProSAM_2024.

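The two ingredients named in the abstract, parameter-efficient adaptation of a frozen backbone and learned (rather than manual) prompts, can be illustrated with a minimal numpy sketch. This is a hypothetical toy, not the authors' implementation: the weight shapes, the LoRA-style low-rank update, and the `auto_prompts` pooling projection are all illustrative assumptions standing in for the paper's actual 3D adapter and prompt-learning modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen projection standing in for one layer of a SAM-like image encoder.
d_in, d_out, rank = 64, 64, 4
W_frozen = rng.standard_normal((d_in, d_out)) * 0.02

# LoRA-style low-rank adapters: only A and B would be trained.
# A starts at zero, so the adapted layer initially equals the frozen one.
A = np.zeros((rank, d_out))
B = rng.standard_normal((d_in, rank)) * 0.02

def adapted_forward(x):
    """Frozen weight plus low-rank update: x @ (W + B A)."""
    return x @ W_frozen + (x @ B) @ A

# Automatic prompt generation (illustrative): a learned projection maps
# pooled image features to prompt embeddings, replacing manual clicks/boxes.
n_prompts, d_prompt = 8, 64
P = rng.standard_normal((d_in, n_prompts * d_prompt)) * 0.02

def auto_prompts(image_features):
    """image_features: (n_tokens, d_in) -> (n_prompts, d_prompt)."""
    pooled = image_features.mean(axis=0)  # global average pooling
    return (pooled @ P).reshape(n_prompts, d_prompt)

tokens = rng.standard_normal((196, d_in))   # e.g. 14x14 patch tokens
feats = adapted_forward(tokens)
prompts = auto_prompts(feats)
print(feats.shape, prompts.shape)  # (196, 64) (8, 64)
```

The prompt embeddings produced this way would feed SAM's mask decoder in place of encoded points or boxes; only the small adapter and prompt-projection parameters are trained, which is the parameter-efficiency argument.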
Authors (6)
  1. Chengyin Li (12 papers)
  2. Prashant Khanduri (29 papers)
  3. Yao Qiang (16 papers)
  4. Rafi Ibn Sultan (6 papers)
  5. Indrin Chetty (4 papers)
  6. Dongxiao Zhu (41 papers)
Citations (11)