
3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation (2306.13465v2)

Published 23 Jun 2023 in cs.CV

Abstract: Although the Segment Anything Model (SAM) achieves impressive results on general-purpose semantic segmentation, with strong generalization on everyday images, its performance on medical image segmentation is less precise and less stable, especially on tumor segmentation tasks involving objects of small size, irregular shape, and low contrast. Notably, the original SAM architecture is designed for 2D natural images and therefore cannot effectively extract 3D spatial information from volumetric medical data. In this paper, we propose a novel adaptation method for transferring SAM from 2D to 3D for promptable medical image segmentation. Through a holistically designed scheme of architecture modifications, we transfer SAM to support volumetric inputs while retaining the majority of its pre-trained parameters for reuse. The fine-tuning process is conducted in a parameter-efficient manner: most of the pre-trained parameters remain frozen, and only a few lightweight spatial adapters are introduced and tuned. Despite the domain gap between natural and medical data and the disparity in spatial arrangement between 2D and 3D, the transformer trained on natural images can effectively capture the spatial patterns present in volumetric medical images with only lightweight adaptations. We conduct experiments on four open-source tumor segmentation datasets, and with a single click prompt, our model outperforms domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation respectively, and achieves similar performance for liver tumor segmentation. We also compare our adaptation method with existing popular adapters and observe significant performance improvements on most datasets.

Authors (8)
  1. Shizhan Gong
  2. Yuan Zhong
  3. Wenao Ma
  4. Jinpeng Li
  5. Zhao Wang
  6. Jingyang Zhang
  7. Pheng-Ann Heng
  8. Qi Dou
Citations (57)

Summary

Holistic Adaptation of SAM from 2D to 3D for Medical Image Segmentation

The paper presents an approach to adapting the Segment Anything Model (SAM) from 2D semantic segmentation to 3D medical image segmentation, targeting tumor segmentation in particular. Designed for 2D natural images, the original SAM architecture cannot capture the 3D spatial context of volumetric medical data, which is essential for accurate tumor identification. The transition from 2D to 3D is not straightforward: medical imaging modalities such as CT and MRI are inherently volumetric, so a holistic architectural adaptation is needed to extract 3D spatial information effectively.
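
One common way to reuse 2D pre-trained weights in a 3D network is to inflate each 2D kernel along a new depth axis, scaling so the 3D response on a depth-constant volume matches the original 2D response. This is an illustrative sketch of the general technique, not necessarily the paper's exact scheme:

```python
def inflate_kernel_2d_to_3d(kernel2d, depth):
    """Replicate a 2D kernel (list of rows) along a new depth axis.

    Each weight is divided by `depth` so that convolving a volume that
    is constant along depth reproduces the 2D kernel's response.
    """
    return [[[w / depth for w in row] for row in kernel2d]
            for _ in range(depth)]

# Example: a 2x2 kernel inflated to depth 2.
k3d = inflate_kernel_2d_to_3d([[1.0, 2.0], [3.0, 4.0]], depth=2)
```

Summing any weight across the depth slices recovers the original 2D weight, which is what preserves the pre-trained behavior at initialization.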

The authors introduce the 3DSAM-adapter, a parameter-efficient adaptation method that reuses the pre-trained SAM parameters and makes selective modifications to accommodate 3D structure. The tuning process avoids comprehensive retraining: the majority of SAM's pre-existing parameters are frozen, and only lightweight spatial adapters are inserted and trained. This design lets the model bridge both the domain gap between natural and medical images and the spatial disparity between 2D and 3D representations.
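
Lightweight adapters of this general kind are typically bottleneck residual blocks inserted into frozen transformer layers; only the small projection matrices are trained. The sketch below is illustrative (not the paper's exact module), with hypothetical weights `W_down` and `W_up`:

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def relu(v):
    return [max(0.0, a) for a in v]

def adapter(x, W_down, W_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    add residual. Only W_down and W_up would be trained; the
    surrounding transformer layer stays frozen."""
    h = relu(matvec(W_down, x))   # project to a small bottleneck dim
    out = matvec(W_up, h)         # project back to the model dim
    return [xi + oi for xi, oi in zip(x, out)]
```

A common trick is to initialize `W_up` to zeros so the adapter starts as the identity and fine-tuning departs smoothly from the pre-trained model.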

Evaluation on four open-source tumor segmentation datasets demonstrated promising results. The 3DSAM-adapter outperformed state-of-the-art medical image segmentation models on 3 of the 4 datasets, with Dice score improvements of 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation, respectively. For liver tumors, it reached comparable performance. These results indicate a significant gain in segmentation accuracy, especially given the challenge posed by tumors' small size, irregular shapes, and low contrast.
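
The Dice score used in these comparisons measures the overlap between a predicted mask and the ground truth; a minimal implementation over flattened binary masks:

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks, given as flat 0/1 lists.

    Dice = 2 * |pred ∩ target| / (|pred| + |target|); returns 1.0 when
    both masks are empty (perfect agreement on 'no tumor').
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0
```

For volumetric segmentation the masks are simply flattened 3D volumes; the formula is unchanged.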

The paper's contributions are several. First, it outlines a holistic framework for 2D-to-3D adaptation of segmentation models that adds only a marginal 7.79% increase in parameters while retaining most pre-trained weights. Second, it introduces an efficient fine-tuning method that tunes only 16.96% of the model's parameters, offering significant memory and computation savings without compromising accuracy. Third, a multi-layer aggregation mechanism in the decoder enhances the model's ability to exploit the high-resolution textures needed to delineate fine-grained tumor boundaries. Collectively, these findings support the viability of efficient adaptation strategies for domain-specific applications in medical imaging, with broader implications for volumetric image segmentation in the AI research community.
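
The parameter-efficiency figures above amount to freezing most weights and counting what remains trainable. A toy sketch of that bookkeeping, with hypothetical module names and counts:

```python
def trainable_fraction(param_counts, trainable):
    """Fraction of parameters left trainable after freezing the rest.

    param_counts: {module_name: parameter_count}
    trainable:    set of module names kept unfrozen (e.g. the adapters).
    """
    total = sum(param_counts.values())
    tuned = sum(n for name, n in param_counts.items() if name in trainable)
    return tuned / total

# Hypothetical counts: a large frozen encoder plus small tuned parts.
counts = {"image_encoder": 90_000_000,
          "adapters": 5_000_000,
          "mask_decoder": 5_000_000}
frac = trainable_fraction(counts, {"adapters", "mask_decoder"})  # 0.10
```

The numbers here are invented for illustration; the paper reports tuning 16.96% of the model's parameters.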

The results suggest that this adapted SAM could become a useful tool for medical image segmentation, opening avenues for integrating domain-specific knowledge with generalized pre-trained models. The emphasis on efficient parameter reuse and fine-tuning is promising for both future research and practical deployment, and could extend to other fields requiring volumetric analysis. More broadly, the work illustrates how large foundation models can be leveraged for specialized applications: their general capabilities are retained while the unique demands of a target domain are met with lightweight, targeted changes.