SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation (2308.08746v2)
Abstract: The Segment Anything Model (SAM) is a powerful foundation model that has revolutionised image segmentation. To apply SAM to surgical instrument segmentation, a common approach is to locate precise points or boxes of instruments and then use them as prompts for SAM in a zero-shot manner. However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes and eliminates the use of explicit prompts for improved robustness and a simpler pipeline. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning, further enhancing the discrimination of the class prototypes for more accurate class prompting. The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. The source code is available at https://github.com/wenxi-yue/SurgicalSAM.
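The contrastive prototype learning mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss, so the code below assumes a generic InfoNCE-style objective in which each instance feature is pulled toward the prototype of its own class and pushed away from the other class prototypes. The function name, array shapes, and temperature value are hypothetical choices for illustration and are not taken from the SurgicalSAM source code.

```python
import numpy as np

def prototype_contrastive_loss(features, labels, prototypes, temperature=0.07):
    """InfoNCE-style loss between instance features and class prototypes.

    Assumed (hypothetical) shapes:
      features:   (N, D) one embedding per instrument instance
      labels:     (N,)   integer class ids in [0, C)
      prototypes: (C, D) one prototype vector per class
    """
    # L2-normalise so that dot products are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ p.T / temperature                 # (N, C) similarity scores
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability of the correct class, averaged over instances.
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy check: features near their own prototype should give a lower loss
# than features that sit on the prototype of a different class.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(3, 8))
labels = np.array([0, 1, 2, 1])
aligned = prototypes[labels] + 0.01 * rng.normal(size=(4, 8))
shuffled = prototypes[(labels + 1) % 3]

loss_aligned = prototype_contrastive_loss(aligned, labels, prototypes)
loss_shuffled = prototype_contrastive_loss(shuffled, labels, prototypes)
print(loss_aligned, loss_shuffled)
```

A lower temperature sharpens the softmax and penalises confusable classes more strongly, which is one plausible way to counter the low inter-class variance that the abstract describes.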
- Wenxi Yue
- Jing Zhang
- Kun Hu
- Yong Xia
- Jiebo Luo
- Zhiyong Wang