SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation

Published 22 Dec 2023 in cs.CV, cs.AI, and cs.RO (arXiv:2312.14481v2)

Abstract: The Segment Anything Model (SAM) shows promise in generic object segmentation and has potential for a wide range of applications. Existing methods have applied SAM to surgical instrument segmentation (SIS) by tuning SAM-based frameworks on surgical data. However, they fall short in two crucial aspects: (1) straightforward model tuning with instrument masks treats each instrument as a single entity, neglecting its complex structure and fine-grained details; and (2) instrument category-based prompts are not flexible or informative enough to describe instrument structures. To address these problems, we investigate text-promptable SIS and propose SurgicalPart-SAM (SP-SAM), a novel efficient-tuning approach for SAM that explicitly integrates instrument structure knowledge with SAM's generic knowledge, guided by expert knowledge of instrument part compositions. Specifically, we propose (1) Collaborative Prompts, which describe instrument structures by combining category-level and part-level texts; (2) a Cross-Modal Prompt Encoder, which encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding, which adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenes. With these components, SP-SAM better comprehends surgical instruments in terms of both overall structure and part-level details. Extensive experiments on the EndoVis2018 and EndoVis2017 datasets demonstrate SP-SAM's state-of-the-art performance with minimal tunable parameters. The code will be available at https://github.com/wenxi-yue/SurgicalPart-SAM.
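The abstract names three components but does not spell out their internals. The snippet below is a minimal, hypothetical sketch of two of the ideas in plain Python: building collaborative prompts by pairing a category-level text with part-level texts, and fusing part-level embeddings into a whole-instrument embedding with softmax-normalised weights. The part dictionary `PART_COMPOSITIONS`, the prompt template, the helper names, and the choice of a softmax-weighted sum as the fusion rule are all assumptions for illustration; SP-SAM's actual Cross-Modal Prompt Encoder and Part-to-Whole Adaptive Fusion may work differently.

```python
import math
from typing import Dict, List

# HYPOTHETICAL expert knowledge: instrument category -> part names.
# These compositions are illustrative assumptions, not taken from the paper.
PART_COMPOSITIONS: Dict[str, List[str]] = {
    "bipolar forceps": ["shaft", "wrist", "claspers"],
    "large needle driver": ["shaft", "wrist", "jaws"],
}


def collaborative_prompts(category: str) -> List[str]:
    """Return the category-level text followed by one part-level text per part.

    The "<part> of <category>" template is an assumed format.
    """
    parts = PART_COMPOSITIONS[category]
    return [category] + [f"{part} of {category}" for part in parts]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def part_to_whole_fusion(part_embeddings: List[List[float]],
                         scores: List[float]) -> List[float]:
    """Fuse part-level embeddings into one whole-instrument embedding.

    Each part gets a weight from softmax(scores); the result is the weighted
    sum of the part embeddings. In a trained model the scores would be
    predicted from image and text features; here they are passed in directly.
    This weighted sum is an assumed stand-in for the learned adaptive fusion.
    """
    if not part_embeddings or len(part_embeddings) != len(scores):
        raise ValueError("need one score per non-empty list of part embeddings")
    weights = softmax(scores)
    dim = len(part_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, part_embeddings))
        for d in range(dim)
    ]


if __name__ == "__main__":
    print(collaborative_prompts("bipolar forceps"))
    # Two orthogonal part embeddings with equal scores average to [0.5, 0.5].
    print(part_to_whole_fusion([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

For example, `collaborative_prompts("bipolar forceps")` yields the category text followed by three part-level texts. A weighted sum keeps the fused embedding in the same space as the part embeddings, which is one simple way to let more informative parts dominate the whole-instrument representation.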
