MATIS: Masked-Attention Transformers for Surgical Instrument Segmentation (2303.09514v4)

Published 16 Mar 2023 in cs.CV

Abstract: We propose Masked-Attention Transformers for Surgical Instrument Segmentation (MATIS), a two-stage, fully transformer-based method that leverages modern pixel-wise attention mechanisms for instrument segmentation. MATIS exploits the instance-level nature of the task by employing a masked attention module that generates and classifies a set of fine instrument region proposals. Our method incorporates long-term video-level information through video transformers to improve temporal consistency and enhance mask classification. We validate our approach on the two standard public benchmarks, EndoVis 2017 and EndoVis 2018. Our experiments demonstrate that MATIS's per-frame baseline outperforms previous state-of-the-art methods, and that adding our temporal consistency module boosts the model's performance further.
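To make the core mechanism concrete, here is a minimal NumPy sketch of masked attention in the Mask2Former style that the abstract refers to: each region-proposal query attends only to the pixels inside its currently predicted mask, with all other attention logits suppressed. The function name and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(queries, keys, values, mask):
    """Masked attention sketch (Mask2Former-style decoder layer).

    queries: (Q, d) instrument-proposal query features
    keys, values: (N, d) flattened pixel features
    mask: (Q, N) boolean, True where a pixel lies inside the
          region currently predicted for that query
    Returns (Q, d) refined query features.
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)   # (Q, N) scaled dot-product logits
    logits = np.where(mask, logits, -1e9)    # suppress pixels outside each mask
    weights = softmax(logits, axis=-1)       # each row sums to 1 over masked pixels
    return weights @ values
```

Restricting attention to the foreground of each proposal is what lets the decoder sharpen instrument regions iteratively instead of attending over the whole frame.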
