Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement (2410.17642v1)
Abstract: Surgical scene segmentation is a fundamental task for understanding robot-assisted laparoscopic surgery. Surgical scenes often contain various anatomical structures and surgical instruments, whose similar local textures and fine-grained structures make segmentation difficult. Vision-specific transformer methods are a promising approach to surgical scene understanding. However, two main challenges remain. First, the absence of inner-patch information fusion leads to poor segmentation performance. Second, the distinct characteristics of anatomy and instruments are not explicitly modeled. To tackle these challenges, we propose a novel Transformer-based framework with an Asymmetric Feature Enhancement module (TAFE), which enhances local information and then actively fuses the improved feature pyramid into the embeddings from the transformer encoders through a multi-scale interaction attention strategy. The proposed method outperforms state-of-the-art (SOTA) methods on several surgical segmentation tasks and additionally demonstrates its ability to recognize fine-grained structures. Code is available at https://github.com/cyuan-sjtu/ViT-asym.