
Exploring Optical Flow Inclusion into nnU-Net Framework for Surgical Instrument Segmentation (2403.10216v1)

Published 15 Mar 2024 in cs.CV and cs.AI

Abstract: Surgical instrument segmentation in laparoscopy is essential for computer-assisted surgical systems. Despite the progress of Deep Learning in recent years, the dynamic setting of laparoscopic surgery still presents challenges for precise segmentation. The nnU-Net framework has excelled in semantic segmentation, analyzing single frames without temporal information. Its ease of use, automatic configuration, and low expertise requirements have made it a popular base framework for comparisons. Optical flow (OF) is a tool commonly used in video tasks to estimate motion and represent it in a single frame, thereby encoding temporal information. This work employs OF maps as an additional input to the nnU-Net architecture to improve its performance on the surgical instrument segmentation task, exploiting the fact that instruments are the main moving objects in the surgical field. With this new input, the temporal component is added indirectly, without modifying the architecture. Using the CholecSeg8k dataset, three different representations of movement were estimated, used as new inputs, and compared against a baseline model. Results show that OF maps improve the detection of classes with high movement, even when these classes are scarce in the dataset. To further improve performance, future work may focus on implementing additional OF-preserving augmentations.
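The core idea of the abstract, feeding motion information to nnU-Net as extra input channels, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the flow field is a placeholder array (in practice it would come from an optical-flow estimator), the magnitude/angle representation is one common way to encode motion in a single frame, and the channels-first stacking mimics how nnU-Net treats each input modality as a separate channel.

```python
import numpy as np

def flow_to_channels(flow):
    """Convert a dense optical-flow field of shape (H, W, 2) into
    magnitude and angle maps, one single-frame representation of motion."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.sqrt(dx ** 2 + dy ** 2)
    angle = np.arctan2(dy, dx)  # radians in [-pi, pi]
    return magnitude, angle

def build_input(frame_rgb, flow):
    """Stack an RGB frame with flow-derived channels (channels-first),
    mimicking how nnU-Net ingests multiple input 'modalities'."""
    magnitude, angle = flow_to_channels(flow)
    channels = [frame_rgb[..., c] for c in range(frame_rgb.shape[-1])]
    channels += [magnitude, angle]
    return np.stack(channels, axis=0)  # shape: (3 + 2, H, W)

# Toy example: a 4x4 black frame with uniform rightward motion.
frame = np.zeros((4, 4, 3), dtype=np.float32)
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[..., 0] = 1.0  # dx = 1, dy = 0 everywhere
x = build_input(frame, flow)
print(x.shape)  # (5, 4, 4)
```

Because the extra information arrives as input channels only, the segmentation architecture itself is untouched, which is the point the abstract makes about adding the temporal component "indirectly".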

