Rendezvous in Time: An Attention-based Temporal Fusion approach for Surgical Triplet Recognition (2211.16963v2)

Published 30 Nov 2022 in cs.CV

Abstract: One of the recent advances in surgical AI is the recognition of surgical activities as triplets of (instrument, verb, target). Although this provides detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. In this paper, we propose Rendezvous in Time (RiT), a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition. We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and the triplet, along with other interactions involving the verb, such as (instrument, verb). Qualitative results show that RiT produces smoother predictions for most triplet instances than state-of-the-art methods. In summary, we present a novel attention-based approach that leverages temporal fusion of video frames to model the evolution of surgical actions and exploits these benefits for surgical triplet recognition.
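
To make the temporal fusion idea concrete, the sketch below illustrates one plausible form of attention-based temporal fusion: the current frame's feature acts as a query over a short window of past frame features, and the attention-weighted sum becomes the fused feature passed to downstream triplet classifiers. This is a minimal PyTorch sketch under assumed names and dimensions (TemporalAttentionFusion, feat_dim, the window size are all illustrative), not the authors' actual RiT implementation.

```python
# Minimal sketch of attention-based temporal fusion over a sliding window
# of frame features. Names, dimensions, and the window size are
# illustrative assumptions, not the RiT paper's exact architecture.
import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    """Fuse the current frame's feature with past frame features using
    attention weights derived from the current frame (query)."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.query_proj = nn.Linear(feat_dim, feat_dim)
        self.key_proj = nn.Linear(feat_dim, feat_dim)
        self.scale = feat_dim ** -0.5

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, window, feat_dim); last index = current frame
        query = self.query_proj(frame_feats[:, -1:, :])        # (B, 1, D)
        keys = self.key_proj(frame_feats)                      # (B, T, D)
        attn = torch.softmax(
            (query @ keys.transpose(1, 2)) * self.scale, dim=-1
        )                                                      # (B, 1, T)
        fused = attn @ frame_feats                             # (B, 1, D)
        # The fused feature would feed triplet classification heads.
        return fused.squeeze(1)                                # (B, D)

# Usage: fuse the current frame with 4 past frames (window of 5).
feats = torch.randn(2, 5, 256)   # (batch=2, window=5, feat_dim=256)
fused = TemporalAttentionFusion(256)(feats)
print(fused.shape)               # torch.Size([2, 256])
```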

References (25)
  1. T. Vercauteren, M. Unberath, N. Padoy, and N. Navab, “CAI4CAI: the rise of contextual artificial intelligence in computer-assisted interventions,” Proc. IEEE, vol. 108, no. 1, pp. 198–214, 2020.
  2. L. Maier-Hein, S. S. Vedula, S. Speidel, N. Navab, R. Kikinis, A. Park, M. Eisenmann, H. Feussner, G. Forestier, S. Giannarou et al., “Surgical data science for next-generation interventions,” Nature Biomedical Engineering, vol. 1, no. 9, pp. 691–696, 2017.
  3. A. P. Twinanda, S. Shehata, D. Mutter, J. Marescaux, M. de Mathelin, and N. Padoy, “EndoNet: A deep architecture for recognition tasks on laparoscopic videos,” IEEE Trans. Medical Imaging, vol. 36, no. 1, pp. 86–97, 2017.
  4. A. Jin, S. Yeung, J. Jopling, J. Krause, D. Azagury, A. Milstein, and L. Fei-Fei, “Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks,” in WACV, 2018, pp. 691–699.
  5. C. I. Nwoye, “Deep learning methods for the detection and recognition of surgical tools and activities in laparoscopic videos,” Ph.D. dissertation, Université de Strasbourg, Nov 2021.
  6. M. Wagner, B.-P. Müller-Stich, A. Kisilenko, D. Tran, P. Heger, L. Mündermann, D. M. Lubotsky, B. Müller, T. Davitashvili, M. Capek et al., “Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark,” arXiv preprint arXiv:2109.14956, 2021.
  7. D. Katić, A.-L. Wekerle, F. Gärtner, H. Kenngott, B. P. Müller-Stich, R. Dillmann, and S. Speidel, “Knowledge-driven formalization of laparoscopic surgeries for rule-based intraoperative context-aware assistance,” in IPCAI, 2014, pp. 158–167.
  8. C. I. Nwoye, C. Gonzalez, T. Yu, P. Mascagni, D. Mutter, J. Marescaux, and N. Padoy, “Recognition of instrument-tissue interactions in endoscopic videos via action triplets,” in MICCAI, 2020, pp. 364–374.
  9. C. I. Nwoye, T. Yu, C. Gonzalez, B. Seeliger, P. Mascagni, D. Mutter, J. Marescaux, and N. Padoy, “Rendezvous: Attention mechanisms for the recognition of surgical action triplets in endoscopic videos,” Medical Image Analysis, vol. 78, p. 102433, 2022.
  10. T. Czempiel, M. Paschali, M. Keicher, W. Simson, H. Feussner, S. T. Kim, and N. Navab, “TeCNO: Surgical phase recognition with multi-stage temporal convolutional networks,” in MICCAI, 2020, pp. 343–352.
  11. Y. Jin, H. Li, Q. Dou, H. Chen, J. Qin, C.-W. Fu, and P.-A. Heng, “Multi-task recurrent convolutional network with correlation loss for surgical video analysis,” Medical Image Analysis, vol. 59, p. 101572, 2020.
  12. C. I. Nwoye, D. Mutter, J. Marescaux, and N. Padoy, “Weakly supervised convolutional LSTM approach for tool tracking in laparoscopic videos,” IJCARS, vol. 14, no. 6, pp. 1059–1067, 2019.
  13. O. Dergachyova, D. Bouget, A. Huaulmé, X. Morandi, and P. Jannin, “Automatic data-driven real-time segmentation and recognition of surgical workflow,” IJCARS, pp. 1081–1089, 2016.
  14. I. Funke, A. Jenke, S. T. Mees, J. Weitz, S. Speidel, and S. Bodenstedt, “Temporal coherence-based self-supervised learning for laparoscopic workflow analysis,” in Lecture Notes in Computer Science, vol. 11041, 2018, pp. 85–93.
  15. X. Gao, Y. Jin, Y. Long, Q. Dou, and P.-A. Heng, “Trans-SVNet: Accurate phase recognition from surgical videos via hybrid embedding aggregation transformer,” in MICCAI, 2021, pp. 593–603.
  16. R. DiPietro, N. Ahmidi, A. Malpani, M. Waldram, G. I. Lee, M. R. Lee, S. S. Vedula, and G. D. Hager, “Segmenting and classifying activities in robot-assisted surgery with recurrent neural networks,” IJCARS, pp. 2005–2020, 2019.
  17. S. Ramesh, D. Dall’Alba, C. Gonzalez, T. Yu, P. Mascagni, D. Mutter, J. Marescaux, P. Fiorini, and N. Padoy, “Multi-task temporal convolutional networks for joint recognition of surgical phases and steps in gastric bypass procedures,” IJCARS, pp. 1111–1119, 2021.
  18. V. S. Bawa, G. Singh, F. KapingA, I. Skarga-Bandurova, E. Oleari, A. Leporini, C. Landolfo, P. Zhao, X. Xiang, G. Luo et al., “The SARAS endoscopic surgeon action detection (ESAD) dataset: Challenges and methods,” arXiv preprint arXiv:2104.03178, 2021.
  19. W. Lin, Y. Hu, L. Hao, D. Zhou, M. Yang, H. Fu, C. Chui, and J. Liu, “Instrument-tissue interaction quintuple detection in surgery videos,” in MICCAI, 2022, pp. 399–409.
  20. C. I. Nwoye, D. Alapatt, T. Yu, A. Vardazaryan, F. Xia, Z. Zhao, T. Xia, F. Jia, Y. Yang, H. Wang et al., “CholecTriplet2021: A benchmark challenge for surgical action triplet recognition,” arXiv preprint arXiv:2204.04746, 2022.
  21. Y. Jin, Y. Long, X. Gao, D. Stoyanov, Q. Dou, and P.-A. Heng, “Trans-SVNet: hybrid embedding aggregation transformer for surgical workflow analysis,” IJCARS, pp. 1–10, 2022.
  22. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
  23. A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” in CVPR, 2014, pp. 1725–1732.
  24. C. I. Nwoye and N. Padoy, “Data splits and metrics for method benchmarking on surgical action triplet datasets,” arXiv preprint arXiv:2204.05235, 2022.
  25. X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in CVPR, 2018, pp. 7794–7803.
Citations (15)
