
Towards Generalizable Surgical Activity Recognition Using Spatial Temporal Graph Convolutional Networks (2001.03728v4)

Published 11 Jan 2020 in cs.CV

Abstract: Modeling and recognition of surgical activities pose an interesting research problem. Although a number of recent works have studied automatic recognition of surgical activities, the generalizability of these works across different tasks and different datasets remains a challenge. We introduce a modality that is robust to scene variation and that is able to infer part information such as orientational and relative spatial relationships. The proposed modality is based on spatial temporal graph representations of surgical tools in videos, for surgical activity recognition. To explore its effectiveness, we model and recognize surgical gestures with the proposed modality. We construct spatial graphs connecting the joint pose estimations of surgical tools. Then, we connect each joint to the corresponding joint in the consecutive frames, forming inter-frame edges that represent the trajectory of the joint over time. We then learn hierarchical spatial temporal graph representations using Spatial Temporal Graph Convolutional Networks (ST-GCN). Our experiments show that learned spatial temporal graph representations perform well in surgical gesture recognition even when used individually. We experiment with the Suturing task of the JIGSAWS dataset, where the chance baseline for gesture recognition is 10%. Our results demonstrate 68% average accuracy, which suggests a significant improvement. Learned hierarchical spatial temporal graph representations can be used either individually, in cascades, or as a complementary modality in surgical activity recognition, and therefore provide a benchmark for future studies. To our knowledge, our paper is the first to use spatial temporal graph representations of surgical tools, and pose-based skeleton representations in general, for surgical activity recognition.
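
The graph construction described in the abstract (spatial edges between tool joints within a frame, plus inter-frame edges linking each joint to itself in the next frame) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-joint `TOOL_SKELETON`, the node numbering, and the self-loops added to the adjacency matrix are all assumptions made for the example.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]

# Hypothetical skeleton of one tool: joint 0 - joint 1 - joint 2.
# The paper's actual joint layout is not given in the abstract.
TOOL_SKELETON: List[Edge] = [(0, 1), (1, 2)]


def node_id(frame: int, joint: int, num_joints: int) -> int:
    """Map a (frame, joint) pair to a single node index."""
    return frame * num_joints + joint


def build_st_edges(num_frames: int, num_joints: int,
                   skeleton: List[Edge]) -> Set[Edge]:
    """Return the undirected edges of a spatial temporal graph."""
    edges: Set[Edge] = set()
    for t in range(num_frames):
        # Spatial edges: connect joints of the tool within frame t.
        for a, b in skeleton:
            edges.add((node_id(t, a, num_joints), node_id(t, b, num_joints)))
        # Temporal edges: connect each joint to itself in frame t + 1.
        if t + 1 < num_frames:
            for j in range(num_joints):
                edges.add((node_id(t, j, num_joints),
                           node_id(t + 1, j, num_joints)))
    return edges


def adjacency(num_nodes: int, edges: Set[Edge]) -> List[List[int]]:
    """Symmetric adjacency matrix with self-loops (an assumed convention)."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1
    for i in range(num_nodes):
        adj[i][i] = 1
    return adj


num_frames, num_joints = 4, 3
edges = build_st_edges(num_frames, num_joints, TOOL_SKELETON)
adj = adjacency(num_frames * num_joints, edges)
print(len(edges))  # 8 spatial + 9 temporal edges
```

With 4 frames and 3 joints, the sketch yields 2 spatial edges per frame and 3 temporal edges per consecutive frame pair. A graph convolutional model such as ST-GCN would then operate on per-node features (for example, joint coordinates) over a graph of this kind.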

Citations (18)
