- The paper introduces SurgeonAssist-Net, a lightweight CNN-RNN framework that enables near real-time, context-aware surgical guidance on OST-HMDs.
- It employs EfficientNet-Lite-B0 for spatial feature extraction and a GRU for temporal modeling, achieving 7.4× fewer parameters and roughly threefold faster CPU inference than the prior state-of-the-art SV-RCNet.
- The research underscores the framework's potential to enhance surgical training and procedure evaluation through portable, ONNX-based AR assistance in clinical settings.
Overview of "SurgeonAssist-Net: Towards Context-Aware Head-Mounted Display-Based Augmented Reality for Surgical Guidance"
The paper presents SurgeonAssist-Net, an efficient and lightweight framework designed to facilitate context-aware surgical guidance via augmented reality (AR) on optical see-through head-mounted displays (OST-HMDs), such as the Microsoft HoloLens 2. The system aims to provide virtual assistance for predefined surgical tasks by integrating computer vision and machine learning techniques. This approach promises to enhance the usability and effectiveness of AR-guided surgical interventions while minimizing distractions traditionally associated with OST-HMDs.
Methodology
SurgeonAssist-Net combines a convolutional neural network (CNN) with a recurrent neural network (RNN) to produce context-aware predictions of surgical tasks. The CNN component is EfficientNet-Lite-B0, a variant of EfficientNet-B0 optimized for mobile CPUs, which extracts spatial features from each video frame. A gated recurrent unit (GRU) then models temporal dependencies across these per-frame features, capturing the sequential structure needed to recognize surgical phases.
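To make the architecture concrete, the following is a minimal PyTorch sketch of such a CNN-GRU pipeline, using the timm library's EfficientNet-Lite-B0 as the spatial backbone. The class name, hidden size, clip length, and seven-phase output head are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal CNN-GRU sketch for surgical phase recognition (illustrative only).
import torch
import torch.nn as nn
import timm


class PhaseRecognitionNet(nn.Module):
    def __init__(self, num_phases: int = 7, hidden_size: int = 128):
        super().__init__()
        # EfficientNet-Lite-B0 backbone; num_classes=0 yields pooled features.
        self.backbone = timm.create_model(
            "efficientnet_lite0", pretrained=True, num_classes=0
        )
        feat_dim = self.backbone.num_features  # 1280 for efficientnet_lite0
        # GRU models temporal dependencies across the frame sequence.
        self.gru = nn.GRU(feat_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_phases)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.view(b * t, c, h, w))  # (b*t, feat_dim)
        feats = feats.view(b, t, -1)                        # (b, t, feat_dim)
        out, _ = self.gru(feats)                            # (b, t, hidden)
        return self.classifier(out[:, -1])  # phase logits for latest frame


model = PhaseRecognitionNet()
logits = model(torch.randn(1, 16, 3, 224, 224))  # one 16-frame clip
```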
The paper details the system's implementation on the HoloLens 2, enabling near real-time inference and visualization of surgical task predictions directly on the display. The integration uses the Windows Machine Learning and Open Neural Network Exchange (ONNX) libraries to run the CNN and RNN models efficiently on-device.
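The corresponding export step might look like the hedged sketch below, which converts the PyTorch model from the previous sketch into an ONNX file that a Windows ML runtime can consume. The file name, opset version, and fixed 16-frame clip shape are assumptions for illustration.

```python
# Export the sketch model above to ONNX (shapes and names are assumptions).
import torch

model.eval()
dummy = torch.randn(1, 16, 3, 224, 224)  # clip length fixed at export time
torch.onnx.export(
    model,
    dummy,
    "surgeonassist_net.onnx",
    input_names=["frames"],
    output_names=["phase_logits"],
    opset_version=11,
)
```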
Results
SurgeonAssist-Net was benchmarked on the Cholec80 dataset, a widely used resource for evaluating surgical workflow recognition systems. The framework demonstrated prediction accuracy comparable to state-of-the-art models while significantly reducing parameter count and computational burden: it uses 7.4× fewer parameters and delivers roughly threefold faster CPU inference than SV-RCNet, making it well suited to the computationally constrained OST-HMD environment.
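As a rough illustration of how these two efficiency figures can be measured, the snippet below counts parameters and times a single CPU forward pass of the sketch model defined earlier. The timing protocol here is illustrative, not the paper's benchmarking methodology.

```python
# Parameter count and single-clip CPU latency for the sketch model.
import time
import torch

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.2f}M")

model.eval()
clip = torch.randn(1, 16, 3, 224, 224)
with torch.no_grad():
    start = time.perf_counter()
    model(clip)
    print(f"CPU inference: {(time.perf_counter() - start) * 1000:.1f} ms")
```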
In a deployment-focused evaluation, SurgeonAssist-Net maintained comparable prediction performance after conversion to ONNX, underscoring the model's portability and suitability for real-world, on-device use.
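One common way to verify such post-conversion fidelity is a numerical parity check between the original and exported models. The sketch below does this with ONNX Runtime against the file exported earlier; the tolerances are arbitrary choices, not values from the paper.

```python
# Compare PyTorch and ONNX Runtime outputs on the same input clip.
import numpy as np
import onnxruntime as ort
import torch

clip = torch.randn(1, 16, 3, 224, 224)
model.eval()
with torch.no_grad():
    torch_logits = model(clip).numpy()

session = ort.InferenceSession("surgeonassist_net.onnx")
(onnx_logits,) = session.run(None, {"frames": clip.numpy()})
np.testing.assert_allclose(torch_logits, onnx_logits, rtol=1e-3, atol=1e-4)
```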
Implications and Future Directions
The SurgeonAssist-Net framework addresses critical challenges in AR-assisted surgical guidance by aligning the presentation of virtual data with the current surgical context. Its application could extend to various domains, including surgical training and procedure evaluation, thereby fostering more intuitive and adaptive surgical environments.
For future research, the authors highlight the potential advantages of augmenting the dataset with a broader range of surgical tasks and incorporating user studies to further quantify SurgeonAssist-Net's impact on surgical performance and user satisfaction. Expanding the system's capability to handle more diverse procedures and integrating feedback from clinical deployments will be crucial steps toward widespread adoption.
In summary, this work demonstrates a pragmatic approach to AR-guided surgery, leveraging efficient neural architectures to enable context-aware, lightweight implementations on commercially available OST-HMDs and advancing the integration of AR technology into mainstream surgical practice.