- The paper presents an unsupervised deep learning method to localize surgical trajectories and predict endoscopic camera poses in real time.
- It employs a transformer-based autoencoder integrated with YOLOv7 detection to encode 3D surgical paths without reliance on preoperative imaging.
- Results include pitch and yaw prediction errors below one degree on synthetic data and a Pearson correlation of 0.97 for surgical-trajectory mapping, demonstrating enhanced accuracy for neurosurgical applications.
Vision-Based Neurosurgical Guidance: Unsupervised Localization and Camera-Pose Prediction
This paper outlines a vision-based guidance system for neurosurgical procedures, targeting the challenges of intraoperative navigation. Specifically, it proposes a deep learning approach for unsupervised localization and endoscopic camera-pose prediction that supports surgical guidance without relying on fixed anatomical landmarks or preoperative imaging. The method copes with the low texture, indistinct features, and non-rigid deformations inherent in surgical video.
The proposed method uses an autoencoder (AE) architecture to learn surgical path trajectories and camera orientations from annotated surgical videos, focusing on transsphenoidal adenectomy, a procedure with a relatively straightforward anatomical path. The central innovation is the ability to encode relative positional and angular information without supervision, offering real-time guidance that could adapt to other surgical contexts.
Methods
The methodological framework embeds video-frame sequences into a compressed 3D latent space using a transformer-based AE. The encoding captures the surgical path and camera orientation (pitch and yaw angles), leveraging object detection with YOLOv7: the detections supply bounding boxes around anatomical structures that serve as the AE's input, obviating the need for traditional 3D mapping techniques or explicit landmark tracking.
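To make the pipeline concrete, the following is a minimal sketch of this kind of architecture, assuming the AE consumes per-frame bounding-box descriptors produced by the detector and compresses them into a three-dimensional latent (position along the path plus pitch and yaw). The class name SurgicalPathAE, the feature layout, and all dimensions are illustrative rather than the authors' implementation.

```python
# Sketch of a transformer-based autoencoder over per-frame detections.
# Feature layout, pooling, and dimensions are assumptions, not the paper's spec.
import torch
import torch.nn as nn

class SurgicalPathAE(nn.Module):
    def __init__(self, box_dim=6, max_boxes=5, d_model=128, latent_dim=3, n_layers=4):
        super().__init__()
        # box_dim: e.g. (cx, cy, w, h, confidence, class_id) per YOLOv7 detection
        self.embed = nn.Linear(box_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        # latent_dim = 3: position along the surgical path, pitch, yaw
        self.to_latent = nn.Linear(d_model, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, d_model), nn.ReLU(),
            nn.Linear(d_model, max_boxes * box_dim),
        )

    def forward(self, boxes):                     # boxes: (batch, max_boxes, box_dim)
        tokens = self.encoder(self.embed(boxes))  # contextualized detection tokens
        latent = self.to_latent(tokens.mean(dim=1))  # pooled frame-level latent
        recon = self.decoder(latent).view_as(boxes)  # reconstructed detections
        return latent, recon

model = SurgicalPathAE()
frame_boxes = torch.randn(2, 5, 6)                # 2 frames, 5 detections each
latent, recon = model(frame_boxes)                # latent: (2, 3) -> depth, pitch, yaw
```

Training such a model would minimize a reconstruction loss between recon and boxes, so the three latent coordinates come to track progress along the path and camera orientation without explicit pose labels.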
The authors employ a loss function combining classification and bounding-box prediction terms, refined with a constraint that enforces a "centered view" for improved interpretability. Specifically, predicted camera angles are adjusted toward a standardized reference orientation, enabling robust surgical-path mapping while accommodating motion-related disturbances such as bleeding and flushing.
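As a rough illustration of such a composite objective, the sketch below combines a bounding-box regression term and a classification term with a "centered view" penalty that pulls predicted pitch and yaw toward a reference orientation. The function name, weighting, and specific term choices (smooth L1, cross-entropy, MSE) are assumptions, not the paper's exact formulation.

```python
# Illustrative composite loss: detection terms plus a centered-view penalty.
# Term definitions and the weight lam are assumptions for the sketch.
import torch
import torch.nn.functional as F

def guidance_loss(pred_boxes, true_boxes, pred_logits, true_labels,
                  pred_angles, ref_angles=None, lam=0.1):
    # Bounding-box regression and class terms, as in standard detector training.
    box_term = F.smooth_l1_loss(pred_boxes, true_boxes)
    cls_term = F.cross_entropy(pred_logits, true_labels)
    # Centered-view constraint: penalize deviation of predicted (pitch, yaw)
    # from a reference orientation (zero = camera centered on the path).
    if ref_angles is None:
        ref_angles = torch.zeros_like(pred_angles)
    center_term = F.mse_loss(pred_angles, ref_angles)
    return box_term + cls_term + lam * center_term
```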
Results
The system achieves a mean average precision of 53.4% for anatomical detections, tested on a dataset of 166 transsphenoidal adenectomy videos. On a synthetic dataset developed in Blender, the authors report pitch and yaw prediction errors of 0.43 and 0.69 degrees, respectively, illustrating the approach’s precision under controlled experimental conditions. Furthermore, a Pearson correlation coefficient of 0.97, compared to 0.94 in earlier models, demonstrates the enhanced mapping accuracy of the surgical trajectory—even when factors like camera rotation are accounted for.
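For context, metrics of this kind can be computed along the following lines; the snippet is an illustrative evaluation sketch over assumed per-frame predictions, not the authors' evaluation code.

```python
# Illustrative evaluation of angle errors and trajectory correlation, assuming
# arrays of per-frame predicted and ground-truth pitch/yaw and path positions.
import numpy as np

def mean_angle_error(pred_deg, true_deg):
    # Mean absolute error in degrees, wrapped to [-180, 180).
    diff = (np.asarray(pred_deg) - np.asarray(true_deg) + 180.0) % 360.0 - 180.0
    return np.mean(np.abs(diff))

def trajectory_correlation(pred_pos, true_pos):
    # Pearson correlation between predicted and ground-truth path positions.
    return np.corrcoef(pred_pos, true_pos)[0, 1]
```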
Implications and Future Work
This research contributes a potentially scalable framework for enhancing neurosurgical guidance, avoiding the costly requirement of specialized imaging equipment. Practically, it offers an immediate application in real-time anatomical navigation and could enhance surgical precision, potentially reducing operative time and associated risks.
The work signals a shift in the use of artificial intelligence and computer vision in neurosurgery, with promising implications for adopting AI-based guidance systems as standard practice. Future directions indicated by the authors include extending the approach to other neurosurgical procedures, integrating simultaneous localization and mapping (SLAM) or structured-light techniques, and enhancing spatial orientation with preoperative MRI data. Addressing these areas could further establish the viability of the approach and support more general deployment.
In sum, this paper presents a technically proficient, unsupervised vision-based method for aiding neurosurgery, reinforcing the convergence of computer vision and medical imaging in the operating room.