Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments (2305.03535v2)
Abstract: State-of-the-art research in traditional computer vision is increasingly leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with purely image-based 6DoF pose estimation using deep-learning methods. However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation. In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments and derive recommendations for an ideal camera system that addresses the challenges in the operating room. The contributions of this work are threefold. First, we present a multi-camera capture setup consisting of static and head-mounted cameras, which allows us to study the performance of pose estimation methods under various camera configurations. Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre, that includes rich annotations for surgeon, instrument, and patient anatomy. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments and analyze the influence of camera configurations, training data, and occlusions on pose accuracy and generalization ability. The best method uses five cameras in a multi-view pose optimization and achieves an average position and orientation error of 1.01 mm and 0.89° for a surgical drill, and 2.79 mm and 3.33° for a screwdriver, under optimal conditions. Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.
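The abstract reports accuracy as a position error in millimetres and an orientation error in degrees, but does not define the metrics here. The following is a minimal sketch of one common convention, assumed for illustration and not confirmed as the authors' evaluation code: position error as the Euclidean distance between estimated and ground-truth translations, and orientation error as the geodesic angle of the relative rotation. The function name `pose_errors`, the helper `rot_z`, and the example values are hypothetical.

```python
import math

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (position error, orientation error in degrees) between two poses.

    R_* are 3x3 rotation matrices as nested lists, t_* are 3-vectors.
    Assumed convention, not taken from the paper.
    """
    # Position error: Euclidean distance, in the unit of the inputs (e.g. mm).
    pos_err = math.dist(t_est, t_gt)
    # trace(R_est @ R_gt^T) equals the element-wise product sum of both matrices.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard acos against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return pos_err, math.degrees(math.acos(cos_angle))

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (helper for the example)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical example: estimate is 1 mm off along x and rotated 2 degrees about z.
pos_err, rot_err = pose_errors(rot_z(2.0), [1.0, 0.0, 0.0], rot_z(0.0), [0.0, 0.0, 0.0])
print(f"position error: {pos_err:.2f} mm, orientation error: {rot_err:.2f} deg")
```

Averaging these two quantities over all evaluated frames would give per-instrument figures of the kind quoted above, again under the assumed convention.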