CudaSIFT-SLAM: multiple-map visual SLAM for full procedure mapping in real human endoscopy (2405.16932v1)

Published 27 May 2024 in cs.RO and cs.CV

Abstract: Monocular visual simultaneous localization and mapping (V-SLAM) is nowadays an irreplaceable tool in mobile robotics and augmented reality, where it performs robustly. However, human colonoscopies pose formidable challenges such as occlusions, blur, light changes, lack of texture, deformation, water jets, and tool interaction, which result in very frequent tracking losses. ORB-SLAM3, the top-performing multiple-map V-SLAM system, is unable to recover from them by merging sub-maps or relocalizing the camera, due to the poor performance of its place recognition algorithm based on ORB features and the DBoW2 bag-of-words. We present CudaSIFT-SLAM, the first V-SLAM system able to process complete human colonoscopies in real time. To overcome the limitations of ORB-SLAM3, we use SIFT instead of ORB features and replace the DBoW2 direct index with the more computationally demanding brute-force matching, which successfully matches images separated in time for relocalization and map merging. Real-time performance is achieved thanks to CudaSIFT, a GPU implementation of SIFT extraction and brute-force matching. We benchmark our system on the C3VD phantom colon dataset and on a full real colonoscopy from the Endomapper dataset, demonstrating its ability to merge sub-maps and relocalize in them, obtaining significantly longer sub-maps. Our system successfully maps 88% of the frames in the C3VD dataset in real time. In a real screening colonoscopy, despite the much higher prevalence of occluded and blurred frames, the mapping coverage is 53% in carefully explored areas and 38% in the full sequence, a 70% improvement over ORB-SLAM3.
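The place-recognition change described in the abstract, replacing a bag-of-words direct index with exhaustive descriptor matching, can be illustrated with a minimal sketch. The snippet below is an illustrative CPU implementation in NumPy of brute-force nearest-neighbour matching of SIFT-like descriptors with Lowe's ratio test; it is not the paper's CudaSIFT GPU code, and the function name and ratio threshold are our own choices for the example.

```python
import numpy as np

def brute_force_match(desc_a, desc_b, ratio=0.8):
    """Exhaustively match two sets of SIFT-like descriptors.

    desc_a: (N, 128) array, descriptors of the query image.
    desc_b: (M, 128) array, descriptors of the candidate image (M >= 2).
    Returns a list of (i, j) index pairs passing Lowe's ratio test.
    """
    # Pairwise squared Euclidean distances between all descriptor pairs.
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=-1)
    matches = []
    for i, row in enumerate(d2):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Distances are squared, so compare against ratio**2: accept only
        # if the best match is clearly better than the runner-up.
        if row[best] < (ratio ** 2) * row[second]:
            matches.append((i, int(best)))
    return matches
```

The ratio test is what makes brute-force matching discriminative enough to associate images separated in time; the GPU version in CudaSIFT parallelizes the same distance computation to keep the system real-time.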

References (23)
  1. C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021.
  2. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
  3. T. L. Bobrow, M. Golhar, R. Vijayan, V. S. Akshintala, J. R. Garcia, and N. J. Durr, “Colonoscopy 3D video dataset with paired depth from 2D-3D registration,” Medical Image Analysis, p. 102956, 2023.
  4. P. Azagra, C. Sostres, Á. Ferrandez, L. Riazuelo, C. Tomasini, O. L. Barbed, J. Morlana, D. Recasens, V. M. Batlle, J. J. Gómez-Rodríguez, R. Elvira, J. López, C. Oriol, J. Civera, J. D. Tardós, A. C. Murillo, A. Lanas, and J. M. M. Montiel, “Endomapper dataset of complete calibrated endoscopy procedures,” Scientific Data, vol. 10, no. 1, p. 671, 2023.
  5. Y. Wang, L. Zhao, L. Gong, X. Chen, and S. Zuo, “A monocular SLAM system based on SIFT features for gastroscope tracking,” Medical & Biological Engineering & Computing, vol. 61, no. 2, pp. 511–523, 2023.
  6. N. Mahmoud, T. Collins, A. Hostettler, L. Soler, C. Doignon, and J. M. M. Montiel, “Live tracking and dense reconstruction for handheld monocular endoscopy,” IEEE Transactions on Medical Imaging, vol. 38, no. 1, pp. 79–89, 2018.
  7. P. Mountney, D. Stoyanov, and G.-Z. Yang, “Three-dimensional tissue deformation recovery and tracking,” IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 14–24, 2010.
  8. X. Liu, Z. Li, M. Ishii, G. D. Hager, R. H. Taylor, and M. Unberath, “SAGE: SLAM with appearance and geometry prior for endoscopy,” in 2022 International Conference on Robotics and Automation (ICRA), pp. 5587–5593, IEEE, 2022.
  9. J. J. Gomez-Rodriguez, J. M. M. Montiel, and J. D. Tardos, “NR-SLAM: Non-rigid monocular SLAM,” arXiv preprint arXiv:2308.04036, 2023.
  10. R. Ma, R. Wang, Y. Zhang, S. Pizer, S. K. McGill, J. Rosenman, and J.-M. Frahm, “RNNSLAM: Reconstructing the 3D colon to visualize missing regions during a colonoscopy,” Medical Image Analysis, vol. 72, p. 102100, 2021.
  11. J. Totz, P. Mountney, D. Stoyanov, and G.-Z. Yang, “Dense surface reconstruction for enhanced navigation in MIS,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2011, pp. 89–96, Springer, 2011.
  12. J. Song, J. Wang, L. Zhao, S. Huang, and G. Dissanayake, “MIS-SLAM: Real-time large-scale dense deformable SLAM system in minimal invasive surgery based on heterogeneous computing,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4068–4075, 2018.
  13. J. Song, R. Zhang, Q. Zhu, J. Lin, and M. Ghaffari, “BDIS-SLAM: A lightweight CPU-based dense stereo SLAM for surgery,” International Journal of Computer Assisted Radiology and Surgery, pp. 1–10, 2024.
  14. H. Zhou and J. Jayender, “EMDQ-SLAM: Real-time high-resolution reconstruction of soft tissue surface from stereo laparoscopy videos,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, pp. 331–340, Springer, 2021.
  15. D. J. Mirota, H. Wang, R. H. Taylor, M. Ishii, G. L. Gallia, and G. D. Hager, “A system for video-based navigation for endoscopic endonasal skull base surgery,” IEEE Transactions on Medical Imaging, vol. 31, no. 4, pp. 963–976, 2011.
  16. R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM: A versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
  17. J. Shi and C. Tomasi, “Good features to track,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600, IEEE, 1994.
  18. B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in IJCAI’81: 7th International Joint Conference on Artificial Intelligence, vol. 2, pp. 674–679, 1981.
  19. J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611–625, 2018.
  20. B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” JOSA A, vol. 4, no. 4, pp. 629–642, 1987.
  21. J. Kannala and S. S. Brandt, “A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1335–1340, 2006.
  22. H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature, vol. 293, no. 5828, pp. 133–135, 1981.
  23. J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Authors (3)
  1. Richard Elvira
  2. Juan D. Tardós
  3. José M. M. Montiel