Bimodal Camera Pose Prediction for Endoscopy (2204.04968v2)
Abstract: Deducing the 3D structure of endoscopic scenes from images is exceedingly challenging. In addition to deformation and view-dependent lighting, tubular structures like the colon present problems stemming from their self-occluding and repetitive anatomical structure. In this paper, we propose SimCol, a synthetic dataset for camera pose estimation in colonoscopy, and a novel method that explicitly learns a bimodal distribution to predict the endoscope pose. Our dataset replicates real colonoscope motion and highlights the drawbacks of existing methods. We publish 18k RGB images from simulated colonoscopy with corresponding depth and camera poses and make our data generation environment in Unity publicly available. We evaluate different camera pose prediction methods and demonstrate that, when trained on our data, they generalize to real colonoscopy sequences, and our bimodal approach outperforms prior unimodal work.
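The abstract does not spell out how the bimodal pose distribution is parameterized, so the following is only a minimal, hypothetical sketch of what a two-hypothesis relative-pose head with a winner-takes-all training loss could look like in PyTorch. The names `BimodalPoseHead` and `wta_loss`, the translation-plus-unit-quaternion parameterization, and the loss weighting `beta` are all assumptions for illustration, not the authors' released implementation.

```python
# Illustrative sketch only (not the paper's code): a pose head that outputs
# two hypotheses (translation + unit quaternion) and two mode logits, trained
# so that only the hypothesis closer to the ground truth is regressed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BimodalPoseHead(nn.Module):
    """Maps a feature vector from an image-pair encoder to two pose modes."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2 * 7 + 2)  # 2 x (t, q) + 2 mode logits

    def forward(self, feats: torch.Tensor):
        out = self.fc(feats)                      # (B, 16)
        poses = out[:, :14].view(-1, 2, 7)        # (B, 2, 7)
        t = poses[..., :3]                        # translations
        q = F.normalize(poses[..., 3:], dim=-1)   # unit quaternions
        logits = out[:, 14:]                      # (B, 2) mode weights
        return t, q, logits


def wta_loss(t, q, logits, t_gt, q_gt, beta: float = 10.0):
    """Winner-takes-all loss: regress only the closer hypothesis and push the
    mode logits toward the winning mode with a cross-entropy term."""
    # Per-hypothesis error; ground truth is broadcast over the two modes.
    t_err = (t - t_gt.unsqueeze(1)).norm(dim=-1)           # (B, 2)
    q_err = 1.0 - (q * q_gt.unsqueeze(1)).sum(-1).abs()    # (B, 2)
    err = t_err + beta * q_err
    winner = err.argmin(dim=1)                              # (B,)
    reg = err.gather(1, winner.unsqueeze(1)).mean()
    cls = F.cross_entropy(logits, winner)
    return reg + cls
```

At inference, one would take the hypothesis with the larger mode logit (or keep both modes and their weights as a bimodal distribution over the relative pose); the feature extractor feeding `feats` is left unspecified here.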