Salient Sparse Visual Odometry With Pose-Only Supervision (2404.04677v1)
Abstract: Visual Odometry (VO) is vital for the navigation of autonomous systems, providing accurate position and orientation estimates at reasonable cost. While traditional VO methods excel under favorable conditions, they struggle with challenges such as variable lighting and motion blur. Deep learning-based VO, though more adaptable, can face generalization problems in new environments. Addressing these drawbacks, this paper presents a novel hybrid VO framework that leverages pose-only supervision, balancing robustness against the need for extensive labeling. We propose two cost-effective and innovative designs: a self-supervised homographic pre-training that enhances optical flow learning from pose-only labels, and a random patch-based salient point detection strategy for more accurate optical flow patch extraction. These designs eliminate the need for dense optical flow labels during training and significantly improve the generalization capability of the system in diverse and challenging environments. Our pose-only supervised method achieves competitive performance on standard datasets and greater robustness and generalization in extreme and unseen scenarios, even compared with state-of-the-art methods supervised by dense optical flow.
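The abstract names two concrete mechanisms, so a brief illustration may help. The Python sketch below (using NumPy and OpenCV) shows one plausible reading of each: a random homography warps an image and the flow it induces is known in closed form, giving a free supervision signal for pre-training, and salient points are chosen as the strongest-gradient pixel inside randomly placed patches. All function names, parameter values, and the gradient-based saliency score are assumptions made for illustration; they are not taken from the paper.

```python
# Illustrative sketch only: a plausible homographic pre-training sample generator
# and a random patch-based salient point selector. Names and details are
# assumptions, not the paper's implementation.
import numpy as np
import cv2


def random_homography(h, w, max_shift=32.0):
    """Sample a random homography by jittering the four image corners."""
    src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    dst = src + np.random.uniform(-max_shift, max_shift, size=(4, 2)).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)


def homography_flow(H, h, w):
    """Dense flow induced by H: where each source pixel lands in the warped image."""
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(np.float64)
    warped = pts @ H.T
    warped = warped[:, :2] / warped[:, 2:3]          # back to inhomogeneous coordinates
    flow = warped.reshape(h, w, 2) - np.stack([xs, ys], axis=-1)
    return flow.astype(np.float32)


def make_pretraining_pair(img, max_shift=32.0):
    """Return (image, warped image, exact flow) as a self-supervised training sample."""
    h, w = img.shape[:2]
    H = random_homography(h, w, max_shift)
    warped = cv2.warpPerspective(img, H, (w, h))
    return img, warped, homography_flow(H, h, w)


def salient_points_from_random_patches(gray, n_patches=64, patch=32):
    """Pick the highest-gradient pixel inside each of n randomly placed patches."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    score = gx * gx + gy * gy                        # simple gradient-magnitude saliency
    h, w = gray.shape
    pts = []
    for _ in range(n_patches):
        y0 = np.random.randint(0, h - patch)
        x0 = np.random.randint(0, w - patch)
        local = score[y0:y0 + patch, x0:x0 + patch]
        dy, dx = np.unravel_index(np.argmax(local), local.shape)
        pts.append((x0 + dx, y0 + dy))
    return np.asarray(pts)                           # (n_patches, 2) pixel coordinates
```

A pre-training loop could call make_pretraining_pair on unlabeled images and supervise a flow network only at the points returned by salient_points_from_random_patches; the pose-only stage described in the abstract would then replace these synthetic flow labels with a pose-supervised objective.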
Authors: Siyu Chen, Kangcheng Liu, Chen Wang, Shenghai Yuan, Jianfei Yang, Lihua Xie