DBA-Fusion: Tightly Integrating Deep Dense Visual Bundle Adjustment with Multiple Sensors for Large-Scale Localization and Mapping (2403.13714v1)
Abstract: Visual simultaneous localization and mapping (VSLAM) has broad applications, with state-of-the-art methods leveraging deep neural networks for better robustness and applicability. However, there has been little research on fusing these learning-based methods with multi-sensor information, which is often indispensable for extending such applications to large-scale, complex scenarios. In this paper, we tightly integrate trainable deep dense bundle adjustment (DBA) with multi-sensor information through a factor graph. In this framework, recurrent optical flow estimation and DBA are performed over sequential images. The Hessian information derived from DBA is fed into a generic factor graph for multi-sensor fusion, which employs a sliding window and supports probabilistic marginalization. A visual-inertial integration pipeline is developed first, providing the minimum capability of metric-scale localization and mapping. Other sensors (e.g., the global navigation satellite system) are then integrated for drift-free, geo-referenced localization. Extensive tests are conducted on both public and self-collected datasets. The results validate the superior localization performance of our approach, which enables real-time dense mapping in large-scale environments. The code has been made open-source (https://github.com/GREAT-WHU/DBA-Fusion).
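The abstract describes feeding the Hessian derived from DBA into a generic sliding-window factor graph with probabilistic marginalization. The snippet below is only a minimal sketch of that idea in information (Hessian) form, not the paper's actual code: the toy state dimension, the `random_psd` helper, and the stand-in IMU factor are all assumptions made for illustration.

```python
import numpy as np

# Minimal sketch: fuse a DBA-derived Hessian block with another sensor factor in
# information form, then marginalize the oldest state via a Schur complement,
# as in a sliding-window factor graph. Toy 2-pose window with 3-DoF poses.

D = 3  # toy per-pose dimension (real systems use 6-DoF poses plus velocity/bias)
N = 2  # window size: poses x0 (oldest) and x1

def random_psd(dim, scale=1.0, seed=None):
    """Helper (illustrative): build a random positive-definite information matrix."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((dim, dim))
    return scale * (A @ A.T + dim * np.eye(dim))

# 1) Hessian (information matrix) and gradient from dense bundle adjustment
#    over the window; placeholders stand in for the real DBA output.
H_dba = random_psd(N * D, seed=0)
b_dba = np.zeros(N * D)

# 2) Information from another sensor factor (e.g., an IMU preintegration factor
#    linking x0 and x1); in information form, independent factors simply add.
H_imu = random_psd(N * D, scale=0.5, seed=1)
b_imu = np.zeros(N * D)

H = H_dba + H_imu
b = b_dba + b_imu

# 3) Solve the fused normal equations H * dx = -b for the window state update.
dx = np.linalg.solve(H, -b)

# 4) Marginalize the oldest pose x0 via the Schur complement: its information
#    is retained as a prior on the remaining pose x1.
m = slice(0, D)       # marginalized block (x0)
r = slice(D, N * D)   # remaining block (x1)
H_prior = H[r, r] - H[r, m] @ np.linalg.inv(H[m, m]) @ H[m, r]
b_prior = b[r] - H[r, m] @ np.linalg.inv(H[m, m]) @ b[m]

print("window update:", dx)
print("prior information on remaining pose:\n", H_prior)
```

The design point illustrated is that, once the DBA result is expressed as an information matrix, any additional sensor constraint can be added to the same normal equations and old states can be removed without discarding their information.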