- The paper presents a novel VI-SLAM approach that integrates probabilistic depth fusion with volumetric occupancy mapping to improve both localization and mapping accuracy.
- It fuses stereo and multi-view depth predictions, each with learned uncertainty, within a visual-inertial factor graph to build robust occupancy submaps.
- Evaluations on the EuRoC and Hilti-Oxford benchmarks show reduced trajectory errors and improved mesh accuracy, achieving state-of-the-art results among published methods.
Uncertainty-Aware Visual-Inertial SLAM with Volumetric Occupancy Mapping
Introduction
The paper presents a novel approach to visual-inertial simultaneous localization and mapping (VI-SLAM) that integrates uncertainty-aware depth into a volumetric occupancy mapping framework. This work advances the state of the art by fusing depth predictions from deep neural networks in a fully probabilistic manner, improving both localization and mapping. Specifically, the method takes depth and uncertainty predictions from a fixed-baseline stereo rig and combines them with motion stereo across varying baselines, substantially improving mapping accuracy.
Methodology
Visual-Inertial Estimator
The proposed VI-SLAM system is built upon the OKVIS2 framework, which integrates visual and inertial measurements into a probabilistic factor graph optimization. Key innovations include:
- Depth Fusion and Uncertainty Management: Depth estimates from a stereo network and a multi-view stereo (MVS) network are fused probabilistically, weighted by their respective predicted uncertainties. This fusion yields more accurate and reliable depth maps, which are crucial for constructing the volumetric occupancy grid; a minimal sketch of inverse-variance fusion follows this list.
- Occupancy Submapping: The system employs Supereight2 for occupancy submapping, updating the volumetric map with both static (fixed-baseline) and motion stereo. Per-pixel depth uncertainty learned by the networks governs how depth measurements are integrated into the submaps.
- Occupancy-to-Point Factors: These factors align the dense submaps within the global frame using the fused depth information and predicted uncertainties. The alignment is optimized in a nonlinear least-squares estimator, yielding globally consistent geometry; a hedged sketch of such a factor also appears below.
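To make the fusion concrete, here is a minimal sketch of inverse-variance depth fusion, assuming per-pixel depth and variance maps from the two networks. All names (fuse_depths, z_stereo, ...) are hypothetical, and the paper's actual pipeline may add outlier gating and consistency checks on top of this:

```python
import numpy as np

def fuse_depths(z_stereo, var_stereo, z_mvs, var_mvs):
    """Fuse two per-pixel depth maps by inverse-variance weighting.

    Inputs are HxW arrays: depths in metres, variances in m^2.
    Invalid pixels are marked with NaN or non-positive variance.
    """
    valid1 = np.isfinite(z_stereo) & np.isfinite(var_stereo) & (var_stereo > 0)
    valid2 = np.isfinite(z_mvs) & np.isfinite(var_mvs) & (var_mvs > 0)
    # Inverse-variance weights; zero where a source is invalid.
    w1 = np.where(valid1, 1.0 / np.where(valid1, var_stereo, 1.0), 0.0)
    w2 = np.where(valid2, 1.0 / np.where(valid2, var_mvs, 1.0), 0.0)
    z1 = np.where(valid1, z_stereo, 0.0)
    z2 = np.where(valid2, z_mvs, 0.0)
    w_sum = w1 + w2
    ok = w_sum > 0
    z_fused = np.where(ok, (w1 * z1 + w2 * z2) / np.where(ok, w_sum, 1.0), np.nan)
    var_fused = np.where(ok, 1.0 / np.where(ok, w_sum, 1.0), np.nan)
    return z_fused, var_fused
```

Under independent Gaussian noise this is the maximum-likelihood combination of the two measurements; the fused variance is what the occupancy integration would then consume as measurement noise.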
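The occupancy-to-point factors can likewise be sketched. The residual below treats a submap's interpolated occupancy log-odds, whose zero level set approximates the surface, as a signed-distance-like error at points transformed in from another submap, whitened by the predicted depth uncertainties. This is a simplified stand-in under stated assumptions, not the authors' implementation: the real system operates on Supereight2 octree fields with analytic Jacobians inside the OKVIS2 factor graph, and every name here (trilinear, occ_to_point_residuals, slope, ...) is hypothetical.

```python
import numpy as np

def trilinear(grid, p, voxel_size):
    """Trilinearly interpolate a dense 3-D scalar field (e.g. occupancy
    log-odds) at metric point p.  The grid origin is assumed at (0,0,0)
    and p is assumed to lie inside the grid."""
    q = np.asarray(p) / voxel_size
    i = np.floor(q).astype(int)
    f = q - i
    val = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[0] if dx else 1 - f[0]) * \
                    (f[1] if dy else 1 - f[1]) * \
                    (f[2] if dz else 1 - f[2])
                val += w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
    return val

def occ_to_point_residuals(grid, voxel_size, points, T_ab, sigmas, slope=1.0):
    """Whitened residuals aligning (N, 3) points from submap B against the
    occupancy field of submap A.  T_ab is a 4x4 transform taking B-frame
    points into A's frame; `slope` converts log-odds to metric distance."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_a = (T_ab @ pts_h.T).T[:, :3]
    r = np.array([trilinear(grid, p, voxel_size) for p in pts_a])
    return (r / slope) / sigmas
```

In a least-squares backend these residuals would be stacked over all point-submap pairs and minimized jointly over the relative submap poses, alongside the visual-inertial terms.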
Results
Numerical Performance
The method was tested on the EuRoC and Hilti-Oxford benchmark datasets, demonstrating superior accuracy in both localization and mapping:
- EuRoC Dataset: The proposed method achieved an average absolute trajectory error (ATE) RMSE of 0.041 m in causal evaluation and 0.030 m in non-causal evaluation, outperforming state-of-the-art systems such as VINS-Fusion, DVI-SLAM, and ORB-SLAM3 (ATE RMSE computation is sketched after this list).
- Hilti-Oxford Dataset: The system ranked first among published methods in both causal and non-causal evaluation, with mean position errors of 13.3 cm and 6.1 cm, respectively.
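For reference, ATE RMSE is computed after rigidly aligning the estimated trajectory to ground truth. A minimal sketch, assuming time-associated (N, 3) position arrays and a closed-form Kabsch/Umeyama alignment without scale:

```python
import numpy as np

def ate_rmse(est, gt):
    """ATE RMSE after closed-form SE(3) alignment of est onto gt."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    # Kabsch: rotation minimizing ||R @ E.T - G.T|| over rotations.
    U, _, Vt = np.linalg.svd(G.T @ E)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:  # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    err = (R @ est.T).T + t - gt
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```

Causal evaluation scores poses as they were available at runtime, while non-causal evaluation scores the final, loop-closed estimate, which is why the non-causal error is lower.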
Mapping Accuracy
In terms of reconstruction:
- The proposed method improved average mesh accuracy from 0.144 m (for a baseline monocular-inertial setup) to 0.057 m.
- Completeness, defined as the fraction of ground-truth vertices within 0.2 m of an estimated vertex, increased from 50.71% to 55.40% (both metrics are sketched below).
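Both metrics are straightforward nearest-neighbour computations. A minimal sketch with scipy, assuming sampled vertex arrays; note the paper may evaluate accuracy against the ground-truth surface rather than its vertices, so treat this as an approximation:

```python
import numpy as np
from scipy.spatial import cKDTree

def mesh_metrics(est_vertices, gt_vertices, thresh=0.2):
    """Accuracy: mean distance from each estimated vertex to its nearest
    ground-truth vertex.  Completeness: fraction of ground-truth vertices
    within `thresh` metres of some estimated vertex."""
    d_est_to_gt, _ = cKDTree(gt_vertices).query(est_vertices)
    d_gt_to_est, _ = cKDTree(est_vertices).query(gt_vertices)
    return d_est_to_gt.mean(), float((d_gt_to_est <= thresh).mean())
```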
Implications and Future Work
The approach significantly enhances the accuracy and usability of VI-SLAM in real-time applications, making it viable for tasks that demand high-fidelity maps, such as autonomous navigation and robotic planning. By employing learned depth uncertainty, the method moves beyond traditional constant-disparity-uncertainty models, allowing for more precise and reliable mapping, as the propagation below illustrates.
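To see why a constant disparity uncertainty is a weak depth model, propagate it to first order through the standard stereo depth equation, with focal length $f$ (in pixels), baseline $b$, and disparity $d$:

$$
z = \frac{f\,b}{d}, \qquad
\sigma_z = \left|\frac{\partial z}{\partial d}\right|\,\sigma_d
         = \frac{f\,b}{d^{2}}\,\sigma_d
         = \frac{z^{2}}{f\,b}\,\sigma_d .
$$

A fixed $\sigma_d$ thus implies depth noise that grows quadratically with range and shrinks with longer baselines; learned per-pixel uncertainties and multi-baseline motion stereo both address exactly this behaviour.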
Future work is oriented towards incorporating epistemic uncertainty into the depth models, which could further improve robustness. Additionally, integrating this VI-SLAM approach with real-time navigation and control systems could close the loop, making the system more adaptive and reliable in dynamic and uncertain environments.
Conclusion
This paper presents a VI-SLAM system that tightly integrates uncertainty-aware depth fusion with volumetric occupancy mapping. The system demonstrates consistent improvements in localization and mapping accuracy over state-of-the-art baselines on the EuRoC and Hilti-Oxford benchmarks. These advances pave the way for more reliable and precise autonomous systems operating in complex environments under real-time constraints.