- The paper presents a novel VI-SLAM approach that integrates probabilistic depth fusion with volumetric occupancy mapping to improve both localization and mapping accuracy.
- It fuses stereo and multi-view depth predictions, each with learned uncertainty, within a visual-inertial factor graph to build robust occupancy submaps.
- Evaluations on the EuRoC and Hilti-Oxford benchmarks show reduced trajectory errors and improved mesh accuracy, achieving state-of-the-art results among published methods.
Uncertainty-Aware Visual-Inertial SLAM with Volumetric Occupancy Mapping
Introduction
The paper presents a novel approach to visual-inertial simultaneous localization and mapping (VI-SLAM) that integrates uncertainty-aware depth into a volumetric occupancy mapping framework. This work advances the state of the art by fusing depth predictions from deep neural networks in a fully probabilistic manner, improving both localization and mapping. Specifically, the method takes depth and uncertainty predictions from a fixed-baseline stereo rig and combines them with motion stereo across varying baselines, substantially improving mapping accuracy.
Methodology
Visual-Inertial Estimator
The proposed VI-SLAM system is built upon the OKVIS2 framework, which integrates visual and inertial measurements into a probabilistic factor graph optimization. Key innovations include:
- Depth Fusion and Uncertainty Management: Depth estimates from a stereo network and a multi-view stereo (MVS) network are fused probabilistically, weighted by their respective predicted uncertainties. This fusion yields more accurate and reliable depth maps, which are crucial for constructing the volumetric occupancy grid; a minimal sketch of inverse-variance fusion follows this list.
- Occupancy Submapping: The system employs Supereight2 for occupancy submapping, updating the volumetric map with both static (fixed-baseline) and motion stereo. Per-pixel depth uncertainty learned by the networks governs how depth measurements are integrated into the submaps.
- Occupancy-to-Point Factors: These factors align the dense submaps within the global frame using the fused depth information and predicted uncertainties. The alignment is optimized in a nonlinear least-squares estimator, yielding globally consistent geometry; a hedged sketch of such a factor also appears below.
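To make the fusion concrete, here is a minimal sketch of inverse-variance depth fusion, assuming per-pixel depth and variance maps from the two networks. All names (fuse_depths, z_stereo, ...) are hypothetical, and the paper's actual pipeline may add outlier gating and consistency checks on top of this:

```python
import numpy as np

def fuse_depths(z_stereo, var_stereo, z_mvs, var_mvs):
    """Fuse two per-pixel depth maps by inverse-variance weighting.

    Inputs are HxW arrays: depths in metres, variances in m^2.
    Invalid pixels are marked with NaN or non-positive variance.
    """
    valid1 = np.isfinite(z_stereo) & np.isfinite(var_stereo) & (var_stereo > 0)
    valid2 = np.isfinite(z_mvs) & np.isfinite(var_mvs) & (var_mvs > 0)
    # Inverse-variance weights; zero where a source is invalid.
    w1 = np.where(valid1, 1.0 / np.where(valid1, var_stereo, 1.0), 0.0)
    w2 = np.where(valid2, 1.0 / np.where(valid2, var_mvs, 1.0), 0.0)
    z1 = np.where(valid1, z_stereo, 0.0)
    z2 = np.where(valid2, z_mvs, 0.0)
    w_sum = w1 + w2
    ok = w_sum > 0
    z_fused = np.where(ok, (w1 * z1 + w2 * z2) / np.where(ok, w_sum, 1.0), np.nan)
    var_fused = np.where(ok, 1.0 / np.where(ok, w_sum, 1.0), np.nan)
    return z_fused, var_fused
```

Under independent Gaussian noise this is the maximum-likelihood combination of the two measurements; the fused variance is what the occupancy integration would then consume as measurement noise.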
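The occupancy-to-point factors can likewise be sketched. The residual below treats a submap's interpolated occupancy log-odds, whose zero level set approximates the surface, as a signed-distance-like error at points transformed in from another submap, whitened by the predicted depth uncertainties. This is a simplified stand-in under stated assumptions, not the authors' implementation: the real system operates on Supereight2 octree fields with analytic Jacobians inside the OKVIS2 factor graph, and every name here (trilinear, occ_to_point_residuals, slope, ...) is hypothetical.

```python
import numpy as np

def trilinear(grid, p, voxel_size):
    """Trilinearly interpolate a dense 3-D scalar field (e.g. occupancy
    log-odds) at metric point p.  The grid origin is assumed at (0,0,0)
    and p is assumed to lie inside the grid."""
    q = np.asarray(p) / voxel_size
    i = np.floor(q).astype(int)
    f = q - i
    val = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[0] if dx else 1 - f[0]) * \
                    (f[1] if dy else 1 - f[1]) * \
                    (f[2] if dz else 1 - f[2])
                val += w * grid[i[0] + dx, i[1] + dy, i[2] + dz]
    return val

def occ_to_point_residuals(grid, voxel_size, points, T_ab, sigmas, slope=1.0):
    """Whitened residuals aligning (N, 3) points from submap B against the
    occupancy field of submap A.  T_ab is a 4x4 transform taking B-frame
    points into A's frame; `slope` converts log-odds to metric distance."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_a = (T_ab @ pts_h.T).T[:, :3]
    r = np.array([trilinear(grid, p, voxel_size) for p in pts_a])
    return (r / slope) / sigmas
```

In a least-squares backend these residuals would be stacked over all point-submap pairs and minimized jointly over the relative submap poses, alongside the visual-inertial terms.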
Results
Numerical Performance
The method was tested on the EuRoC and Hilti-Oxford benchmark datasets, demonstrating superior accuracy in both localization and mapping:
- EuRoC Dataset: The proposed method achieved an average absolute trajectory error (ATE) RMSE of 0.041 m in causal evaluation and 0.030 m in non-causal evaluation, outperforming state-of-the-art systems such as VINS-Fusion, DVI-SLAM, and ORB-SLAM3 (ATE RMSE computation is sketched after this list).
- Hilti-Oxford Dataset: The system ranked first among published methods in both causal and non-causal evaluation, with mean position errors of 13.3 cm and 6.1 cm, respectively.
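For reference, ATE RMSE is computed after rigidly aligning the estimated trajectory to ground truth. A minimal sketch, assuming time-associated (N, 3) position arrays and a closed-form Kabsch/Umeyama alignment without scale:

```python
import numpy as np

def ate_rmse(est, gt):
    """ATE RMSE after closed-form SE(3) alignment of est onto gt."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    # Kabsch: rotation minimizing ||R @ E.T - G.T|| over rotations.
    U, _, Vt = np.linalg.svd(G.T @ E)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:  # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    err = (R @ est.T).T + t - gt
    return np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
```

Causal evaluation scores poses as they were available at runtime, while non-causal evaluation scores the final, loop-closed estimate, which is why the non-causal error is lower.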
Mapping Accuracy
In terms of reconstruction:
- The proposed method improved average mesh accuracy from 0.144 m (for a baseline monocular-inertial setup) to 0.057 m.
- Completeness, defined as the fraction of ground-truth vertices within 0.2 m of an estimated vertex, increased from 50.71% to 55.40% (both metrics are sketched below).
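Both metrics are straightforward nearest-neighbour computations. A minimal sketch with scipy, assuming sampled vertex arrays; note the paper may evaluate accuracy against the ground-truth surface rather than its vertices, so treat this as an approximation:

```python
import numpy as np
from scipy.spatial import cKDTree

def mesh_metrics(est_vertices, gt_vertices, thresh=0.2):
    """Accuracy: mean distance from each estimated vertex to its nearest
    ground-truth vertex.  Completeness: fraction of ground-truth vertices
    within `thresh` metres of some estimated vertex."""
    d_est_to_gt, _ = cKDTree(gt_vertices).query(est_vertices)
    d_gt_to_est, _ = cKDTree(est_vertices).query(gt_vertices)
    return d_est_to_gt.mean(), float((d_gt_to_est <= thresh).mean())
```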
Implications and Future Work
The approach significantly enhances the accuracy and usability of VI-SLAM in real-time applications, making it viable for tasks that demand high-fidelity maps, such as autonomous navigation and robotic planning. By employing learned depth uncertainty, the method moves beyond traditional constant-disparity-uncertainty models, allowing for more precise and reliable mapping, as the propagation below illustrates.
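To see why a constant disparity uncertainty is a weak depth model, propagate it to first order through the standard stereo depth equation, with focal length $f$ (in pixels), baseline $b$, and disparity $d$:

$$
z = \frac{f\,b}{d}, \qquad
\sigma_z = \left|\frac{\partial z}{\partial d}\right|\,\sigma_d
         = \frac{f\,b}{d^{2}}\,\sigma_d
         = \frac{z^{2}}{f\,b}\,\sigma_d .
$$

A fixed $\sigma_d$ thus implies depth noise that grows quadratically with range and shrinks with longer baselines; learned per-pixel uncertainties and multi-baseline motion stereo both address exactly this behaviour.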
Future work is oriented towards incorporating epistemic uncertainty into the depth models, which could further improve robustness. Additionally, integrating this VI-SLAM approach with real-time navigation and control systems could close the loop, making the system more adaptive and reliable in dynamic and uncertain environments.
Conclusion
This paper presents a VI-SLAM system that tightly integrates uncertainty-aware depth fusion with volumetric occupancy mapping. The system demonstrates consistent improvements in localization and mapping accuracy over state-of-the-art baselines on the EuRoC and Hilti-Oxford benchmarks. These advances pave the way for more reliable and precise autonomous systems operating in complex environments under real-time constraints.