A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving (2108.07736v1)

Published 17 Aug 2021 in cs.CV

Abstract: In this paper, we present a system for incrementally reconstructing a dense 3D model of the geometry of an outdoor environment using a single monocular camera attached to a moving vehicle. Dense models provide a rich representation of the environment facilitating higher-level scene understanding, perception, and planning. Our system employs dense depth prediction with a hybrid mapping architecture combining state-of-the-art sparse features and dense fusion-based visual SLAM algorithms within an integrated framework. Our novel contributions include design of hybrid sparse-dense camera tracking and loop closure, and scale estimation improvements in dense depth prediction. We use the motion estimates from the sparse method to overcome the large and variable inter-frame displacement typical of outdoor vehicle scenarios. Our system then registers the live image with the dense model using whole-image alignment. This enables the fusion of the live frame and dense depth prediction into the model. Global consistency and alignment between the sparse and dense models are achieved by applying pose constraints from the sparse method directly within the deformation of the dense model. We provide qualitative and quantitative results for both trajectory estimation and surface reconstruction accuracy, demonstrating competitive performance on the KITTI dataset. Qualitative results of the proposed approach are illustrated in https://youtu.be/Pn2uaVqjskY. Source code for the project is publicly available at the following repository https://github.com/robotvisionmu/DenseMonoSLAM.

Citations (15)

Summary

  • The paper introduces a hybrid sparse-dense monocular SLAM system combining sparse feature tracking with dense depth prediction for accurate 3D reconstruction in outdoor environments.
  • The system features enhanced depth prediction and scale estimation to handle large vehicle movements, achieving competitive trajectory estimation accuracy on the KITTI benchmark.
  • This approach provides dense 3D reconstruction from a single camera, enabling cost-effective autonomous driving applications with near real-time processing capabilities.

Summary of "A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving"

This paper presents a monocular visual Simultaneous Localization and Mapping (SLAM) system that achieves dense 3D reconstruction in outdoor environments, tailored specifically to autonomous driving. The system combines dense depth prediction with sparse feature tracking in a hybrid architecture that leverages the strengths of both methodologies.

The principal contributions lie in combining a state-of-the-art sparse SLAM approach, ORB-SLAM3, with dense monocular depth prediction networks and surfel-based fusion systems similar to ElasticFusion. The authors introduce novel enhancements, such as improved camera tracking and scale estimation, that allow the method to handle the large and variable inter-frame displacement inherent in vehicle motion. By using a single RGB camera for depth prediction and employing structure-from-motion concepts, the system reconstructs metric-scale models with high fidelity and accuracy.
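One common way to give network depth predictions a metric scale in monocular pipelines is to rescale them against the triangulated map points of the sparse SLAM system. The sketch below is an illustration of that general idea under our own assumptions, not the paper's exact scale-estimation procedure; the function name and interfaces are hypothetical.

```python
import numpy as np

def median_scale_correction(pred_depth, sparse_depths, sparse_pixels):
    """Rescale a predicted depth map so its median agrees with the metric
    depths of sparse SLAM map points at their projected pixel locations.

    pred_depth:    (H, W) network depth prediction, correct only up to scale.
    sparse_depths: (N,) metric depths of triangulated map points.
    sparse_pixels: (N, 2) integer (row, col) projections of those points.
    Returns the rescaled depth map and the scale factor applied.
    """
    rows, cols = sparse_pixels[:, 0], sparse_pixels[:, 1]
    pred_at_points = pred_depth[rows, cols]
    # Median ratio is robust to outliers in either depth source.
    scale = np.median(sparse_depths) / np.median(pred_at_points)
    return scale * pred_depth, scale
```

Using the median rather than a least-squares ratio keeps the estimate stable when a few sparse points or predicted depths are grossly wrong.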

The quantitative results reported in this paper demonstrate competitive performance on the KITTI benchmark dataset. The proposed system achieves notable trajectory-estimation accuracy, with relative translational errors that compare favorably with state-of-the-art visual odometry frameworks such as ORB-SLAM2 and D3VO. The authors also provide substantial qualitative results, reinforcing the system's robustness when tracking and mapping outdoors, where traditional RGB-D sensors often struggle.
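For readers unfamiliar with the KITTI metric: relative translational error measures the error in estimated inter-frame (or inter-segment) motion, normalised by the ground-truth distance travelled. The official KITTI metric averages over fixed path-length segments of 100 to 800 m; the frame-to-frame version below is a simplified sketch, and the function name is ours, not the benchmark's.

```python
import numpy as np

def relative_translational_error(est_poses, gt_poses):
    """Frame-to-frame relative translational error in percent.

    est_poses, gt_poses: lists of 4x4 homogeneous camera-to-world poses.
    Returns 100 * (summed translational drift) / (ground-truth distance).
    """
    errs, dists = [], []
    for i in range(len(gt_poses) - 1):
        # Relative motion from frame i to i+1 in each trajectory.
        rel_est = np.linalg.inv(est_poses[i]) @ est_poses[i + 1]
        rel_gt = np.linalg.inv(gt_poses[i]) @ gt_poses[i + 1]
        # Residual motion left over after removing the ground-truth motion.
        delta = np.linalg.inv(rel_gt) @ rel_est
        errs.append(np.linalg.norm(delta[:3, 3]))
        dists.append(np.linalg.norm(rel_gt[:3, 3]))
    return 100.0 * sum(errs) / max(sum(dists), 1e-9)
```

Normalising by distance travelled makes results comparable across sequences of very different lengths and speeds.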

Key Features and Results:

  • Sparse-Dense Coupling: The approach facilitates consistent dense surface reconstructions synchronized with sparse trajectory estimates. By loosely coupling sparse pose constraints with dense reconstruction nodes, the system ensures global consistency and alignment.
  • Enhancements in Depth Prediction: By employing improved self-supervised losses and scale correction methods, depth predictions closely approximate metric depths, serving as pivotal inputs for dense fusion.
  • System Flexibility and Performance: Significant computational efficiencies are reported, with latency adequate for near real-time processing at frequencies just below 10 Hz on standard hardware.
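The paper applies the sparse pose constraints directly within the deformation of the dense model; as an illustration of the underlying alignment idea only (not the authors' deformation-graph machinery), the sketch below computes a least-squares similarity transform via Umeyama's method, which is a standard building block for registering a dense-model trajectory to sparse keyframe positions before finer non-rigid correction. All names here are illustrative.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D positions, e.g. dense-model
    camera centres (src) and the matching sparse keyframe centres (dst).
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    # Cross-covariance between the two centred point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Reflection guard: force a proper rotation (det(R) = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Solving for scale as well as rotation and translation matters in the monocular setting, where the dense and sparse reconstructions may disagree in scale before correction.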

Implications and Future Directions:

The implementation and validation of this SLAM system represent a meaningful advance toward fully exploiting monocular vision in autonomous driving. Dense depth recovery from monocular input narrows the gap to LiDAR-level detail without the associated cost and weight penalties, pointing toward more ubiquitous use of cameras in autonomous systems.

Future research may focus on improving the robustness of such hybrid systems under varying conditions, such as diverse lighting or rapid environmental change. Further work could explore learning-driven adaptive systems that dynamically tune parameters to specific vehicle dynamics or operating environments.

Overall, this paper offers valuable insights and a significant step in evolving dense monocular SLAM solutions for practical and scalable autonomous driving applications.
