- The paper proposes FusionDepth, which fuses 2D monocular image features with 3D sparse LiDAR data to produce accurate dense depth maps.
- It introduces a two-stage model with a RefineNet that corrects initial depth maps using a pseudo-3D space framework, significantly reducing error metrics.
- The improved depth maps benefit downstream tasks such as monocular 3D object detection, raising detection accuracy on the KITTI dataset by over 68%.
An Evaluation of FusionDepth: Enhanced Monocular Depth Prediction via Sparse LiDAR Integration
The paper presents FusionDepth, a self-supervised framework that advances monocular depth learning by integrating sparse LiDAR data. The approach addresses the limitations of existing self-supervised monocular depth prediction methods, which struggle with dynamic environments and occlusions, by leveraging low-cost, sparse (e.g., 4-beam) LiDAR data.
FusionDepth operates as a two-stage network. The first stage fuses image features with sparse LiDAR features to produce an initial depth map. In the refinement stage, a RefineNet corrects errors in this initial map within a pseudo-3D space framework. The two-stage design improves both efficiency and accuracy, which is crucial for real-time applications such as autonomous robot guidance.
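To make the two-stage structure concrete, here is a minimal PyTorch-style sketch of the pipeline. The module names, layer shapes, and the residual form of the correction are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the two-stage design, assuming PyTorch.
# All modules below are placeholders; the real FusionDepth encoders,
# decoder, and RefineNet are considerably deeper.
import torch
import torch.nn as nn

class FusionDepthSketch(nn.Module):
    def __init__(self, feat_ch=64):
        super().__init__()
        # Stage 1: separate encoders for the RGB image and the (pseudo-dense)
        # LiDAR map, fused by channel concatenation before a depth decoder.
        self.image_encoder = nn.Sequential(nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU())
        self.lidar_encoder = nn.Sequential(nn.Conv2d(1, feat_ch, 3, padding=1), nn.ReLU())
        self.depth_decoder = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Softplus(),  # positive depth
        )
        # Stage 2: a RefineNet stand-in that predicts a correction to the
        # initial depth map, conditioned on image, prediction, and LiDAR input.
        self.refine_net = nn.Sequential(
            nn.Conv2d(3 + 1 + 1, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 1, 3, padding=1),
        )

    def forward(self, image, pseudo_dense_lidar):
        # image: (B, 3, H, W); pseudo_dense_lidar: (B, 1, H, W)
        f_img = self.image_encoder(image)
        f_lidar = self.lidar_encoder(pseudo_dense_lidar)
        initial_depth = self.depth_decoder(torch.cat([f_img, f_lidar], dim=1))
        correction = self.refine_net(
            torch.cat([image, initial_depth, pseudo_dense_lidar], dim=1))
        return initial_depth, initial_depth + correction  # refined depth
```

In the paper the refinement operates in a pseudo-3D space rather than as a plain per-pixel residual; the residual form above is only a stand-in for the correction step.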
Key Contributions
- Feature Fusion: The model combines 2D monocular image features with 3D sparse LiDAR points to generate more accurate dense depth maps. The fusion occurs at both the feature level and the prediction level, letting the network exploit complementary information and compensate for data sparsity.
- Pseudo Dense Representation (PDR): To mitigate the extreme sparsity of the LiDAR input, the sparse points are transformed into a pseudo-dense format (see the sketch after this list). This makes the signal easier to encode with standard convolutional networks, improving feature extraction and subsequent depth prediction.
- Improvements in Monocular 3D Object Detection: The enhanced depth prediction directly benefits downstream tasks, particularly monocular 3D object detection. The model significantly exceeds the performance of existing sparse-LiDAR-based methods such as Pseudo-LiDAR++, improving detection accuracy on the KITTI dataset by over 68%.
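One plausible way to build such a pseudo-dense input is to project the LiDAR returns into the image plane and fill empty pixels from nearby observations. The nearest-observation fill below is an assumption for illustration, not necessarily the paper's exact PDR construction.

```python
# Sketch of converting sparse LiDAR returns into a pseudo-dense depth map.
# The local nearest-observation fill is a placeholder densification strategy.
import numpy as np

def project_lidar_to_image(points_xyz, K, T_cam_lidar, height, width):
    """Project Nx3 LiDAR points into a sparse depth image, given camera
    intrinsics K (3x3) and LiDAR-to-camera extrinsics T_cam_lidar (4x4)."""
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]          # keep points in front of the camera
    uvw = (K @ pts_cam.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    sparse = np.zeros((height, width), dtype=np.float32)
    sparse[v[ok], u[ok]] = pts_cam[ok, 2]          # depth = camera-frame z
    return sparse

def densify(sparse_depth, kernel=7):
    """Fill empty pixels with the nearest observed depth in a local window,
    producing a pseudo-dense map that a CNN encoder can consume directly."""
    h, w = sparse_depth.shape
    pad = kernel // 2
    padded = np.pad(sparse_depth, pad, mode="constant")
    dense = sparse_depth.copy()
    for y in range(h):
        for x in range(w):
            if dense[y, x] == 0:
                window = padded[y:y + kernel, x:x + kernel]
                hits = window[window > 0]
                if hits.size:
                    dense[y, x] = hits.min()       # favor the closer surface
    return dense
```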
Empirical Evaluation
Extensive experiments show that FusionDepth outperforms prior methods on standard self-supervised monocular depth prediction and depth completion benchmarks. Compared with other methods that rely on sparse LiDAR, the model achieves state-of-the-art results across numerous evaluations. In particular, the paper reports significant reductions in absolute relative error (Abs Rel) and root mean square error (RMSE), confirming the improved accuracy of the depth maps inferred by FusionDepth.
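For reference, the two reported error metrics are standard and can be computed as below; masking to pixels with valid ground truth follows the usual KITTI evaluation convention.

```python
# Standard depth-error metrics cited above (lower is better for both).
import numpy as np

def depth_errors(pred, gt):
    mask = gt > 0                                # evaluate only valid ground-truth pixels
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)    # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))    # root mean square error
    return abs_rel, rmse
```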
Theoretical and Practical Implications
FusionDepth demonstrates that sparse supplementary sensor data can substantially strengthen monocular depth inference. Practically, it offers a feasible path for integrating lower-cost LiDARs, broadening the accessibility of such technology in automated driving and wider robotic applications and increasing the applicability of autonomous systems to complex, dynamic environments.
From a theoretical perspective, the paper provides a template for incorporating multiple sensing modalities into self-supervised learning frameworks, highlighting the complementarity of the modalities and the performance gains available from cross-modal fusion strategies.
Future Directions
Future work could explore the balance between accuracy and computational cost to extend FusionDepth's real-time capabilities, and could evaluate the robustness of the approach under more diverse environmental conditions, further advancing the deployment of low-cost sensor-fusion systems across application domains.
In summary, FusionDepth is a noteworthy advance in self-supervised depth estimation, underscoring the value of pairing low-cost hardware (sparse LiDAR) with sophisticated learning models to achieve more refined, real-time visual perception.