- The paper presents a novel self-supervised training framework that eliminates the need for dense ground truth annotations.
- It employs an encoder-decoder network with skip connections and residual blocks, achieving leading performance on the KITTI depth completion benchmark at the time of publication.
- The method demonstrates improved accuracy in depth prediction from sparse LiDAR data, paving the way for scalable applications in robotics and automotive systems.
Self-Supervised Sparse-to-Dense Depth Completion
The paper "Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from Lidar and Monocular Camera" addresses the pertinent challenge of depth completion within the context of robotics and autonomous driving. The task involves estimating a dense depth image from sparse LiDAR measurements, which is particularly valuable due to the prohibitive costs and limited availability of dense depth sensors.
Overview of the Contributions
This work makes contributions in two areas: network architecture and training methodology. First, the authors propose a deep regression model that maps sparse LiDAR depth measurements, optionally together with a color image, to a dense depth image. The model achieved leading accuracy on the KITTI depth completion benchmark at the time of publication, surpassing contemporaneous approaches. Its core is an encoder-decoder architecture with skip connections and residual blocks, which improve prediction accuracy and yield sharper object boundaries in the predicted depth.
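To make this concrete, below is a minimal PyTorch sketch of an encoder-decoder regressor with residual blocks and a skip connection, taking RGB plus sparse depth as input. The channel widths, block counts, and layer choices are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class SparseToDenseNet(nn.Module):
    """Maps a 4-channel input (RGB + sparse depth) to a dense depth map."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(4, 32, 3, stride=2, padding=1),
                                  nn.ReLU(inplace=True), ResBlock(32))
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1),
                                  nn.ReLU(inplace=True), ResBlock(64))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
                                  nn.ReLU(inplace=True))
        # 64 input channels = 32 from the decoder + 32 from the skip connection.
        self.dec2 = nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1)

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)   # (B, 4, H, W)
        s1 = self.enc1(x)                           # (B, 32, H/2, W/2)
        s2 = self.enc2(s1)                          # (B, 64, H/4, W/4)
        d1 = self.dec1(s2)                          # (B, 32, H/2, W/2)
        d1 = torch.cat([d1, s1], dim=1)             # skip connection from encoder
        return torch.relu(self.dec2(d1))            # non-negative depth, (B, 1, H, W)
```

In the actual network the encoder is considerably deeper and skip connections span several resolutions; the sketch only shows the shape of the design.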
Second, the paper introduces a self-supervised training framework that removes the need for dense pixel-level ground truth. Instead, it relies on easily obtainable sequences of sparse depth images and synchronized color images, offering a scalable way to train depth completion networks. The framework combines supervision on the sparse depth inputs with photometric losses computed by warping temporally adjacent frames, and it obtains scale-accurate relative poses through a model-based approach (solving Perspective-n-Point with RANSAC on the sparse depth points) rather than a learned pose network.
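A hedged sketch of such a loss follows: an L1 penalty applied only at pixels with sparse LiDAR returns, plus a photometric term that warps a nearby frame into the current view using the predicted depth, camera intrinsics K, and a relative pose (R, t). The L1 choice, the weighting, and the warping details are assumptions for illustration; the pose is taken as given here, whereas the paper estimates it from the sparse points.

```python
import torch
import torch.nn.functional as F

def self_supervised_loss(pred, sparse_depth, img_cur, img_near, K, R, t, w_photo=0.1):
    """pred, sparse_depth: (B,1,H,W); img_*: (B,3,H,W); K, R: (3,3); t: (3,)."""
    # 1) Sparse depth loss: supervise only where LiDAR returns exist.
    mask = (sparse_depth > 0).float()
    depth_loss = (mask * (pred - sparse_depth).abs()).sum() / mask.sum().clamp(min=1)

    # 2) Photometric loss: back-project each pixel with its predicted depth,
    #    transform the 3-D points into the nearby frame, reproject, and
    #    compare the sampled colors against the current frame.
    B, _, H, W = pred.shape
    dev = pred.device
    ys, xs = torch.meshgrid(torch.arange(H, device=dev),
                            torch.arange(W, device=dev), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().view(3, -1)
    rays = K.inverse() @ pix                          # camera ray per pixel, (3, H*W)
    pts = rays.unsqueeze(0) * pred.view(B, 1, -1)     # 3-D points, (B, 3, H*W)
    pts_near = R @ pts + t.view(1, 3, 1)              # rigid transform to nearby frame
    proj = K @ pts_near
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)    # perspective divide
    u = 2 * uv[:, 0] / (W - 1) - 1                    # normalize to [-1, 1]
    v = 2 * uv[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(img_near, grid, align_corners=True)
    photo_loss = (warped - img_cur).abs().mean()

    return depth_loss + w_photo * photo_loss
```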
Evaluation and Results
Quantitative evaluation on the KITTI dataset shows that the proposed method outperforms existing techniques trained with semi-dense annotations, achieving a lower root-mean-square error (RMSE) along with improved mean absolute error (MAE), inverse RMSE (iRMSE), and inverse MAE (iMAE). The paper's supervised network also shows marked improvement over previously reported models, indicating robustness in handling sparse and irregular LiDAR input. Importantly, experiments reveal that prediction accuracy varies approximately as a power function of the number of input depth samples, offering practical guidance for sensor selection and deployment.
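As an illustration of how such a power-law trend can be quantified, the snippet below fits RMSE ≈ a·n^b by linear regression in log-log space. The sample numbers are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical (error, density) pairs for illustration only,
# NOT measurements reported in the paper.
n_samples = np.array([100, 500, 1000, 5000, 20000])  # input LiDAR points
rmse = np.array([3.2, 1.9, 1.5, 0.9, 0.6])           # prediction error (m)

# A power law rmse = a * n^b is linear in log-log space:
# log(rmse) = b * log(n) + log(a).
b, log_a = np.polyfit(np.log(n_samples), np.log(rmse), 1)
a = np.exp(log_a)
print(f"RMSE ~ {a:.2f} * n^{b:.2f}")  # negative b: error decays with density
```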
Implications and Future Directions
This research contributes toward a more efficient and accessible means of deploying depth completion in practical robotics and automotive systems. The self-supervised approach substantially reduces dependence on costly annotation, paving the way for broader application. Moreover, the model's robustness to varying input sparsity makes it usable across different sensor configurations.
The findings suggest several avenues for future research, notably refining the self-supervised methodology for more dynamic environments and integrating loss functions that account for occlusions and motion artifacts. Making better use of the supplementary RGB information within the self-supervised framework is another promising direction. Such advances would further strengthen the real-world applicability of depth completion models.
In sum, this paper exemplifies a strong methodological advancement in depth completion by marrying robust network design with an innovative training paradigm, holding significant implications for both theoretical exploration and practical implementations within AI-driven technologies.