Monocular Depth Prediction through Continuous 3D Loss (2003.09763v2)

Published 21 Mar 2020 in cs.CV

Abstract: This paper reports a new continuous 3D loss function for learning depth from monocular images. The dense depth prediction from a monocular image is supervised using sparse LIDAR points, which enables us to leverage available open source datasets with camera-LIDAR sensor suites during training. Currently, accurate and affordable range sensors are not readily available: stereo cameras and LIDARs measure depth either inaccurately or sparsely and at high cost. In contrast to the current point-to-point loss evaluation approach, the proposed 3D loss treats point clouds as continuous objects; therefore, it compensates for the lack of dense ground truth depth caused by the sparsity of LIDAR measurements. We applied the proposed loss in three state-of-the-art monocular depth prediction approaches: DORN, BTS, and Monodepth2. Experimental evaluation shows that the proposed loss improves depth prediction accuracy and produces point clouds with more consistent 3D geometric structures compared with all tested baselines, implying the benefit of the proposed loss on general depth prediction networks. A video demo of this work is available at https://youtu.be/5HL8BjSAY4Y.

Authors (7)
  1. Minghan Zhu (16 papers)
  2. Maani Ghaffari (70 papers)
  3. Yuanxin Zhong (7 papers)
  4. Pingping Lu (5 papers)
  5. Zhong Cao (17 papers)
  6. Ryan M. Eustice (18 papers)
  7. Huei Peng (39 papers)
Citations (4)

Summary

Monocular Depth Prediction through Continuous 3D Loss

The paper introduces a continuous 3D loss function that enhances monocular depth prediction using sparse LIDAR points as supervision. The research addresses the gap between dense per-pixel predictions and the inherently sparse ground truth depth obtainable from LIDAR sensors. In many practical scenarios, dense and accurate depth measurements are difficult to obtain because current depth sensing technologies, such as LIDAR and stereo cameras, are either limited in density or expensive. The proposed loss improves the fidelity of monocular depth predictions by transforming point clouds into continuous functions and aligning them within an inner product space.

Methodology

The proposed methodology diverges from traditional point-to-point loss functions by considering point clouds as continuous objects rather than discrete sets of points. This is achieved by utilizing a continuous 3D loss that leverages the structure of reproducing kernel Hilbert spaces (RKHS), a framework that allows the authors to transform discrete point clouds into continuous functions.
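
Concretely, a sketch of the RKHS formulation consistent with this description (the kernel choice below is an assumption drawn from the continuous-registration literature the paper builds on, not a verbatim reproduction): a point cloud X = {x_i} with per-point features \ell_{x_i}, such as color or surface normals, is lifted to the function

    f_X(\cdot) = \sum_i \ell_{x_i} \, k(\cdot, x_i),

and the alignment between the predicted cloud X and the LIDAR cloud Z is measured by the inner product

    \langle f_X, f_Z \rangle = \sum_i \sum_j \langle \ell_{x_i}, \ell_{z_j} \rangle \, k(x_i, z_j),

which training maximizes, e.g. via L_{3D} = -\langle f_X, f_Z \rangle, with a squared-exponential kernel k(x, z) = \sigma^2 \exp\left(-\lVert x - z \rVert^2 / (2\ell^2)\right).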

The construction of these functions originates from a point cloud by associating each 3D point with features embedded in a vector space. An exponential kernel is applied as part of the RKHS to measure similarities between two point clouds—one from the sparse LIDAR input and the other from the predicted depth data. By doing so, the function alignment in terms of both depth and surface normals can be enforced, thus enhancing the geometric consistency of the resulting depth predictions.
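
As a concrete illustration, below is a minimal PyTorch sketch of such a kernelized inner-product loss between a predicted point cloud and sparse LIDAR points. The function name, the use of surface normals as per-point features, and the bandwidth values are illustrative assumptions, not the authors' reference implementation.

import torch

def continuous_3d_loss(pred_pts, pred_feats, lidar_pts, lidar_feats,
                       geom_len=0.1, feat_len=0.5):
    """Negative RKHS inner product between two featurized point clouds.

    pred_pts:   (N, 3) points back-projected from the predicted depth map
    lidar_pts:  (M, 3) sparse LIDAR points in the same camera frame
    *_feats:    (N or M, F) per-point features, e.g. surface normals
    """
    # Pairwise squared distances in 3D space and in feature space.
    geom_d2 = torch.cdist(pred_pts, lidar_pts).pow(2)      # (N, M)
    feat_d2 = torch.cdist(pred_feats, lidar_feats).pow(2)  # (N, M)

    # Squared-exponential kernels; their product couples geometric
    # proximity with feature (e.g. surface-normal) similarity.
    k_geom = torch.exp(-geom_d2 / (2 * geom_len ** 2))
    k_feat = torch.exp(-feat_d2 / (2 * feat_len ** 2))

    # <f_X, f_Z> = sum_ij k_feat[i, j] * k_geom[i, j]; negate so that
    # minimizing the loss maximizes the alignment of the two clouds.
    return -(k_geom * k_feat).sum()

Note that the dense N x M kernel matrix is quadratic in memory, so a practical implementation would restrict each LIDAR point to a local neighborhood of predicted points.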

Experimental Evaluation

The proposed loss function was evaluated by integrating it into three existing state-of-the-art monocular depth prediction models: DORN, BTS, and Monodepth2. Each model served as a baseline to which the continuous 3D loss was added. In all three cases, the added loss produced more accurate depth predictions and point clouds with more consistent 3D geometric structure than the original baselines.

Quantitative analysis on the KITTI dataset showed consistent improvements across standard metrics, including absolute relative error, root mean square error, and threshold accuracy, demonstrating the value of the continuous 3D loss for monocular depth prediction. Qualitative results further showed that the approach copes with surfaces that traditional point-based losses struggle with, such as transparent or reflective materials.
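
For reference, the standard metrics named above can be computed as follows (a self-contained NumPy sketch; the evaluation crop and maximum-depth cap conventionally used on KITTI are omitted):

import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth metrics over valid ground-truth pixels."""
    mask = gt > 0                    # LIDAR ground truth is sparse
    pred, gt = pred[mask], gt[mask]

    abs_rel = np.mean(np.abs(pred - gt) / gt)   # absolute relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))   # root mean square error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)              # threshold accuracy
    return abs_rel, rmse, delta1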

Implications and Future Work

The implications of this work are extensive in the domain of autonomous vehicle navigation and robotics. Accurate monocular depth prediction can significantly improve tasks that rely heavily on environmental understanding, such as obstacle avoidance and SLAM, without necessitating expensive sensor systems.

The proposed approach does not necessitate modifications to the model architectures, making it an attractive plugin module for various depth prediction frameworks. The authors suggest future explorations could include designing learning-based representations for the feature space to further improve the model's robustness and accuracy. Additionally, they suggest potential advancements in adapting these improved depth predictions for 3D object detection tasks, further enhancing the utility of monocular vision systems in real-world applications.
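
Because the loss requires no architectural changes, integrating it amounts to adding one weighted term to an existing training objective. A sketch reusing the continuous_3d_loss function above; every other name here (backproject, lambda_3d, the feature tensors) is an illustrative placeholder:

# Inside an existing training step; all names besides continuous_3d_loss
# are illustrative placeholders for the host model's own components.
depth = model(image)                               # dense depth prediction
pred_pts = backproject(depth, camera_intrinsics)   # pixels -> 3D points
loss = baseline_loss + lambda_3d * continuous_3d_loss(
    pred_pts, pred_normals, lidar_pts, lidar_normals)
loss.backward()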
