- The paper introduces CFCNet, a novel deep learning model utilizing 2D$^2$CCA to enhance sparse depth completion by effectively correlating RGB and depth data.
- The proposed 2D$^2$CCA adapts canonical correlation analysis to 2D data, enabling robust feature extraction and improved performance on datasets like KITTI and NYUv2, especially in high-sparsity conditions.
- This framework demonstrates competitive performance across datasets and metrics (RMSE, MAE), opening avenues for multimodal data fusion in computer vision and applications like autonomous driving.
Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion
The paper introduces a novel approach to sparse depth completion through a deep learning model named Correlation For Completion Network (CFCNet). Sparse depth completion is the task of predicting a complete, dense depth map from sparse depth measurements, with significant applications in robotics, autonomous driving, and augmented reality. This work addresses the task by leveraging the correlation between the RGB image and the sparse depth measurements.
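To make the task concrete, depth-completion benchmarks commonly simulate the sparse input by subsampling a dense ground-truth depth map. The sketch below illustrates that convention; the `keep_ratio` value and function names are illustrative assumptions, not taken from the paper, which follows the sampling protocols of its respective benchmarks.

```python
import numpy as np

def sparsify_depth(dense_depth, keep_ratio=0.01, rng=None):
    """Simulate a sparse depth input by randomly keeping a fraction of the
    valid pixels in a dense ground-truth depth map.

    Returns the sparse depth map and a binary mask of observed pixels.
    """
    rng = rng or np.random.default_rng()
    valid = dense_depth > 0                    # pixels with a ground-truth value
    keep = rng.random(dense_depth.shape) < keep_ratio
    mask = valid & keep
    sparse = np.where(mask, dense_depth, 0.0)  # unobserved pixels are zeroed out
    return sparse, mask.astype(np.float32)

# Example: keep roughly 1% of the pixels of a synthetic 480x640 depth map.
depth = np.random.uniform(0.5, 10.0, size=(480, 640)).astype(np.float32)
sparse_depth, mask = sparsify_depth(depth, keep_ratio=0.01)
```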
Summary of Approach
CFCNet transforms and exploits semantically correlated features from the RGB and depth modalities. The model is built on the premise that a dense, completed depth map can be recovered from a sparse depth input and its corresponding RGB image. By extending canonical correlation analysis (CCA) to two-dimensional data (referred to as 2D$^2$CCA in the paper), the authors capture the mutual relationships between the RGB and depth modalities more effectively.
The proposed 2D$^2$CCA is pivotal for feature extraction, as it overcomes the limitations of traditional CCA when handling high-dimensional data with small sample sizes. This extension allows CFCNet to learn non-linear projections that maximize the correlation between the RGB and depth feature representations. Adapting CCA to operate on two-dimensional data inside a deep learning framework is a distinctive aspect of this research, advancing the integration of heterogeneous data sources.
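As a rough illustration of the idea, the sketch below scores the correlation between paired 2D feature maps by forming row-mode covariance matrices and summing the singular values of the whitened cross-covariance. This is a minimal sketch in the spirit of 2D$^2$CCA, not the authors' exact formulation: the paper learns left and right projections over feature maps, and all names, shapes, and the regularization constant here are assumptions.

```python
import torch

def cca_2d_correlation(F_rgb, F_d, eps=1e-4):
    """Total canonical correlation between two batches of paired 2D feature
    maps of shape (N, H, W), computed over the row mode. Illustrative sketch
    only, not the paper's exact objective."""
    N, H, W = F_rgb.shape
    eye = eps * torch.eye(W, dtype=F_rgb.dtype, device=F_rgb.device)
    # Center each modality across the batch.
    X = F_rgb - F_rgb.mean(dim=0, keepdim=True)
    Y = F_d - F_d.mean(dim=0, keepdim=True)
    # Row-mode (W x W) covariances, averaged over samples and rows.
    Sxx = torch.einsum('nhw,nhv->wv', X, X) / (N * H) + eye
    Syy = torch.einsum('nhw,nhv->wv', Y, Y) / (N * H) + eye
    Sxy = torch.einsum('nhw,nhv->wv', X, Y) / (N * H)
    # Whiten the cross-covariance; its singular values are the canonical
    # correlations, so their sum is the quantity to maximize during training.
    Lx = torch.linalg.inv(torch.linalg.cholesky(Sxx))
    Ly = torch.linalg.inv(torch.linalg.cholesky(Syy))
    T = Lx @ Sxy @ Ly.T
    return torch.linalg.svdvals(T).sum()

# A training loss would use the negative: loss = -cca_2d_correlation(F_rgb, F_d)
```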
Experimental Validation
Extensive experiments conducted on prominent datasets, including KITTI, Cityscapes, NYUv2, and SLAM RGBD, validate the efficacy of CFCNet. The results consistently demonstrate that the model performs competitively with state-of-the-art methods across various conditions and datasets. Notably, the network shows substantial improvement in high-sparsity scenarios, a critical regime for real-world applications where data collection may be limited or obstructed.
The paper reports numerical results for CFCNet via metrics such as root mean square error (RMSE) and mean absolute error (MAE). The model also achieves superior accuracy on the δ metrics, which measure the percentage of predicted pixels whose ratio to the ground truth falls within a given threshold.
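For reference, here is a minimal sketch of these standard metrics, assuming the conventional δ thresholds of 1.25, 1.25², and 1.25³ used throughout the depth-estimation literature; the function and variable names are illustrative.

```python
import numpy as np

def depth_metrics(pred, gt, valid_mask):
    """RMSE, MAE, and delta accuracies over valid ground-truth pixels.
    delta_k is the fraction of pixels with max(pred/gt, gt/pred) < 1.25**k."""
    p, g = pred[valid_mask], gt[valid_mask]
    rmse = np.sqrt(np.mean((p - g) ** 2))
    mae = np.mean(np.abs(p - g))
    ratio = np.maximum(p / g, g / p)
    deltas = {f'delta_{k}': np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)}
    return {'RMSE': rmse, 'MAE': mae, **deltas}
```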
Implications and Future Directions
CFCNet's integration of deep learning with canonical correlation analysis opens new avenues for multimodal data fusion in computer vision. The framework not only improves depth completion but also suggests potential gains for other tasks that merge disparate sources of information.
Practically, the adoption of 2D$^2$CCA and the composition of dense depth maps from sparse inputs can drive tangible progress in fields that rely on depth sensing. The theoretical contribution of 2D$^2$CCA invites further research into more advanced models that extend the scope of canonical correlation analysis within the AI domain.
Future work may explore adapting similar methodologies to other types of sensor data, or refining the loss functions to improve the predictive power of deployed models. There is also potential for hybrid approaches that combine the strengths of CFCNet with other cutting-edge depth completion techniques to push the boundaries of what's achievable in computer vision tasks.