- The paper introduces a novel method for multi-frame depth estimation using adaptive normal constraints and occlusion-aware aggregation.
- The Combined Normal Map (CNM) constraint fuses local surface normals with global planar normals, computed through a differentiable least-squares module, to preserve features such as edges and planes.
- The occlusion-aware aggregation strategy refines depth maps across multiple views and applies a weighted loss that discounts occluded regions, achieving state-of-the-art results on standard benchmarks.
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints
The paper introduces a novel approach for multi-frame depth estimation from color video that better preserves geometric features such as corners, edges, and planes in the 3D point clouds derived from the estimated depth maps. Its primary contributions are a Combined Normal Map (CNM) constraint and an occlusion-aware strategy that aggregates depth predictions from multiple views. These advances target a limitation of conventional learning-based methods, whose pixel-wise depth errors often produce significant geometric inconsistencies in the reconstructed scene.
Methodology Overview
- Combined Normal Map Constraint: The CNM constraint enforces the preservation of high-curvature features and global planar regions during depth estimation. Unlike existing methods that rely on either local surface normals or virtual normal constraints alone, the CNM combines the two, using a differentiable least-squares module to compute normals from depth. This lets the network attend to global structural preservation and local feature accuracy simultaneously (see the first sketch after this list).
- Occlusion-Aware Depth Aggregation: This strategy integrates depth predictions from adjacent views, refining the depth map and producing an occlusion probability map for the reference view. The network is trained with an occlusion-aware loss that weights occluded and non-occluded regions differently, improving depth accuracy in non-occluded areas without requiring explicit occlusion ground truth (see the second sketch below).
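To make the CNM idea concrete, here is a minimal PyTorch sketch, not the authors' code: per-pixel normals from differentiable least-squares plane fitting over local windows, with a single shared least-squares normal substituted inside a segmented planar region. The function names (`normals_from_points`, `combined_normal_map`), the single-plane `plane_mask` input, and the zero-padded border handling are illustrative assumptions; the paper's plane segmentation and exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def normals_from_points(points, k=3):
    """Per-pixel normals via differentiable least-squares plane fitting.

    points: (B, 3, H, W) camera-space 3D points (back-projected depth).
    Fits a*x + b*y + c*z = 1 over each k x k neighbourhood and takes the
    normalized (a, b, c) as the surface normal. Zero padding at the image
    border is a simplification for this sketch.
    """
    B, _, H, W = points.shape
    patches = F.unfold(points, kernel_size=k, padding=k // 2)          # (B, 3*k*k, H*W)
    patches = patches.view(B, 3, k * k, H * W).permute(0, 3, 2, 1)     # (B, HW, k*k, 3)
    ones = torch.ones(B, H * W, k * k, 1, device=points.device, dtype=points.dtype)
    # Batched least squares: solve P n = 1 for each pixel's neighbourhood.
    n = torch.linalg.lstsq(patches, ones).solution.squeeze(-1)         # (B, HW, 3)
    n = F.normalize(n, dim=-1)
    return n.permute(0, 2, 1).view(B, 3, H, W)

def combined_normal_map(points, plane_mask):
    """CNM-style fusion: local normals everywhere, one global least-squares
    normal inside a planar region. plane_mask: (B, 1, H, W) in {0, 1};
    assumed non-empty and roughly planar, not passing through the origin."""
    local_n = normals_from_points(points)
    pts = points.flatten(2).transpose(1, 2)                  # (B, HW, 3)
    w = plane_mask.flatten(2).transpose(1, 2)                # (B, HW, 1)
    A, b = pts * w, w                                        # zero out rows outside the plane
    global_n = F.normalize(torch.linalg.lstsq(A, b).solution.squeeze(-1), dim=-1)
    global_n = global_n.view(-1, 3, 1, 1).expand_as(local_n)
    return torch.where(plane_mask.bool().expand_as(local_n), global_n, local_n)
```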
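The second sketch illustrates the differential weighting idea in the occlusion-aware loss: depth errors are down-weighted where the predicted occlusion probability is high, so supervision concentrates on regions visible in the source views. The weight values `w_vis` and `w_occ` and the L1 error are assumptions for illustration; the paper's exact weighting scheme may differ.

```python
import torch

def occlusion_aware_depth_loss(pred_depth, gt_depth, occ_prob, w_vis=1.0, w_occ=0.2):
    """Illustrative occlusion-aware loss.

    pred_depth, gt_depth: (B, 1, H, W); occ_prob: (B, 1, H, W) in [0, 1],
    where 1 means the pixel is likely occluded in the neighbouring views.
    """
    valid = gt_depth > 0                                   # skip pixels without ground truth
    weight = w_vis * (1.0 - occ_prob) + w_occ * occ_prob   # down-weight occluded pixels
    err = torch.abs(pred_depth - gt_depth) * weight
    return err[valid].mean()
```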
Numerical Results
The experimental results indicate that the proposed method surpasses state-of-the-art alternatives across several depth-estimation metrics on datasets such as 7Scenes and SUN3D. For instance, the paper reports lower scale-invariant error and relative RMSE than existing methods, demonstrating superior capability in maintaining geometric consistency in reconstructed scenes.
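For reference, the sketch below gives one common formulation of these evaluation metrics (following widespread usage, e.g. Eigen et al. 2014); it is not taken from this paper's code, and papers differ slightly in conventions such as whether the scale-invariant error is reported squared or as a square root.

```python
import torch

def depth_metrics(pred, gt, eps=1e-8):
    """pred, gt: 1-D tensors of valid depth values (gt > 0)."""
    abs_rel = torch.mean(torch.abs(pred - gt) / gt)                   # mean relative error
    rmse = torch.sqrt(torch.mean((pred - gt) ** 2))                   # root mean squared error
    d = torch.log(pred + eps) - torch.log(gt + eps)
    scale_inv = torch.sqrt(torch.mean(d ** 2) - torch.mean(d) ** 2)   # scale-invariant log error
    return abs_rel, rmse, scale_inv
```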
Implications and Future Directions
The practical implications of the research are extensive, especially in fields such as augmented reality (AR), robot navigation, and autonomous systems where high-fidelity depth estimation is imperative. The occlusion-aware approach paves the way for more robust depth estimation in dynamic environments with complex occlusions. The CNM constraint can potentially guide future advancements in depth estimation by offering a blueprint for combining local and global constraints effectively.
Future studies may explore improvements to CNM generation, for example by integrating learning-based segmentation models to identify planar regions more accurately. End-to-end systems that produce 3D models directly from video, without a separate fusion step, could further streamline real-time applications.
In conclusion, the paper makes a significant contribution to depth estimation, laying a foundation for future work on occlusion handling and normal constraints. While challenges remain, particularly in segmentation precision and computational efficiency, the presented methods offer a promising path past existing limitations in depth reconstruction from video.