Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation (1904.01870v1)

Published 3 Apr 2019 in cs.CV

Abstract: Supervised depth estimation has achieved high accuracy due to the advanced deep network architectures. Since the groundtruth depth labels are hard to obtain, recent methods try to learn depth estimation networks in an unsupervised way by exploring unsupervised cues, which are effective but less reliable than true labels. An emerging way to resolve this dilemma is to transfer knowledge from synthetic images with ground truth depth via domain adaptation techniques. However, these approaches overlook specific geometric structure of the natural images in the target domain (i.e., real data), which is important for high-performing depth prediction. Motivated by the observation, we propose a geometry-aware symmetric domain adaptation framework (GASDA) to explore the labels in the synthetic data and epipolar geometry in the real data jointly. Moreover, by training two image style translators and depth estimators symmetrically in an end-to-end network, our model achieves better image style transfer and generates high-quality depth maps. The experimental results demonstrate the effectiveness of our proposed method and comparable performance against the state-of-the-art. Code will be publicly available at: https://github.com/sshan-zhao/GASDA.

Authors (4)

Shanshan Zhao (39 papers)
Huan Fu (21 papers)
Mingming Gong (135 papers)
Dacheng Tao (829 papers)

Citations (173)


Summary

Overview of "Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation"

The paper "Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation" introduces GASDA, a novel framework that integrates domain adaptation with geometry awareness for the task of monocular depth estimation. This paper addresses the challenge of obtaining reliable depth predictions without relying solely on the expensive acquisition of ground truth depth data from real images. GASDA leverages synthetic datasets, where labels are more accessible, alongside unsupervised cues from real stereo images to enhance depth estimation performance.

Key Concepts and Methodology

The central innovation in this work is the combination of bidirectional style transfer with geometry consistency constraints. Depth estimators trained only on synthetic (source) images tend to degrade on real-world (target) images because of the domain gap between the two. GASDA addresses this through four components:

Bidirectional Style Transfer: GASDA employs symmetric style translators that map synthetic images to the real domain and real images to the synthetic domain. Translating in both directions improves the alignment between the source and target distributions, which is crucial for domain adaptation.
Monocular Depth Estimation Networks: Two depth estimators are trained, one on the original synthetic images and the other on synthetic images translated into the real style. The two models complement each other and reduce the bias caused by domain shift.
Geometry Consistency via Epipolar Constraints: The framework exploits the epipolar geometry of real stereo image pairs in the target domain. This constraint acts as an unsupervised training signal on real data and encourages the depth estimators to respect the structural layout of real scenes.
Iterative End-to-End Training: The style translators and depth estimators are optimized jointly in a single end-to-end network, so that better translation improves depth estimation and vice versa.
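
The geometry consistency idea can be illustrated with a minimal NumPy sketch. It assumes a rectified stereo pair, a known focal length and baseline, and a plain L1 photometric error. The function names, the parameters `focal_px` and `baseline_m`, and the choice of L1 are illustrative assumptions, since this summary does not give the paper's exact loss formulation.

```python
import numpy as np

def depth_to_disparity(depth, focal_px, baseline_m):
    """Convert metric depth to horizontal disparity (pixels) for a rectified pair."""
    return focal_px * baseline_m / np.maximum(depth, 1e-6)

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d.

    Uses linear interpolation along each row; coordinates that fall
    outside the image are clamped to the border.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def geometry_consistency_loss(left, right, depth_left, focal_px, baseline_m):
    """Mean L1 error between the left image and its reconstruction from the right image."""
    disparity = depth_to_disparity(depth_left, focal_px, baseline_m)
    reconstruction = warp_right_to_left(right, disparity)
    return float(np.mean(np.abs(reconstruction - left)))

if __name__ == "__main__":
    # Toy grayscale pair: the left view is the right view shifted by 3 pixels.
    rng = np.random.default_rng(0)
    right_img = rng.random((4, 16))
    left_img = np.empty_like(right_img)
    left_img[:, 3:] = right_img[:, :-3]
    left_img[:, :3] = right_img[:, :1]

    focal, baseline = 720.0, 0.54  # hypothetical camera parameters
    true_depth = np.full(right_img.shape, focal * baseline / 3.0)
    print(geometry_consistency_loss(left_img, right_img, true_depth, focal, baseline))
    print(geometry_consistency_loss(left_img, right_img, 2.0 * true_depth, focal, baseline))
```

In this toy setup the correct depth reproduces the left image almost exactly, while an incorrect depth yields a larger photometric error. That error is what makes stereo geometry usable as a training signal without ground-truth depth.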

Experimental Evaluation

Experiments on the KITTI dataset demonstrate GASDA's effectiveness, with performance comparable to state-of-the-art monocular depth estimation methods and improvements reported over some supervised and unsupervised approaches. GASDA also generalizes well to the Make3D dataset, which supports its applicability across different real-world scenarios.

Quantitative results are reported on the standard error metrics (Abs Rel, Sq Rel, RMSE) and on accuracy-under-threshold metrics. The ability to generate high-quality depth maps on a new dataset without dataset-specific tuning indicates the robustness of the GASDA framework.
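
For reference, the metrics named above can be computed as in the following sketch. It follows the definitions commonly used in monocular depth benchmarks, with accuracy thresholds of 1.25, 1.25^2, and 1.25^3. Those threshold values, the function name `depth_metrics`, and the masking of pixels without ground truth are assumptions about the usual evaluation convention, not details stated in this summary.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Compute common depth-estimation metrics over pixels with valid ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Pixels with no ground-truth measurement are assumed to be stored as 0.
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    diff = pred - gt
    # Symmetric ratio used by the accuracy-under-threshold metrics.
    ratio = np.maximum(pred / gt, gt / pred)

    return {
        "abs_rel": float(np.mean(np.abs(diff) / gt)),   # lower is better
        "sq_rel": float(np.mean(diff ** 2 / gt)),       # lower is better
        "rmse": float(np.sqrt(np.mean(diff ** 2))),     # lower is better
        "delta1": float(np.mean(ratio < 1.25)),         # higher is better
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
    }

if __name__ == "__main__":
    predicted = np.array([2.0, 4.0, 5.0])
    ground_truth = np.array([1.0, 4.0, 0.0])  # last pixel has no ground truth
    for name, value in depth_metrics(predicted, ground_truth).items():
        print(f"{name}: {value:.4f}")
```

The error metrics penalize deviation from the ground truth, while the threshold metrics report the fraction of pixels whose predicted depth lies within a multiplicative factor of the true depth.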

Implications and Future Directions

GASDA's methodology provides valuable insights for developing depth estimation models that do not rely heavily on cumbersome real-world annotations. The paper's approach to leveraging synthetic data through domain adaptation holds promise for various applications in autonomous driving, robotics, and augmented reality, where depth understanding is critical.

Future research could extend GASDA's framework by integrating other geometric constraints or by replacing the bidirectional style transfer with more advanced translation techniques. Applying it in real-time systems and reducing its computational overhead for deployment on resource-constrained devices could make the framework more widely usable.

Overall, GASDA represents a significant step towards robust monocular depth estimation without real-world depth labels, bringing synthetic datasets closer to real-world applicability in computer vision tasks.