Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation (2108.12545v1)

Published 28 Aug 2021 in cs.CV

Abstract: Training deep networks for semantic segmentation requires large amounts of labeled training data, which presents a major challenge in practice, as labeling segmentation masks is a highly labor-intensive process. To address this issue, we present a framework for semi-supervised and domain-adaptive semantic segmentation, which is enhanced by self-supervised monocular depth estimation (SDE) trained only on unlabeled image sequences. In particular, we utilize SDE as an auxiliary task comprehensively across the entire learning framework: First, we automatically select the most useful samples to be annotated for semantic segmentation based on the correlation of sample diversity and difficulty between SDE and semantic segmentation. Second, we implement a strong data augmentation by mixing images and labels using the geometry of the scene. Third, we transfer knowledge from features learned during SDE to semantic segmentation by means of transfer and multi-task learning. And fourth, we exploit additional labeled synthetic data with Cross-Domain DepthMix and Matching Geometry Sampling to align synthetic and real data. We validate the proposed model on the Cityscapes dataset, where all four contributions demonstrate significant performance gains, and achieve state-of-the-art results for semi-supervised semantic segmentation as well as for semi-supervised domain adaptation. In particular, with only 1/30 of the Cityscapes labels, our method achieves 92% of the fully-supervised baseline performance and even 97% when exploiting additional data from GTA. The source code is available at https://github.com/lhoyer/improving_segmentation_with_selfsupervised_depth.

Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation

This essay reviews a paper that presents a framework for improving semantic segmentation in semi-supervised and domain-adaptive settings by using self-supervised depth estimation (SDE) as an auxiliary task. The authors propose several methods that rely on SDE to reduce the amount of labeled data required, which is typically the main bottleneck in training deep semantic segmentation networks.

Key Contributions and Methodology

The paper makes four contributions, each addressing a different part of the training pipeline for semantic segmentation:

Automatic Data Selection: Instead of traditional active learning, which keeps a human annotator in the selection loop, this method uses SDE as a fully automated proxy to choose which samples should be annotated for semantic segmentation. It combines diversity and uncertainty sampling, both derived from depth estimation, and relies on the correlation of sample diversity and difficulty between SDE and semantic segmentation. The selection step itself needs no human in the loop, which improves scalability; the selected samples still have to be labeled.
DepthMix Data Augmentation: Drawing inspiration from CutMix and its derivatives, DepthMix uses SDE to generate geometrically consistent synthetic samples. By respecting the depth estimates when mixing two images and their labels, it mitigates the unrealistic occlusions that other data-mixing techniques often produce, yielding a stronger augmentation for training.
Multi-Task Learning Framework: This component transfers knowledge from features learned during SDE to semantic segmentation. The segmentation network is initialized from SDE features (transfer learning), and depth and segmentation are then estimated jointly (multi-task learning), which improves segmentation accuracy. This is particularly beneficial for classes with significant depth discontinuities.
Domain Adaptation through Cross-Domain DepthMix: Extending the DepthMix concept, the paper exploits additional labeled synthetic data and aligns synthetic and real data through Cross-Domain DepthMix combined with Matching Geometry Sampling. The aim is to reduce the discrepancy between the two domains and so improve generalization from synthetic to real images.
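
The DepthMix idea described in the list above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that a pixel is taken from the first image wherever its estimated depth is smaller than that of the second image (plus a small tolerance eps), and from the second image elsewhere, so that nearer content occludes farther content. The function name depth_mix, its arguments, and the tolerance are hypothetical and chosen for illustration; in practice the depth maps would come from an SDE network.

```python
import numpy as np


def depth_mix(img_a, img_b, lbl_a, lbl_b, depth_a, depth_b, eps=0.0):
    """Mix two samples with a depth-based mask (illustrative sketch).

    img_*   : (H, W, C) image arrays
    lbl_*   : (H, W) integer label (or pseudo-label) maps
    depth_* : (H, W) estimated depth maps, smaller value = closer to camera
    eps     : hypothetical tolerance added to depth_b (assumption)
    """
    # Assumed rule: take the pixel from A where A is estimated to be closer.
    mask = depth_a < depth_b + eps  # boolean array of shape (H, W)

    # Apply the same mask to image and label so they stay aligned.
    mixed_img = np.where(mask[..., None], img_a, img_b)
    mixed_lbl = np.where(mask, lbl_a, lbl_b)
    return mixed_img, mixed_lbl, mask
```

Using one mask for both the image and the label map keeps the mixed labels consistent with the mixed image, which is the property a mixing augmentation for segmentation needs regardless of how the mask is computed.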

Experimental Results and Implications

According to the abstract, the framework is validated on the Cityscapes dataset, with GTA providing additional labeled synthetic data for the domain adaptation experiments. With all proposed components combined, the method reaches 92% of the fully supervised baseline performance using only 1/30 of the Cityscapes labels. When the additional GTA data is exploited under the same label budget, it reaches 97% of the fully supervised baseline.

These results support three claims: using SDE alongside semantic segmentation reduces the dependency on labeled data, improves the quality of the learned representations, and helps bridge the gap between synthetic and real domains. The practical benefit is a lower annotation cost at a comparatively small loss in accuracy relative to full supervision.

Future Directions

This research suggests several directions for future work in semantic segmentation and related fields. The successful integration of a self-supervised task invites further study of unsupervised and self-supervised strategies for improving robustness under varying visual conditions. Future research could examine how well such frameworks scale across different architectures, or whether other auxiliary tasks complement segmentation in a similar way.

In conclusion, the paper advances semantic segmentation in settings where annotations are the limiting factor. By using self-supervised depth estimation for sample selection, data augmentation, feature learning, and domain alignment, the authors show that models can be trained effectively with far less manual labeling. The work is in line with the broader trend towards self-supervised learning in computer vision.

Authors (5)

Lukas Hoyer
Dengxin Dai
Qin Wang
Yuhua Chen
Luc Van Gool

Citations (29)
