- The paper introduces two complementary entropy minimization strategies to overcome domain discrepancies in semantic segmentation.
- It employs a direct entropy loss and adversarial training to produce confident, structured pixel-level predictions without relying on pseudo-labels.
- Experiments on GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes demonstrate significant mIoU improvements over previous state-of-the-art methods.
ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
The paper "ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation" addresses unsupervised domain adaptation (UDA) for semantic segmentation, where a model trained on a labeled source domain must perform well on a different, unlabeled target domain. Motivated by the significant discrepancies between training and testing data distributions in real-world applications, the paper introduces two approaches to bridge this gap: direct entropy minimization and adversarial entropy minimization.
Overview of Approaches
The core idea behind this research lies in minimizing the entropy of pixel-wise predictions to improve the generalization of segmentation models across diverse domains. The two complementary methods proposed are:
- Direct Entropy Minimization: This approach introduces an entropy loss that penalizes low-confidence predictions on the target domain.
- Adversarial Entropy Minimization: In this method, adversarial training is employed to align the entropy distributions of the source and target domains in the weighted self-information space, leveraging structural dependencies in the semantic layouts.
Methodology
Direct Entropy Minimization
For the direct minimization of entropy, the paper adapts the Shannon Entropy concept to the UDA scenario. The entropy map for a given target image is constructed by computing the entropy of the pixel-wise predictions. The entropy loss is then formulated to sum up these pixel entropies, which effectively pushes the model towards more confident predictions for the target domain. This approach bypasses the need for pseudo-labels and complex scheduling used in self-training methods.
Mathematically, the entropy loss L_ent for a target image x_t is defined as:

L_ent(x_t) = Σ_{(h,w)} E_{x_t}^{(h,w)}

where E_{x_t}^{(h,w)} is the normalized entropy of the prediction at pixel (h, w).
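The entropy loss above can be sketched in NumPy as follows. Here `probs` stands for the soft-max output of the segmentation network, with shape (C, H, W); the per-pixel entropy is normalized by log(C) so it lies in [0, 1]. The function name and shapes are illustrative, not taken from the paper's code.

```python
import numpy as np

def entropy_loss(probs, eps=1e-12):
    """Direct entropy loss on a soft-max output `probs` of shape (C, H, W).

    Per pixel: E^(h,w) = -1/log(C) * sum_c P^(h,w,c) * log P^(h,w,c).
    The loss sums these normalized entropies over all pixels.
    """
    num_classes = probs.shape[0]
    pixel_entropy = -(probs * np.log(probs + eps)).sum(axis=0) / np.log(num_classes)
    return pixel_entropy.sum()

# A confident prediction incurs a lower entropy loss than a uniform one.
confident = np.zeros((3, 2, 2))
confident[0] = 1.0                      # all mass on class 0 at every pixel
uniform = np.full((3, 2, 2), 1.0 / 3)   # maximum-entropy prediction
assert entropy_loss(confident) < entropy_loss(uniform)
```

Minimizing this quantity pushes the target-domain predictions toward low-entropy (confident) class assignments, which is exactly the effect the direct approach relies on.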
Adversarial Entropy Minimization
To better capture structural consistency across domains, the adversarial approach employs a discriminator network that distinguishes between the weighted self-information maps of source and target images. Aligning these distributions achieves entropy minimization indirectly: the adversarial loss Ladv enforces the alignment while the segmentation network is trained to fool the discriminator, thus reducing the cross-domain discrepancy.
Formally, the objective for training the discriminator is:

min_{θ_D} (1/|X_s|) Σ_{x_s} L_D(I_{x_s}, 1) + (1/|X_t|) Σ_{x_t} L_D(I_{x_t}, 0)

where I_x denotes the weighted self-information map of image x, and the labels 1 and 0 mark the source and target domains respectively.
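A minimal NumPy sketch of this objective follows. The weighted self-information map is I_x = -P ⊙ log(P), and L_D is a binary cross-entropy between the discriminator's scalar output and a domain label. The `discriminator` argument is a hypothetical stand-in for the paper's fully-convolutional discriminator network.

```python
import numpy as np

def self_information_map(probs, eps=1e-12):
    """Weighted self-information map I_x = -P * log(P), shape (C, H, W)."""
    return -probs * np.log(probs + eps)

def bce(pred, label, eps=1e-12):
    """Binary cross-entropy L_D between a scalar prediction in (0,1) and a 0/1 label."""
    return -(label * np.log(pred + eps) + (1 - label) * np.log(1 - pred + eps))

def discriminator_loss(source_maps, target_maps, discriminator):
    """min over theta_D of: mean L_D(I_{x_s}, 1) + mean L_D(I_{x_t}, 0).

    `discriminator` maps a self-information map to a scalar probability
    of the map coming from the source domain.
    """
    loss_src = np.mean([bce(discriminator(m), 1) for m in source_maps])
    loss_tgt = np.mean([bce(discriminator(m), 0) for m in target_maps])
    return loss_src + loss_tgt
```

The segmentation network, in turn, is trained with the labels flipped on target maps, so that it produces target self-information maps the discriminator cannot tell apart from source ones.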
Experimental Results
The proposed methodologies were validated using two challenging synthetic-to-real UDA scenarios:
- GTA5 to Cityscapes
- SYNTHIA to Cityscapes
The results demonstrate that both methods surpass existing state-of-the-art techniques in terms of mean Intersection-over-Union (mIoU). For instance, the adversarial entropy minimization model achieved 43.8% mIoU on the GTA5 to Cityscapes benchmark, outperforming previous methods such as Adapt-SegMap.
Implementation and Ablation Studies
The experiments used the Deeplab-V2 architecture with both VGG-16 and ResNet-101 as backbone networks. The models incorporated additional practices such as training on specific entropy ranges and utilizing class-ratio priors to further enhance performance in certain setups. Notably, training on high-entropy pixels in the target domain yielded better results for ResNet-101-based models in the GTA5 to Cityscapes scenario.
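The idea of training on specific entropy ranges can be sketched as a masked variant of the direct loss: only pixels whose entropy exceeds a threshold contribute. The quantile-based threshold below is an illustrative choice, not the paper's exact criterion.

```python
import numpy as np

def masked_entropy_loss(probs, keep_quantile=0.7, eps=1e-12):
    """Entropy loss restricted to high-entropy pixels of `probs` (C, H, W).

    Pixels whose normalized entropy falls below the `keep_quantile`-th
    quantile are masked out, focusing training on uncertain regions.
    The quantile threshold is an illustrative assumption.
    """
    num_classes = probs.shape[0]
    pixel_entropy = -(probs * np.log(probs + eps)).sum(axis=0) / np.log(num_classes)
    threshold = np.quantile(pixel_entropy, keep_quantile)
    mask = pixel_entropy >= threshold
    return pixel_entropy[mask].sum()
```

Confident (low-entropy) pixels are already well handled, so concentrating the loss on high-entropy pixels directs the adaptation signal to the regions where the model is most uncertain.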
Implications and Future Work
The findings suggest that entropy-based domain adaptation techniques can robustly improve semantic segmentation models' performance in varied and unlabelled target environments. The proposed methods not only advance the theoretical understanding of entropy in UDA but also provide practical solutions for real-world applications such as autonomous driving.
Future developments could explore the extension of these entropy minimization techniques to other computer vision tasks, such as object detection, as preliminary results indicate promising improvements. Additionally, incorporating more sophisticated generative models and exploring multi-modal UDA could further enhance domain adaptation frameworks.
In conclusion, ADVENT introduces viable and effective strategies for entropy minimization that push the boundaries of current UDA capabilities in semantic segmentation, showcasing significant improvements across challenging benchmarks. These insights pave the way for more resilient and adaptive computer vision systems.