- The paper's main contribution is the introduction of a prototype-based method that dynamically refines noisy pseudo labels during training.
- It develops a target structure learning strategy that aligns prototypical assignments across different augmented views of the same target image, yielding a more compact feature space and more consistent segmentation.
- The method achieves strong mIoU results, reaching 57.5 on the GTA5-to-Cityscapes task and 55.5 on SYNTHIA-to-Cityscapes.
An Overview of Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
The paper presents a novel approach named ProDA (Prototypical Domain Adaptation) for improving unsupervised domain adaptation (UDA) in semantic segmentation. The method builds on self-training with pseudo labels and addresses two major challenges: the noise in pseudo labels and the dispersed nature of target-domain features.
Key Contributions and Methodology
ProDA introduces online denoising of pseudo labels and enhances the target structure through an unsupervised learning objective. The pivotal innovation is the use of prototypes, i.e., the feature centroids of classes, both to rectify pseudo labels and to learn a compact feature space in the target domain. The core ideas and contributions are as follows:
- Prototypical Pseudo Label Denoising:
- The method estimates the reliability of each pseudo label from the distance between the pixel's feature and the class prototypes: a softmax over negative feature-to-prototype distances yields per-class weights that rectify the pseudo labels online, as training proceeds.
- Prototypes are updated dynamically as moving averages of the cluster centroids computed within mini-batches, so the pseudo labels are refined gradually rather than fixed in advance.
- Target Structure Learning:
- Inspired by approaches such as DeepCluster, ProDA promotes a compact target feature space by aligning the prototypical assignments of different augmented views of the same target image.
- Enforcing consistent assignments under varied views pulls features of the same class closer together, producing a more condensed feature distribution that benefits segmentation.
- Knowledge Distillation:
- The paper reports a further performance improvement from distilling the learned knowledge to a student model initialized with self-supervised pretrained weights. The teacher model's predictions guide the student's training, pushing performance higher still.
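To make the prototype mechanism above concrete, the NumPy sketch below shows one way to maintain class prototypes as exponential moving averages of mini-batch centroids and to derive soft correction weights from feature-to-prototype distances. The function names, the momentum value, and the temperature `tau` are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def update_prototypes(prototypes, feats, labels, momentum=0.9999):
    """EMA update of class prototypes (shape [C, D]) from a mini-batch.
    A sketch of the moving-average centroid update; `momentum` is assumed."""
    protos = prototypes.copy()
    for c in range(protos.shape[0]):
        mask = labels == c
        if mask.any():
            batch_centroid = feats[mask].mean(axis=0)
            protos[c] = momentum * protos[c] + (1 - momentum) * batch_centroid
    return protos

def prototype_weights(feats, prototypes, tau=1.0):
    """Soft per-class weights from feature-to-prototype distances:
    w_c = softmax_c(-||f - eta_c|| / tau), used to rectify pseudo labels."""
    # Pairwise distances: feats [N, D] vs prototypes [C, D] -> [N, C]
    d = np.linalg.norm(feats[:, None, :] - prototypes[None, :, :], axis=-1)
    logits = -d / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)
```

In use, the weights would be multiplied with the original (fixed) soft pseudo labels and re-normalized, so a pixel whose feature drifts toward another class's prototype has its label corrected on the fly.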
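The structure-learning objective can likewise be sketched as a divergence between the soft prototypical assignments (softmax over negative feature-to-prototype distances) of two augmented views, with one view's assignment treated as a fixed target. This is a simplified stand-in for the paper's loss, assuming a plain KL formulation; the names here are hypothetical.

```python
import numpy as np

def soft_assignment(feats, prototypes, tau=1.0):
    """Soft prototypical assignment: softmax over negative distances."""
    d = np.linalg.norm(feats[:, None, :] - prototypes[None, :, :], axis=-1)
    logits = -d / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def consistency_loss(p_target, p_view, eps=1e-8):
    """Mean KL(p_target || p_view): pull one augmented view's assignment
    toward the other view's (stop-gradient) assignment."""
    kl = np.sum(p_target * (np.log(p_target + eps) - np.log(p_view + eps)), axis=1)
    return float(kl.mean())
```

Minimizing this consistency term keeps relative feature-to-prototype distances stable across views, which is what drives the target features of each class toward a compact cluster.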
Numerical Results and Performance
ProDA demonstrates significant performance improvements over existing state-of-the-art methods in domain adaptive semantic segmentation. Specifically, the model achieves a mean Intersection over Union (mIoU) of 57.5 when adapting from GTA5 to Cityscapes, and 55.5 from SYNTHIA to Cityscapes. Compared with previous methods, ProDA exhibits large mIoU gains, particularly on hard classes such as pole and traffic sign and on other small objects.
The empirical results underscore ProDA's robustness and effectiveness, with relative improvements of 52.6% and 58.5% over models trained without domain adaptation on the two benchmarks.
Implications and Future Developments
ProDA's advancement in unsupervised domain adaptation addresses critical issues of noisy pseudo labels and target feature dispersion, illustrating a path towards more reliable and practical domain adaptive segmentation models. The use of prototypes for denoising pseudo labels introduces a framework that can be extended to other tasks involving domain adaptation and noisy label learning.
Future work could extend the methodology to multi-source domain adaptation and incorporate multi-scale feature representations to further improve segmentation. Exploring different architectures and incorporating temporal information could also advance real-time domain adaptive segmentation.
Overall, the proposed ProDA framework represents a substantial contribution to the field. It combines self-training, pseudo label refinement, and consistency learning in a principled way, and its ideas lend themselves to adoption and adaptation across a range of domain adaptation challenges in computer vision.