
Self-supervised Augmentation Consistency for Adapting Semantic Segmentation (2105.00097v1)

Published 30 Apr 2021 in cs.CV and cs.LG

Abstract: We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate. In contrast to previous work, we abandon the use of computationally involved adversarial objectives, network ensembles and style transfer. Instead, we employ standard data augmentation techniques (photometric noise, flipping and scaling) and ensure consistency of the semantic predictions across these image transformations. We develop this principle in a lightweight self-supervised framework trained on co-evolving pseudo labels without the need for cumbersome extra training rounds. Simple in training from a practitioner's standpoint, our approach is remarkably effective. We achieve significant improvements of the state-of-the-art segmentation accuracy after adaptation, consistent both across different choices of the backbone architecture and adaptation scenarios.

Citations (212)

Summary

  • The paper presents Self-supervised Augmentation Consistency (SAC), a novel unsupervised domain adaptation method for semantic segmentation that enforces prediction consistency across standard data augmentations.
  • The SAC method simplifies UDA by using a momentum network and class-based thresholding for stable pseudo-labels, achieving state-of-the-art results on standard benchmarks without adversarial training.
  • This simple approach reduces training instability, is computationally efficient compared to adversarial methods, and shows potential for adaptation to other dense prediction tasks.

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

This paper presents a novel approach to unsupervised domain adaptation (UDA) for semantic segmentation by leveraging self-supervised learning techniques. The authors propose a method termed "Self-supervised Augmentation Consistency" (SAC), which eschews complex adversarial training frameworks and style transfer in favor of using standard data augmentation techniques. The core idea is to enforce consistency in semantic predictions across various image transformations, such as photometric noise, flipping, and scaling.
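
To make this principle concrete, the following PyTorch-style sketch enforces agreement between predictions on a clean target image and on a flipped, photometrically perturbed view. It is a minimal illustration only: the `photometric_jitter` helper and the KL consistency term are assumptions, whereas the paper trains on hard pseudo-labels produced by a momentum network, as outlined in the next section.

```python
import torch
import torch.nn.functional as F

def photometric_jitter(x):
    # Simple stand-in for photometric noise: random per-image brightness scaling.
    b = 1.0 + 0.2 * (2 * torch.rand(x.size(0), 1, 1, 1, device=x.device) - 1)
    return (x * b).clamp(0, 1)

def flip_consistency_loss(model, image):
    """Predictions on a flipped, photometrically perturbed view should agree
    with the (flipped-back) predictions on the clean image."""
    with torch.no_grad():
        clean_probs = F.softmax(model(image), dim=1)           # (N, C, H, W)

    aug = photometric_jitter(torch.flip(image, dims=[-1]))     # horizontal flip + noise
    aug_log_probs = F.log_softmax(model(aug), dim=1)
    aug_log_probs = torch.flip(aug_log_probs, dims=[-1])       # undo the flip

    # KL divergence between the two prediction maps as a simple consistency term.
    return F.kl_div(aug_log_probs, clean_probs, reduction="batchmean")
```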

Methodological Overview

  1. Framework Design: The framework consists of a segmentation network and a momentum network, a slowly evolving replica of the segmentation network that provides stable targets for self-supervision. The approach stays simple by relying on standard augmentations and requires none of the auxiliary modules common in prior work, such as style-transfer networks or adversarial discriminators.
  2. Training Schema: The key principle is enforcing semantic consistency across augmented images. Target-domain images are turned into multiple randomly scaled and flipped crops, which are processed by both networks. The averaged predictions of the momentum network serve as pseudo-labels for training the segmentation network.
  3. Pseudo-Label Generation: The method introduces a class-based thresholding strategy that creates pseudo-labels dynamically during training. An exponential moving average of the class prior adjusts the per-class confidence thresholds, which improves segmentation quality for rare classes.
  4. Loss Function: The framework uses a focal loss with confidence regularization to make training more robust. The focal term is tied to the moving class priors, emphasizing under-sampled classes without additional sampling cost (a rough code sketch of these components follows this list).
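
A rough sketch of how these components could be combined in PyTorch is shown below. All names and hyper-parameter values (`ema_decay`, `base_threshold`, `focal_gamma`) are illustrative assumptions; the per-class threshold rule only approximates the moving-prior scheme described above, and a standard confidence-based focal term stands in for the paper's prior-based weighting.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_momentum_network(net, momentum_net, ema_decay=0.99):
    # Momentum network = exponential moving average of the segmentation network.
    for p, mp in zip(net.parameters(), momentum_net.parameters()):
        mp.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)

@torch.no_grad()
def pseudo_labels_from_views(momentum_net, views, class_prior, base_threshold=0.9):
    # `views` is a list of flipped/scaled crops already resampled to a common
    # resolution. Average the momentum-network predictions over all views, then
    # keep only pixels whose confidence exceeds a per-class threshold that is
    # relaxed for rare classes (small moving prior -> lower threshold).
    probs = torch.stack([F.softmax(momentum_net(v), dim=1) for v in views]).mean(0)
    conf, labels = probs.max(dim=1)                                   # (N, H, W)
    thresholds = base_threshold * (class_prior / class_prior.max()).clamp(min=0.1)
    labels[conf < thresholds[labels]] = 255                           # 255 = ignore index
    return labels

def adaptation_loss(net, augmented_view, labels, focal_gamma=2.0, ignore_index=255):
    # Focal-style cross-entropy on the augmented view against the pseudo-labels;
    # pixels marked with the ignore index contribute zero loss.
    ce = F.cross_entropy(net(augmented_view), labels,
                         ignore_index=ignore_index, reduction="none")
    pt = torch.exp(-ce)                                               # prob. of the target class
    return ((1.0 - pt) ** focal_gamma * ce).mean()
```

In a full training loop, `update_momentum_network` would run after every optimizer step, and `class_prior` would be maintained as an exponential moving average of the predicted class frequencies on the target domain.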

Experimental Evaluation

The authors demonstrate the efficacy of their method on standard benchmarks that adapt the synthetic GTA5 and SYNTHIA source domains to the real-world Cityscapes dataset. The SAC approach attains state-of-the-art results with both VGG-16 and ResNet-101 backbones, validating its effectiveness and simplicity compared to more intricate methods.

  • Results: SAC achieves significant improvements in mean Intersection-over-Union (mIoU), surpassing existing methods that rely on more elaborate training pipelines. Notably, SAC does so without the computational burden typical of adversarial adaptation approaches.

Implications

The methodological simplifications offered by SAC have profound implications:

  • Practicality: This approach provides a robust domain adaptation method that requires minimal additional computational resources beyond those needed for training a single model. The absence of adversarial components reduces training instability and enhances reproducibility.
  • Adaptability: By focusing on consistency checks and self-supervised pseudo-labeling, SAC can be potentially adapted to other dense prediction tasks, such as depth estimation or instance segmentation, leveraging its augmentation-based consistency framework.

Future Directions

Potential future research avenues include:

  • Broader Adaptation: Extending the SAC methodology to encompass further applications beyond semantic segmentation, particularly in tasks where labelled data is scarce or expensive.
  • Alternative Domains: Exploration of SAC's effectiveness in different domain contexts, such as medical imaging or other fields where domain shifts are substantial and challenging to address.
  • Incorporation of Domain-Specific Priors: Investigating the incorporation of domain-specific priors to further boost adaptation accuracy, particularly for highly specialized domains.

Conclusion

This paper successfully advances the field of semantic segmentation by proposing an efficient, scalable, and easy-to-implement framework for unsupervised domain adaptation. SAC leverages the potential of self-supervised learning and data augmentation to achieve outstanding results. It opens up new pathways for domain adaptation strategies that are efficient both in terms of computational resources and the ease of integration into existing pipelines.
