Unsupervised Domain Adaptation for Semantic Segmentation
The paper PixMatch: Unsupervised Domain Adaptation via Pixelwise Consistency Training introduces a framework for unsupervised domain adaptation (UDA) in semantic segmentation. Traditional semantic segmentation methods rely on large datasets with pixel-level annotations, which are costly and labor-intensive to produce. UDA addresses this challenge by training on annotated data from a simulated source domain and adapting the model to an unlabeled real-world target domain.
PixMatch is built on the principle of enforcing pixelwise consistency in the target domain. Unlike the adversarial approaches that have historically dominated UDA, PixMatch uses a simpler, more stable training mechanism that prioritizes prediction consistency over explicitly learning domain-invariant representations. This is achieved through a loss term that compares the model's predictions on a target image with its predictions on perturbed variants of the same image, enforcing pixelwise agreement between them.
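Under a few illustrative assumptions, the consistency term can be sketched as a cross-entropy between hard pseudo-labels from the clean target prediction and the softmax of the perturbed prediction. The function name, the confidence threshold, and the use of plain NumPy arrays in place of network outputs are all assumptions for this sketch, not details taken from the paper:

```python
import numpy as np

def pixelwise_consistency_loss(clean_logits, perturbed_logits, threshold=0.9):
    """Cross-entropy between hard pseudo-labels from the clean prediction
    and the softmax of the perturbed prediction, averaged over pixels where
    the clean prediction is confident. Shapes: (C, H, W)."""
    def softmax(x):
        e = np.exp(x - x.max(axis=0, keepdims=True))
        return e / e.sum(axis=0, keepdims=True)

    clean_prob = softmax(clean_logits)
    pert_prob = softmax(perturbed_logits)

    # Hard pseudo-labels and confidence come from the clean (unperturbed) pass.
    pseudo_labels = clean_prob.argmax(axis=0)   # (H, W)
    confidence = clean_prob.max(axis=0)         # (H, W)
    mask = confidence >= threshold              # keep only confident pixels

    if not mask.any():
        return 0.0

    # Per-pixel cross-entropy of the perturbed prediction against pseudo-labels.
    h_idx, w_idx = np.nonzero(mask)
    ce = -np.log(pert_prob[pseudo_labels[h_idx, w_idx], h_idx, w_idx] + 1e-8)
    return float(ce.mean())
```

When the perturbed prediction agrees with the clean one, the loss is near zero; when it assigns low probability to the pseudo-labeled class, the loss grows, pushing the model toward consistent pixelwise outputs.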
Experimental Findings
The authors evaluate PixMatch on two synthetic-to-real benchmarks: GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes. In these evaluations, PixMatch matches or outperforms a range of state-of-the-art adversarial and self-training methods. Key numerical results include:
- For GTA5-to-Cityscapes, PixMatch achieves a mean Intersection-over-Union (mIoU) of 48.3, and when combined with self-training, it improves to 50.3.
- On SYNTHIA-to-Cityscapes, PixMatch reaches an mIoU of 51.1 using Fourier-based consistency perturbations.
The results highlight PixMatch's efficacy in improving semantic segmentation performance without complex architectures or adversarial losses.
Methodological Highlights
PixMatch simplifies the UDA pipeline through consistency training: the model is encouraged to produce the same segmentation map when its input is perturbed. The paper explores several perturbation functions inspired by advances in semi-/self-supervised learning and domain adaptation, including standard data augmentation, CutMix, Fourier-based amplitude mixing, and style transfer.
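As an illustration of one such perturbation, a Fourier-based amplitude swap (in the style of FDA) can be sketched as follows. Single-channel images and the `beta` band size are simplifying assumptions for this sketch; it is not the paper's exact implementation:

```python
import numpy as np

def fourier_perturbation(image, style_image, beta=0.01):
    """Replace the low-frequency amplitude of `image` with that of
    `style_image`, keeping the phase of `image` (an FDA-style swap).
    image, style_image: float arrays of shape (H, W)."""
    fft_img = np.fft.fft2(image)
    fft_sty = np.fft.fft2(style_image)

    amp_img, phase_img = np.abs(fft_img), np.angle(fft_img)
    amp_sty = np.abs(fft_sty)

    # Center the spectra so low frequencies sit in the middle.
    amp_img = np.fft.fftshift(amp_img)
    amp_sty = np.fft.fftshift(amp_sty)

    h, w = image.shape
    bh, bw = max(1, int(h * beta)), max(1, int(w * beta))
    ch, cw = h // 2, w // 2
    # Swap only the central (low-frequency) block of the amplitude spectrum.
    amp_img[ch - bh:ch + bh, cw - bw:cw + bw] = \
        amp_sty[ch - bh:ch + bh, cw - bw:cw + bw]

    amp_img = np.fft.ifftshift(amp_img)
    perturbed = np.fft.ifft2(amp_img * np.exp(1j * phase_img))
    return np.real(perturbed)
```

The low-frequency amplitude carries much of an image's global style (color, illumination), while the phase carries structure, so this perturbation changes appearance without moving object boundaries.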
Key features of PixMatch include:
- Simplicity in Implementation: The method has few hyperparameters, making it easy to integrate into existing segmentation pipelines.
- Stability and Efficiency: PixMatch trains stably, avoiding the sensitivity to tuning and initialization typically encountered with adversarial loss functions.
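To illustrate how few moving parts this entails, the overall objective can be sketched as a supervised source-domain loss plus a single weighted consistency term on the target domain. The function name and the weight `lam` are illustrative assumptions:

```python
def pixmatch_total_loss(source_ce_loss, target_consistency_loss, lam=1.0):
    """Total training objective: supervised cross-entropy on the labeled
    source domain plus a weighted pixelwise consistency term on the
    unlabeled target domain. `lam` balances the two terms and is the
    main hyperparameter in this sketch."""
    return source_ce_loss + lam * target_consistency_loss
```

Compared with adversarial UDA, there is no discriminator network and no alternating min-max optimization to balance, which is where the claimed stability comes from.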
Implications and Future Directions
Conceptually, PixMatch shifts the focus from complex multi-network setups to robust consistency models, which can have numerous practical applications in environments where data variability and cost are significant challenges. Theoretically, PixMatch offers a new direction in UDA research, emphasizing loss-driven consistency over representation learning.
Future research could examine interactions between different perturbation functions, stochastic or scheduled application of perturbations, and the transfer of PixMatch's principles to tasks beyond segmentation, such as object detection and video understanding. Further study of PixMatch's scalability on larger datasets, varying resolutions, and multi-domain adaptation settings may also clarify its capabilities and limitations.
In summary, PixMatch offers a streamlined approach to UDA for semantic segmentation: it reduces training complexity, improves stability, and delivers strong performance on challenging benchmarks, providing a solid foundation for further exploration and application in computer vision.