Analysis of "MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation"
In the paper "MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation," the authors address a significant challenge in unsupervised domain adaptation (UDA): accurately recognizing classes with similar visual appearances in the target domain, where no ground-truth annotations are available. The work introduces Masked Image Consistency (MIC), a method that enhances UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition.
Contribution and Methodology
The paper's primary contribution is the MIC module, a plug-in for existing UDA frameworks that leverages spatial context to improve recognition in tasks such as image classification, semantic segmentation, and object detection. MIC enforces consistency between the predictions for masked target images, in which random patches are withheld, and pseudo-labels generated from the complete image by an exponential moving average (EMA) teacher model. To satisfy this consistency, the network must infer the semantics of the masked regions from the surrounding context, which improves its ability to differentiate between visually similar classes in the target domain.
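A minimal PyTorch-style sketch of this mechanism is given below. It is not the authors' implementation: `student` and `teacher` stand for any segmentation network returning per-pixel logits, the masking hyperparameters (64 px patches, mask ratio 0.7) follow the segmentation setting reported in the paper, and the confidence-based quality weight is a DAFormer-style self-training assumption.

```python
import torch
import torch.nn.functional as F

def patch_mask(images, patch_size=64, mask_ratio=0.7):
    """Zero out random square patches; defaults follow the paper's
    segmentation setting (64 px patches, 70% of patches masked)."""
    B, _, H, W = images.shape
    keep = (torch.rand(B, 1, H // patch_size, W // patch_size,
                       device=images.device) > mask_ratio).float()
    keep = F.interpolate(keep, size=(H, W), mode="nearest")
    return images * keep

@torch.no_grad()
def pseudo_labels(teacher, target_images):
    """The EMA teacher predicts on the *unmasked* target image."""
    probs = teacher(target_images).softmax(dim=1)
    conf, labels = probs.max(dim=1)
    return labels, conf

def mic_loss(student, teacher, target_images, conf_threshold=0.968):
    """Masked consistency: the student must reproduce the teacher's
    pseudo-labels from the masked image alone, forcing it to rely
    on the surrounding spatial context."""
    labels, conf = pseudo_labels(teacher, target_images)
    logits = student(patch_mask(target_images))
    loss = F.cross_entropy(logits, labels, reduction="none")
    # Weight by the fraction of confident pixels (a DAFormer-style
    # quality estimate; the threshold value is an assumption here).
    quality = (conf > conf_threshold).float().mean()
    return quality * loss.mean()
```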
The concept is straightforward yet broadly applicable, allowing MIC to be combined with various UDA methods and visual recognition tasks. Its integration yields significant performance improvements, setting new state-of-the-art results on multiple UDA benchmarks, including synthetic-to-real (e.g., GTA→Cityscapes), day-to-nighttime, and clear-to-adverse-weather adaptation. A sketch of such an integration follows below.
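Because MIC only adds one consistency loss and an EMA weight update, integrating it into an existing UDA pipeline is mechanical. Continuing the sketch above, the following hypothetical training step illustrates the idea; the plain supervised source loss and the `lambda_mic` weight are illustrative placeholders for whatever UDA objective a given method already uses.

```python
import torch.nn.functional as F  # patch_mask/mic_loss reused from above

def ema_update(teacher, student, alpha=0.999):
    # Teacher weights are an exponential moving average of the student's.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.data.mul_(alpha).add_(s.data, alpha=1 - alpha)

def train_step(student, teacher, optimizer, source_batch, target_images,
               lambda_mic=1.0):
    src_images, src_labels = source_batch
    # The host method's UDA objective stays unchanged; a plain
    # supervised source loss stands in for it here.
    loss = F.cross_entropy(student(src_images), src_labels)
    loss = loss + lambda_mic * mic_loss(student, teacher, target_images)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)  # refresh the teacher after each step
```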
Results
The experimental results demonstrate the effectiveness of MIC. On GTA→Cityscapes, MIC reaches 75.9 mIoU, and on VisDA-2017 it reaches 92.8% classification accuracy, improving over previous state-of-the-art methods by up to 4.3 percentage points. These results underscore MIC's capability to substantially narrow the performance gap between UDA and fully supervised approaches.
MIC also resolves ambiguities in visual recognition in scenarios where context plays a crucial role, such as distinguishing roads from sidewalks or recognizing vehicles in adverse weather. The method proves particularly beneficial for classes that are typically hard to adapt because they rely on subtle contextual cues.
Implications and Future Directions
The introduction of MIC has important implications for the field of UDA. By improving contextual learning in the target domain, MIC not only enhances recognition performance but also brings UDA applications closer to their supervised learning counterparts. This advancement could lead to more practical applications where collecting labeled data in target domains is infeasible, such as autonomous driving in various environmental conditions or synthetic-to-real deployments in industrial settings.
Looking forward, future research could extend MIC by integrating it with newer network architectures, particularly other Transformer variants. Broader studies into the role of context in domain adaptation scenarios beyond visual recognition could further refine how context is leveraged to improve learning in target domains.
In conclusion, the paper advances our understanding and capabilities in UDA with the introduction of MIC, a pragmatic tool that leverages context more effectively to address one of the core challenges in adapting models across domains. While the approach is currently applied to visual tasks, its underlying principles may pave the way for broader applications that combine domain adaptation with contextual reasoning.