
Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation (2304.13615v2)

Published 26 Apr 2023 in cs.CV

Abstract: Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA&DG. It is enabled by three training strategies to avoid overfitting to the source domain: While (1) Rare Class Sampling mitigates the bias toward common source domain classes, (2) a Thing-Class ImageNet Feature Distance and (3) a learning rate warmup promote feature transfer from ImageNet pretraining. As UDA&DG are usually GPU memory intensive, most previous methods downscale or crop images. However, low-resolution predictions often fail to preserve fine details while models trained with cropped images fall short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution framework for UDA&DG, that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention. DAFormer and HRDA significantly improve the state-of-the-art UDA&DG by more than 10 mIoU on 5 different benchmarks. The implementation is available at https://github.com/lhoyer/HRDA.


Summary

  • The paper presents DAFormer, a Transformer-based network with context-aware multi-level feature fusion, and HRDA, a multi-resolution framework, both designed for domain-adaptive and domain-generalizable semantic segmentation.
  • Together they significantly improve mIoU on five benchmarks, including a +16.3 mIoU gain on GTA→Cityscapes.
  • Training strategies such as Rare Class Sampling, a Thing-Class ImageNet Feature Distance, and learning rate warmup curb overfitting to the source domain and advance robust training for real-world applications.

Domain Adaptive and Generalizable Network Architectures for Semantic Image Segmentation

The paper, "Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation," presents a comprehensive approach to tackle the challenges of domain adaptation and generalization in semantic image segmentation. These tasks are pivotal in enabling models to effectively perform in environments or domains that are unseen or unlabeled, an area of immense interest given its potential applications in fields such as autonomous driving.

Key Contributions

The authors make two main contributions, DAFormer and HRDA: a network architecture and a multi-resolution framework, complemented by training strategies designed to enhance domain robustness.

  1. DAFormer Network Architecture: DAFormer builds on a Transformer encoder, reflecting the observation that Transformers are more robust and generalize across domains better than the outdated CNN backbones used by most prior UDA methods. Its decoder performs context-aware fusion of multi-level features, which matters for semantic segmentation, where both fine detail and broad contextual understanding of the image are required.
  2. HRDA Multi-Resolution Framework: HRDA addresses the GPU-memory-intensive nature of unsupervised domain adaptation (UDA) by combining small high-resolution crops, which preserve fine segmentation details, with large low-resolution crops, which capture long-range context dependencies; a learned scale attention adaptively fuses the two predictions (a minimal sketch of this fusion follows this list).
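
The fusion step can be illustrated with a short, self-contained PyTorch sketch. This is not the authors' implementation; the module and argument names (`MultiResolutionFusion`, `ctx_feats`, `feat_channels`) are hypothetical, and the attention head is simplified to a two-layer convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionFusion(nn.Module):
    """Illustrative HRDA-style fusion: a low-resolution 'context' prediction
    and a high-resolution 'detail' prediction are blended per pixel by a
    learned scale attention."""

    def __init__(self, num_classes: int, feat_channels: int = 256):
        super().__init__()
        # Lightweight attention head: predicts one weight per pixel.
        self.scale_attention = nn.Sequential(
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, 1, 1),
        )

    def forward(self, ctx_logits, detail_logits, ctx_feats):
        # ctx_logits: (B, C, h, w) from the large low-resolution crop.
        # detail_logits: (B, C, H, W) from the small high-resolution crop.
        # ctx_feats: (B, F, h, w) decoder features of the context branch.
        H, W = detail_logits.shape[-2:]
        # Upsample the context prediction and attention map to the
        # detail resolution before blending.
        ctx_up = F.interpolate(ctx_logits, size=(H, W),
                               mode='bilinear', align_corners=False)
        attn = torch.sigmoid(F.interpolate(self.scale_attention(ctx_feats),
                                           size=(H, W), mode='bilinear',
                                           align_corners=False))
        # attn near 1: trust the fine high-resolution prediction (small
        # objects, boundaries); attn near 0: trust the context prediction
        # (large homogeneous regions).
        return attn * detail_logits + (1.0 - attn) * ctx_up
```

The sigmoid attention lets the network decide, per pixel, whether the detail branch or the context branch is more reliable, rather than averaging the two scales uniformly.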

Training Strategies and Regularization

The paper also discusses training strategies that mitigate overfitting to the source domain, a common challenge when models are trained only on source labels. These strategies include:

  • Rare Class Sampling (RCS): This approach addresses the long-tail class imbalance of the source domain by sampling images that contain rare classes more often, so that these classes are learned early and the model does not drift toward common-class predictions (a short sketch follows this list).
  • Thing-Class ImageNet Feature Distance (FD): This regularizer keeps the encoder's features close to those of a frozen ImageNet-pretrained model in image regions belonging to thing classes (e.g., vehicles and pedestrians), preserving expressive pretrained features that training on synthetic source data would otherwise degrade.
  • Learning Rate Warmup: Linearly increasing the learning rate at the start of training stabilizes optimization and protects the pretrained features from distortion by large early gradients.
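
As a concrete illustration of RCS, the sketch below computes a softmax-style sampling distribution over classes from their pixel frequencies and then draws a source image containing the sampled class. It is a minimal sketch under stated assumptions: `class_pixel_counts` and the `images_with_class` index are assumed to be precomputed offline, and the function names and temperature default are illustrative:

```python
import random
import numpy as np

def rcs_probabilities(class_pixel_counts, temperature=0.01):
    """Rare Class Sampling distribution: classes with low pixel
    frequency f_c receive a higher sampling probability, with the
    sharpness controlled by a temperature T."""
    counts = np.asarray(class_pixel_counts, dtype=np.float64)
    freq = counts / counts.sum()              # f_c, per-class pixel frequency
    logits = (1.0 - freq) / temperature       # rarer class -> larger logit
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    return weights / weights.sum()

def sample_training_image(class_pixel_counts, images_with_class, rng=random):
    """Pick a class c ~ P(c), then uniformly pick a source image that
    contains at least one pixel of c. images_with_class is a hypothetical
    precomputed index: class id -> list of image paths."""
    p = rcs_probabilities(class_pixel_counts)
    c = rng.choices(range(len(p)), weights=p.tolist(), k=1)[0]
    return rng.choice(images_with_class[c])
```

A large temperature flattens the distribution toward uniform class sampling, while a small temperature concentrates probability on the rarest classes.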

Results and Implications

The proposed methods, DAFormer and HRDA, were validated across several benchmarks, showing significant improvements in mean Intersection over Union (mIoU) and establishing new state-of-the-art results for synthetic-to-real and adverse-condition domain adaptation. For example, on the GTA→Cityscapes task, HRDA improves over previous state-of-the-art methods by a remarkable +16.3 mIoU.
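
For context, mIoU averages the per-class intersection over union between predicted and ground-truth segmentation masks:

```latex
\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FP_c + FN_c}
```

where TP_c, FP_c, and FN_c are the pixel-level true positives, false positives, and false negatives for class c, so a +16.3 gain corresponds to 16.3 percentage points of this score.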

The implications of this research extend beyond improved performance metrics. The method's use of more generalizable network architectures like Transformers opens avenues for further exploration of domain-agnostic training regimes and potentially reduces the need for extensive domain-specific fine-tuning. The proposed methods, together with their open-source implementation, offer a robust framework for researchers aiming to develop domain-agnostic models.

Future Directions

Future research could explore more refined fusion strategies in multi-resolution settings and further investigate the interplay between resolution and scale in domain robustness. Additionally, integrating these domain adaptation techniques into unsupervised and self-supervised learning paradigms may further enhance generalization across a broader set of tasks and domains.

Overall, this work represents a significant stride in advancing the field of domain adaptive semantic segmentation, offering practical tools and insights for ongoing challenges in machine learning applications across diverse environments.