- The paper presents the Unbiased Mean Teacher model that mitigates source bias by employing cross-domain distillation with style-translated teacher inputs.
- It utilizes pixel-level adaptation to augment training data, exposing the student model to both source and target-like images for diversified learning.
- The approach incorporates out-of-distribution estimation to selectively refine target predictions, significantly boosting detection performance across benchmarks.
Overview of the Unbiased Mean Teacher for Cross-domain Object Detection
The paper "Unbiased Mean Teacher for Cross-domain Object Detection" offers a comprehensive exploration into the challenges and solutions associated with cross-domain object detection. This field is ever-relevant as the adaptability of object detection models across diverse domains remains a significant limitation in practical applications, such as autonomous driving in varying environmental conditions.
Key Contributions and Methods
This work presents the Unbiased Mean Teacher (UMT), an advancement over the Mean Teacher (MT) model originally proposed for semi-supervised learning, in which a teacher maintained as an exponential moving average (EMA) of the student's weights supervises the student on unlabeled data. The authors identify a critical weakness of the conventional MT model in cross-domain settings: considerable model bias toward the source domain. To address this, the paper introduces a set of targeted improvements, yielding a UMT model with superior cross-domain performance.
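For context, the Mean Teacher paradigm trains a student detector while maintaining a teacher whose weights are an EMA of the student's weights; UMT builds on this update rule. Below is a minimal PyTorch-style sketch of the EMA step, with the momentum value `alpha` chosen for illustration rather than taken from the paper.

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.99):
    """Mean Teacher update: the teacher's weights track an exponential
    moving average (EMA) of the student's weights.

    `alpha` is an illustrative momentum value, not the paper's setting.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```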
1. Cross-Domain Distillation: UMT employs a cross-domain distillation scheme. Target images are translated into a source-like style with an image-to-image translation model such as CycleGAN; because the teacher is biased toward the source domain, it produces more reliable pseudo-detections on these source-style inputs. Those predictions are then distilled into the student model, which continues to operate on the unmodified target images (see the first sketch after this list). This asymmetric-input strategy for the teacher and student substantially reduces the impact of domain-specific bias.
2. Augmented Training Samples: To combat bias in the student model, training samples are augmented through pixel-level adaptation: target-like images are generated from labeled source-domain data and trained on with the original annotations. The student is thus exposed to an extended dataset containing both source and target-like images, further mitigating bias through diversified training inputs (second sketch below).
3. Out-of-Distribution Estimation: Finally, UMT incorporates an out-of-distribution estimation strategy to improve distillation efficiency. A confidence prediction mechanism selects the target samples that best fit the current model, so that distillation focuses on reliable pseudo-labels and down-weights out-of-distribution samples (third sketch below).
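To make the teacher-student asymmetry of step 1 concrete, the first sketch below shows one cross-domain distillation step. Here `translate_to_source_style` stands in for a pre-trained image translation generator (e.g., CycleGAN) and `detection_consistency_loss` for the loss between teacher pseudo-detections and student predictions; both names are illustrative placeholders, not the authors' actual interfaces.

```python
import torch

def cross_domain_distillation_step(teacher, student, target_images,
                                   translate_to_source_style,
                                   detection_consistency_loss):
    """One distillation step on unlabeled target-domain images.

    The source-biased teacher sees source-style translations of the
    target images, where its predictions are more reliable, while the
    student sees the original target images and learns to match the
    teacher's pseudo-detections.
    """
    with torch.no_grad():
        # Teacher input: target images rendered in a source-like style.
        source_like = translate_to_source_style(target_images)
        pseudo_detections = teacher(source_like)  # boxes, labels, scores

    # Student input: the unmodified target images.
    student_predictions = student(target_images)

    # Distill the teacher's pseudo-labels into the student.
    return detection_consistency_loss(student_predictions, pseudo_detections)
```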
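Step 2 adds supervised training on target-like translations of the labeled source images, so the student sees both styles under the same ground-truth boxes. The second sketch illustrates this; `translate_to_target_style` and `supervised_detection_loss` are again assumed placeholders.

```python
def augmented_supervised_step(student, source_images, source_annotations,
                              translate_to_target_style,
                              supervised_detection_loss):
    """Supervised step on source images plus their target-like
    translations, reusing the same ground-truth annotations."""
    # Loss on the original labeled source images.
    loss_source = supervised_detection_loss(student(source_images),
                                            source_annotations)

    # Pixel-level adaptation: target-like copies of the same images
    # (e.g., produced offline by a CycleGAN generator) keep their labels.
    target_like = translate_to_target_style(source_images)
    loss_target_like = supervised_detection_loss(student(target_like),
                                                 source_annotations)

    return loss_source + loss_target_like
```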
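Step 3 filters or weights the distillation loss by how well each target sample fits the current model, as estimated by a confidence predictor. The third sketch assumes a scalar per-image confidence in [0, 1] and a simple threshold; the paper's out-of-distribution estimation is more elaborate, so treat this purely as an illustration of the selection idea.

```python
def confidence_filtered_distillation(per_sample_distill_loss, confidence_scores,
                                     threshold=0.5):
    """Keep distillation only for target samples the current model handles
    confidently, down-weighting likely out-of-distribution samples.

    per_sample_distill_loss: torch tensor of shape (batch,)
    confidence_scores:       torch tensor of shape (batch,), values in [0, 1]
    threshold:               illustrative cut-off, not the paper's value
    """
    keep = (confidence_scores >= threshold).float()
    # Average only over the retained samples (guard against an empty selection).
    return (per_sample_distill_loss * keep).sum() / keep.sum().clamp(min=1.0)
```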
Empirical Results and Implications
The UMT model's efficacy is empirically validated on several benchmark datasets (Clipart1k, Watercolor2k, Foggy Cityscapes, and Cityscapes). The model consistently outperforms existing state-of-the-art techniques, indicating a robust ability to generalize across domains. Notably, UMT achieves mean average precisions (mAPs) of 44.1%, 58.1%, 41.7%, and 43.1% on these benchmarks, respectively, with significant margins over existing methods.
Theoretical and Practical Implications
Theoretically, the UMT model sets a precedent for addressing domain bias through innovative cross-domain interactions within teacher-student paradigms. It exemplifies how multi-faceted adaptations (pixel-level, sample selection, and model distillation) can collectively resolve biases that hamper domain transferability.
Practically, UMT's success makes a compelling case for its application to real-world problems where data variance and labeling constraints pose significant obstacles, particularly in autonomous systems, where adaptive detection is critical.
Future Directions
The field of cross-domain object detection could evolve by expanding on the UMT framework in several ways. Future work might extend these strategies to architectures beyond Faster R-CNN, explore more advanced generative models for image translation, and further refine the out-of-distribution estimates. Moreover, scaling these ideas to a broader range of real-world conditions and more complex datasets would be crucial for developing general-purpose, adaptable detection systems.
In summary, the "Unbiased Mean Teacher for Cross-domain Object Detection" paper provides an impactful contribution to the domain of adaptive object detection, offering both a profound understanding of model biases and effective counterstrategies to overcome them.