Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation (2003.10275v1)

Published 23 Mar 2020 in cs.CV

Abstract: Recent years have witnessed great progress in deep learning based object detection. However, due to the domain shift problem, applying off-the-shelf detectors to an unseen domain leads to significant performance drop. To address such an issue, this paper proposes a novel coarse-to-fine feature adaptation approach to cross-domain object detection. At the coarse-grained stage, different from the rough image-level or instance-level feature alignment used in the literature, foreground regions are extracted by adopting the attention mechanism, and aligned according to their marginal distributions via multi-layer adversarial learning in the common feature space. At the fine-grained stage, we conduct conditional distribution alignment of foregrounds by minimizing the distance of global prototypes with the same category but from different domains. Thanks to this coarse-to-fine feature adaptation, domain knowledge in foreground regions can be effectively transferred. Extensive experiments are carried out in various cross-domain detection scenarios. The results are state-of-the-art, which demonstrate the broad applicability and effectiveness of the proposed approach.

Authors (4)

Yangtao Zheng
Di Huang
Songtao Liu
Yunhong Wang

Summary

The paper introduces a coarse-to-fine feature adaptation strategy that aligns both foreground and class-specific features to mitigate domain shift in object detection.
The Attention-based Region Transfer module leverages adversarial learning to emphasize transferable foreground regions, reducing reliance on domain-specific annotations.
The Prototype-based Semantic Alignment method iteratively refines category prototypes to ensure robust cross-domain semantic alignment, achieving significant improvements on multiple benchmarks.

Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation

The paper "Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation" addresses the persistent problem of domain shift in deep-learning-based object detection. Standard detectors are trained on labeled datasets, and their performance drops markedly when they are deployed on an unseen domain whose data distribution differs from the training data. This is especially problematic in applications where a detector must generalize across a broad range of visual environments without annotated data for each new scene.

Coarse-to-Fine Adaptation Framework

The method follows a coarse-to-fine feature adaptation strategy. A coarse, attention-guided phase aligns foreground regions across domains, and a fine-grained phase aligns class-level semantics, so that knowledge about foreground objects transfers effectively from the source domain to the target domain.

Attention-based Region Transfer (ART): This module places a class-agnostic focus on foreground regions, using an attention mechanism driven by the high-level feature maps of the Region Proposal Network (RPN). Foreground regions are emphasized while their marginal distributions are aligned through multi-layer adversarial learning. Domain confusion is therefore concentrated on the image regions that carry the most transferable features, rather than applied uniformly over the entire feature map.
Prototype-based Semantic Alignment (PSA): The fine-grained phase refines the adaptation at the level of class-specific features. A global prototype is maintained for each category and updated iteratively during training, and the distance between prototypes of the same category in the source and target domains is minimized. This aligns the conditional (per-class) distributions and helps the semantic alignment remain robust even when the target-domain pseudo-labels are noisy.
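The two modules described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the sigmoid-and-mean-threshold attention, the fixed `momentum` for prototype updates, and the squared Euclidean prototype distance are all assumptions chosen for illustration.

```python
import numpy as np

def foreground_attention(rpn_feat):
    """Assumed attention: sigmoid of the channel-averaged |activation|,
    zeroed wherever it does not exceed its own spatial mean.
    rpn_feat has shape (C, H, W); the result has shape (H, W)."""
    act = 1.0 / (1.0 + np.exp(-np.abs(rpn_feat).mean(axis=0)))
    return np.where(act > act.mean(), act, 0.0)

def attention_weighted_domain_loss(domain_prob, domain_label, attention, eps=1e-7):
    """Per-location binary cross-entropy of a domain classifier,
    weighted by the attention map so foreground locations dominate.
    domain_prob: (H, W) predicted probability of the target domain.
    domain_label: 0 for source, 1 for target."""
    p = np.clip(domain_prob, eps, 1.0 - eps)
    bce = -(domain_label * np.log(p) + (1 - domain_label) * np.log(1.0 - p))
    return float((attention * bce).sum() / max(attention.sum(), eps))

def local_prototypes(feats, labels, num_classes):
    """Mean feature per class within one batch.
    feats: (N, D) RoI features; labels: (N,) class indices
    (ground truth on source, pseudo-labels on target)."""
    protos = np.zeros((num_classes, feats.shape[1]))
    present = np.zeros(num_classes, dtype=bool)
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            protos[k] = feats[mask].mean(axis=0)
            present[k] = True
    return protos, present

def update_global(global_protos, local, present, momentum=0.9):
    """Moving-average update of global prototypes (momentum is an assumption).
    Classes absent from the batch keep their previous prototype."""
    out = global_protos.copy()
    out[present] = momentum * global_protos[present] + (1.0 - momentum) * local[present]
    return out

def prototype_alignment_loss(src_global, tgt_global):
    """Mean squared Euclidean distance between same-class prototypes."""
    return float(((src_global - tgt_global) ** 2).sum(axis=1).mean())
```

In a training loop, one would presumably compute the attention map once per image, apply the weighted domain loss at each adapted feature level, and update the source and target global prototypes after every batch before evaluating the alignment loss.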

Experimental Validation

The authors evaluate the framework on several benchmarks with different kinds of domain shift: normal-to-foggy weather (Cityscapes → FoggyCityscapes), synthetic-to-real (SIM10k → Cityscapes), and cross-camera adaptation (Cityscapes → KITTI). The reported results are state-of-the-art across these settings, with significant gains over baseline models and over competing methods such as SWDA and SCDA. The model also lowers the Proxy $\mathcal{A}$-distance, which indicates a smaller discrepancy between source and target features and supports its effectiveness for domain adaptation.
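For reference, the Proxy A-distance is commonly estimated as 2(1 − 2ε), where ε is the test error of a classifier trained to tell source features from target features. The sketch below assumes that standard definition; the summary does not state which classifier or feature layer the authors used, and the function name is hypothetical.

```python
def proxy_a_distance(domain_preds, domain_labels):
    """Estimate the Proxy A-distance from a domain classifier's held-out
    predictions (0 = source, 1 = target), using d_A = 2 * (1 - 2 * error)."""
    if len(domain_preds) != len(domain_labels) or len(domain_labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    errors = sum(p != y for p, y in zip(domain_preds, domain_labels))
    error_rate = errors / len(domain_labels)
    return 2.0 * (1.0 - 2.0 * error_rate)
```

A value near 2 means the domains are easy to separate, while a value near 0 means the classifier performs at chance level, so the features are well aligned.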

Practical and Theoretical Implications

In practice, the work is relevant to applications such as autonomous driving and urban monitoring, where object detection must remain reliable across diverse environments. On the theoretical side, it shows how fine-grained, class-level adaptation can complement coarse alignment when matching feature spaces from markedly different domains. The proposed transfer mechanism may inform future work on domain-invariant feature learning and on cross-domain detection more broadly.

Future Directions

Future research could extend the approach to target domains that share even less visual and semantic overlap with the source, use self-supervised learning to improve adaptability, or add domain-specific augmentation strategies to further improve detection performance. The general framework may also be adapted to one-stage detectors or applied to adjacent tasks such as domain-adaptive semantic segmentation.
