Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation
Key Points
- The paper introduces a progressive domain adaptation framework that first transfers source images to the target style with CycleGAN and then refines the detector with pseudo-labeling.
- The method improves mean average precision (mAP) by 5–20 percentage points over baselines, reaching 46.0% mAP on the Clipart1k dataset.
- The approach enables effective object detection in annotation-scarce visual domains such as watercolor and comic images.
Overview
The paper introduces a framework for a novel task: cross-domain weakly-supervised object detection. The core challenge is detecting objects across differing image domains with limited supervision, namely instance-level annotations in a source domain but only image-level annotations in a target domain. This setting is particularly relevant when instance-level annotations are prohibitively difficult or expensive to obtain in the target domain, as with watercolor or comic images.
Methodology
The authors propose a two-step progressive domain adaptation technique. The process begins with a fully supervised object detector (FSD) trained on a source domain with complete instance-level annotations. The adaptation involves:
- Domain Transfer (DT): A CycleGAN performs unpaired image-to-image translation, generating domain-transferred versions of the source images that resemble the target domain. The FSD is then fine-tuned on these transferred images, reusing the source's instance-level annotations.
- Pseudo-Labeling (PL): Pseudo instance-level annotations are created on target-domain images by keeping, for each class indicated by the image-level labels, the most confident detection from the DT-fine-tuned FSD. These pseudo-labels are used for a further round of fine-tuning, refining the detector's performance in the target domain.
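The pseudo-labeling step above can be sketched as follows. This is a minimal illustration, not the paper's code: the `(class_id, score, box)` tuple format and the `pseudo_label` helper are assumptions made for the example.

```python
# Sketch of the pseudo-labeling (PL) step. Assumes a detector whose output
# per image is a list of (class_id, score, box) tuples; these names and
# formats are illustrative, not taken from the paper's implementation.

def pseudo_label(detections, image_level_labels):
    """For each class that the image-level annotation marks as present,
    keep only the single most confident detection as a pseudo ground truth."""
    pseudo = []
    for cls in image_level_labels:
        candidates = [d for d in detections if d[0] == cls]
        if candidates:
            # top-1 detection by confidence score becomes the pseudo box
            pseudo.append(max(candidates, key=lambda d: d[1]))
    return pseudo

# Example: two class-0 boxes and one class-2 box detected,
# but the image-level labels say only classes 0 and 1 are present.
dets = [
    (0, 0.9, (10, 10, 50, 50)),
    (0, 0.4, (60, 60, 90, 90)),
    (2, 0.7, (5, 5, 30, 30)),
]
print(pseudo_label(dets, {0, 1}))  # only the top class-0 box survives
```

Note that class 2's detection is discarded because the image-level labels rule it out, and class 1 contributes nothing because the detector found no box for it; this is how the weak image-level supervision filters the detector's own noisy outputs.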
Experiments and Results
The framework was evaluated on three newly created datasets: Clipart1k, Watercolor2k, and Comic2k. Each dataset covers a visual domain far removed from natural imagery and contains 1,000–2,000 images with instance-level annotations. Key numerical outcomes include:
- An improvement of 5–20 percentage points in mean average precision (mAP) over baseline methods across all three datasets.
- The combined DT+PL approach reached 46.0% mAP on Clipart1k, substantially closing the gap with the ideal case in which full instance-level annotations are available in the target domain.
Implications
This research contributes to the fields of domain adaptation and weakly-supervised learning by demonstrating a viable pathway for extending object detectors to diverse and challenging visual domains without exhaustive annotation effort.
Future Directions
Future research opportunities include enhancing the localization accuracy of pseudo-labels. This could involve integrating multiple instance learning principles to better utilize partially reliable pseudo annotations. Additionally, further exploration into scalable transfer learning techniques across more varied domains will be crucial for practical applications in areas like autonomous driving and digital media content analysis.
Overall, this paper establishes a robust baseline and opens avenues for research into efficient use of limited annotations, broadening the application scope of object detection technologies.