Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation
Key Points
- The paper introduces a progressive domain adaptation framework that first transfers source images to the target style with CycleGAN and then refines the detector with pseudo-labeling.
- The method improves mean average precision (mAP) by 5–20 percentage points over baselines, reaching 46.0% mAP on the Clipart1k dataset.
- The approach enables effective object detection in annotation-scarce visual domains such as watercolor and comic images.
Overview
The paper introduces a framework for a novel task: cross-domain weakly-supervised object detection. The core challenge is detecting objects across differing image domains with limited supervision, namely instance-level annotations in a source domain but only image-level annotations in a target domain. This setting is particularly relevant when instance-level annotations are prohibitively difficult or expensive to obtain in the target domain, as with watercolor or comic images.
Methodology
The authors propose a two-step progressive domain adaptation technique. The process begins with a fully supervised object detector (FSD) trained on a source domain with complete instance-level annotations. The adaptation involves:
- Domain Transfer (DT): A CycleGAN performs unpaired image-to-image translation, generating domain-transferred versions of the source images that resemble the target domain. The FSD is then fine-tuned on these transferred images, reusing the source's instance-level annotations.
- Pseudo-Labeling (PL): Pseudo instance-level annotations are created on target-domain images by keeping, for each class indicated by the image-level labels, the most confident detection from the DT-fine-tuned FSD. These pseudo-labels are used for a further round of fine-tuning, refining the detector's performance in the target domain.
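The pseudo-labeling step above can be sketched as follows. This is a minimal illustration, not the paper's code: the `(class_id, score, box)` tuple format and the `pseudo_label` helper are assumptions made for the example.

```python
# Sketch of the pseudo-labeling (PL) step. Assumes a detector whose output
# per image is a list of (class_id, score, box) tuples; these names and
# formats are illustrative, not taken from the paper's implementation.

def pseudo_label(detections, image_level_labels):
    """For each class that the image-level annotation marks as present,
    keep only the single most confident detection as a pseudo ground truth."""
    pseudo = []
    for cls in image_level_labels:
        candidates = [d for d in detections if d[0] == cls]
        if candidates:
            # top-1 detection by confidence score becomes the pseudo box
            pseudo.append(max(candidates, key=lambda d: d[1]))
    return pseudo

# Example: two class-0 boxes and one class-2 box detected,
# but the image-level labels say only classes 0 and 1 are present.
dets = [
    (0, 0.9, (10, 10, 50, 50)),
    (0, 0.4, (60, 60, 90, 90)),
    (2, 0.7, (5, 5, 30, 30)),
]
print(pseudo_label(dets, {0, 1}))  # only the top class-0 box survives
```

Note that class 2's detection is discarded because the image-level labels rule it out, and class 1 contributes nothing because the detector found no box for it; this is how the weak image-level supervision filters the detector's own noisy outputs.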
Experiments and Results
The framework was evaluated on three newly created datasets: Clipart1k, Watercolor2k, and Comic2k. Each dataset covers a visual domain far removed from natural imagery and contains 1,000–2,000 images with instance-level annotations. Key numerical outcomes include:
- An improvement of 5–20 percentage points in mean average precision (mAP) over baseline methods across all three datasets.
- The combined DT+PL approach reached 46.0% mAP on Clipart1k, substantially closing the gap with the ideal case in which full instance-level annotations are available in the target domain.
Implications
This research contributes to the fields of domain adaptation and weakly-supervised learning by demonstrating a viable pathway for extending object detectors to diverse and challenging visual domains without exhaustive annotation effort.
Future Directions
Future research opportunities include enhancing the localization accuracy of pseudo-labels. This could involve integrating multiple instance learning principles to better utilize partially reliable pseudo annotations. Additionally, further exploration into scalable transfer learning techniques across more varied domains will be crucial for practical applications in areas like autonomous driving and digital media content analysis.
Overall, this paper establishes a robust baseline and opens avenues for research into efficient use of limited annotations, broadening the application scope of object detection technologies.