- The paper introduces a dual-level domain adaptation approach using adversarial training to reduce discrepancies in both image style and object-instance features.
- It demonstrates significant mAP improvements on datasets like Cityscapes, KITTI, and SIM10K, outperforming baseline detection models.
- The framework reduces dependency on extensive labeled data, enabling more practical deployment in diverse, real-world environments.
Domain Adaptive Faster R-CNN for Object Detection in the Wild
"Domain Adaptive Faster R-CNN for Object Detection in the Wild" addresses fundamental challenges in object detection posed by domain shifts between training and testing datasets. The paper targets two primary levels of domain shifts: image-level and instance-level. Image-level shifts pertain to variables like image style and illumination, while instance-level shifts involve variations in object appearance and size. The authors build on the Faster R-CNN model and propose enhancing its cross-domain robustness with two domain adaptation components designed to reduce domain discrepancies via adversarial training.
Approach Overview
The authors motivate their approach through a probabilistic lens. Detection is modeled as estimating the joint distribution P(C, B, I), where C is the category, B the bounding box, and I the image representation. Decomposing this joint into conditional and marginal terms, and assuming the conditionals are shared across domains (a covariate-shift assumption), the remaining domain gap lies in the marginal distributions of image-level and instance-level features, which is precisely what the two adaptation components align.
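A minimal sketch of this decomposition, written out under the covariate-shift reading summarized above:

```latex
% Two equivalent factorizations of the joint distribution
P(C, B, I) = P(C, B \mid I)\, P(I)        % image-level view
P(C, B, I) = P(C \mid B, I)\, P(B, I)     % instance-level view

% Covariate-shift assumption: conditionals agree across source (S) and target (T)
P_S(C, B \mid I) = P_T(C, B \mid I), \qquad P_S(C \mid B, I) = P_T(C \mid B, I)

% Hence the domain gap reduces to misaligned marginals, motivating
P_S(I) \approx P_T(I)          % image-level adaptation
P_S(B, I) \approx P_T(B, I)    % instance-level adaptation
```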
- Image-Level Adaptation: Handles shifts in global image properties such as illumination and style. A patch-level domain classifier is attached to the backbone's convolutional feature map: every activation is classified as coming from the source or target domain, and the backbone is trained adversarially so that these local features become indistinguishable across domains (see the sketch after this list).
- Instance-Level Adaptation: Manages discrepancies in how individual object instances appear. A second domain classifier operates on the feature vectors pooled from each region of interest (ROI), aligning instance-level representations such as object appearance and size across domains.
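A minimal PyTorch-style sketch of the two adaptation heads follows; the module names, layer sizes, and the `grad_reverse` helper are illustrative assumptions rather than the authors' released implementation:

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Gradient reversal layer: identity in the forward pass, negated gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class ImageLevelDomainClassifier(nn.Module):
    """Patch-level classifier: one source/target logit per activation of the
    backbone feature map (channel sizes are illustrative)."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),  # one logit per spatial location
        )

    def forward(self, feat, lambd=1.0):
        return self.net(grad_reverse(feat, lambd))  # (N, 1, H, W) domain logits

class InstanceLevelDomainClassifier(nn.Module):
    """Classifies each ROI-pooled feature vector as source or target."""
    def __init__(self, in_features=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 1024), nn.ReLU(inplace=True), nn.Dropout(),
            nn.Linear(1024, 1),  # one logit per region proposal
        )

    def forward(self, roi_feat, lambd=1.0):
        return self.net(grad_reverse(roi_feat, lambd))  # (num_rois, 1) domain logits
```

Both heads are trained with a binary cross-entropy domain loss; the gradient reversal layer flips the gradient sign so the detector's features are pushed toward domain invariance while the classifiers try to tell the domains apart.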
Both components are grounded in H-divergence theory: each domain classifier is trained to separate source from target features, while adversarial training (via gradient reversal) drives the detector to produce features the classifiers cannot separate. A consistency regularizer further links the two classifiers, encouraging the image-level and instance-level domain predictions to agree, which in turn yields a more robust Region Proposal Network (RPN) and a bounding box predictor that adapts well across domains.
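The consistency term can be sketched as follows; the tensor shapes and the use of a squared distance are assumptions for illustration (the paper phrases it as a distance between the averaged image-level domain probability and each instance-level probability):

```python
import torch

def consistency_loss(img_domain_logits, inst_domain_logits):
    """Encourage agreement between the two domain classifiers for one image.

    img_domain_logits:  (1, 1, H, W) per-location logits from the image-level head
    inst_domain_logits: (num_rois, 1) per-proposal logits from the instance-level head
    """
    img_prob = torch.sigmoid(img_domain_logits).mean()       # scalar: average over all activations
    inst_prob = torch.sigmoid(inst_domain_logits).squeeze(1)  # (num_rois,)
    return ((img_prob - inst_prob) ** 2).mean()               # penalize disagreement per proposal
```

The full training objective combines the standard Faster R-CNN detection loss with the two adversarial domain losses and this consistency term, weighted by a trade-off hyperparameter.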
Empirical Evaluation
The effectiveness of the proposed model, termed "Domain Adaptive Faster R-CNN," is validated on the Cityscapes, KITTI, and SIM10K datasets, covering three kinds of domain shift:
- Learning from Synthetic Data: Transferring from SIM10K to Cityscapes, the proposed method surpasses the non-adaptive baseline by +8.8% average precision (AP) on car detection.
- Adverse Weather Conditions: Training on clear-weather Cityscapes and testing on Foggy Cityscapes, the model achieves a +8.6% mAP improvement over the non-adaptive Faster R-CNN, demonstrating its efficacy under weather-induced domain shift.
- Cross-Camera Adaptation: Adapting between KITTI and Cityscapes, the model shows clear improvements in car detection over the baseline in both directions (+8.3% for KITTI to Cityscapes and +10.6% for the reverse).
Implications and Future Directions
This paper has significant implications for practical applications, particularly in scenarios where annotated data is sparse or difficult to procure. The ability to train models robust to domain shifts without additional labeled data reduces reliance on extensive manual annotations, making the deployment of object detection systems more feasible across diverse real-world environments.
Theoretically, the dual-level adaptation framework offers a nuanced approach to handling domain shifts comprehensively. This method can inspire extended research into integrating multi-level domain adaptation mechanisms into various machine learning problems, particularly those influenced by environmental factors and data acquisition inconsistencies.
Conclusion
The methodology and experiments presented in "Domain Adaptive Faster R-CNN for Object Detection in the Wild" contribute a significant advancement in handling domain shifts in object detection. By incorporating adversarial training at both image and instance levels and reinforcing this with consistency regularization, the model proves its effectiveness across different scenarios. The approach's robustness and adaptability pave the way for further research into practical applications and extensions of domain adaptation techniques.
Future Work
Future research could explore more sophisticated consistency regularization techniques or expand the domain adaptation framework to other neural network architectures beyond Faster R-CNN. Moreover, investigating domain adaptation in real-time applications such as video surveillance or autonomous driving under rapidly changing conditions warrants exploration. Further work could also involve incorporating unsupervised or semi-supervised learning techniques to enhance the adaptation process.
Overall, the paper offers a robust and practical approach to improving object detection systems' reliability and performance in the face of domain discrepancies, providing both theoretical insights and practical solutions.