Harmonizing Transferability and Discriminability for Adapting Object Detectors (2003.06297v1)

Published 13 Mar 2020 in cs.CV

Abstract: Recent advances in adaptive object detection have achieved compelling results in virtue of adversarial feature adaptation to mitigate the distributional shifts along the detection pipeline. Whilst adversarial adaptation significantly enhances the transferability of feature representations, the feature discriminability of object detectors remains less investigated. Moreover, transferability and discriminability may come at a contradiction in adversarial adaptation given the complex combinations of objects and the differentiated scene layouts between domains. In this paper, we propose a Hierarchical Transferability Calibration Network (HTCN) that hierarchically (local-region/image/instance) calibrates the transferability of feature representations for harmonizing transferability and discriminability. The proposed model consists of three components: (1) Importance Weighted Adversarial Training with input Interpolation (IWAT-I), which strengthens the global discriminability by re-weighting the interpolated image-level features; (2) Context-aware Instance-Level Alignment (CILA) module, which enhances the local discriminability by capturing the underlying complementary effect between the instance-level feature and the global context information for the instance-level feature alignment; (3) local feature masks that calibrate the local transferability to provide semantic guidance for the following discriminative pattern alignment. Experimental results show that HTCN significantly outperforms the state-of-the-art methods on benchmark datasets.

Summary

The paper introduces a Hierarchical Transferability Calibration Network (HTCN) that balances feature transferability and discriminability for unsupervised domain adaptation in object detection.
It employs importance weighted adversarial training, context-aware instance-level alignment, and local feature masks to calibrate multi-level features.
Experiments on benchmarks such as Cityscapes to Foggy-Cityscapes and PASCAL to Clipart show HTCN outperforming prior methods and, on some benchmarks, approaching supervised baselines.

Harmonizing Transferability and Discriminability for Adapting Object Detectors

The paper "Harmonizing Transferability and Discriminability for Adapting Object Detectors" addresses unsupervised domain adaptation (UDA) for object detection. The researchers propose a Hierarchical Transferability Calibration Network (HTCN) to balance the often conflicting objectives of transferability and discriminability when adapting object detectors from a labeled source domain to an unlabeled target domain.

Problem Statement and Contributions

Modern object detectors, although successful in many domains, suffer from a significant performance drop when applied directly to new, unseen domains due to distributional shifts. The paper identifies that while adversarial adaptation enhances the transferability of feature representations, the feature discriminability remains underexplored. Moreover, adversarial adaptation can sometimes negatively impact discriminability due to differing scene layouts and object compositions between domains.

HTCN is designed to harmonize the transferability and discriminability of object detectors by hierarchically calibrating the feature representations at different levels, namely local-region, image, and instance. The key components of HTCN are:

Importance Weighted Adversarial Training with Input Interpolation (IWAT-I): This component augments discriminability by re-weighting interpolated image-level features based on their transferability. Images with higher uncertainty about their domain association contribute more prominently to the model learning process.
Context-aware Instance-Level Alignment (CILA): The CILA module improves local discriminability by fusing instance-level features with global context information. This integration is achieved through a tensor product that facilitates informative interactions between features, providing a more consistent instance-level alignment.
Local Feature Masks: These masks calibrate local transferability and provide semantic guidance for the subsequent alignment by identifying and emphasizing the more informative and descriptive regions within an image, reinforcing the discriminability of those regions during alignment.
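
The first two components can be illustrated with a minimal sketch. It assumes the "uncertainty about domain association" is measured as the binary entropy of a domain discriminator's output probability, and that the tensor product in CILA is a plain outer product between a context vector and an instance feature vector. The function names, the 1 + entropy weighting form, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli(p) prediction; largest at p = 0.5."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def importance_weight(domain_prob):
    """Assumed weighting: 1 + entropy of the domain discriminator output.

    An image the discriminator cannot place in either domain (p near 0.5)
    receives a larger weight than one it classifies confidently.
    """
    return 1.0 + binary_entropy(domain_prob)


def reweight_features(features, domain_prob):
    """Scale every element of an image-level feature vector by its weight."""
    w = importance_weight(domain_prob)
    return [w * f for f in features]


def tensor_product_fusion(context, instance):
    """Flattened outer product: each context element times each instance element."""
    return [c * x for c in context for x in instance]


# Hypothetical discriminator outputs for two images.
confident = importance_weight(0.99)  # clearly assigned to one domain
ambiguous = importance_weight(0.5)   # domain is uncertain

# Toy context (length 2) and instance (length 3) vectors.
fused = tensor_product_fusion([1.0, 2.0], [3.0, 4.0, 5.0])
```

Note that the fused vector has length len(context) * len(instance), so a full outer product grows quickly with feature dimension; a practical system would likely need some form of dimensionality reduction at this step.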

Experimental Results

Experiments on the benchmark adaptation settings Cityscapes to Foggy-Cityscapes, PASCAL to Clipart, and Sim10K to Cityscapes show that HTCN outperforms existing state-of-the-art methods. Notably, the authors report results close to supervised learning baselines on specific benchmarks.

Implications and Future Directions

The approach offers a significant advancement in domain adaptation for object detection by directly addressing the trade-off between transferability and discriminability. The hierarchical approach to feature calibration offers a promising pathway for enhancing the robustness and versatility of object detectors across varying domains.

Looking forward, this work paves the way for further exploration into hierarchical and multi-level feature adaptation strategies. Potential future directions could involve extending these methods to more complex and varied domain adaptation challenges, including those involving more drastic environmental changes or when adapting across extremely diverse visual domains.

Overall, this paper provides a substantial contribution to the domain adaptation literature, proposing a methodologically sound and effective strategy for object detection tasks. The clear demonstration of performance gains substantiates the importance of balancing transferability and discriminability, which may inspire further innovations in adaptive learning paradigms.