Cross-domain Detection via Graph-induced Prototype Alignment (2003.12849v1)

Published 28 Mar 2020 in cs.CV

Abstract: Applying the knowledge of an object detector trained on a specific domain directly onto a new domain is risky, as the gap between the two domains can severely degrade the model's performance. Furthermore, since different instances commonly embody distinct modal information in object detection scenarios, aligning source and target domain features is hard to realize. To mitigate these problems, we propose a Graph-induced Prototype Alignment (GPA) framework that seeks category-level domain alignment via elaborate prototype representations. In a nutshell, more precise instance-level features are obtained through graph-based information propagation among region proposals, and, on this basis, the prototype representation of each class is derived for category-level domain alignment. In addition, to alleviate the negative effect of class imbalance on domain adaptation, we design a Class-reweighted Contrastive Loss to harmonize the adaptation training process. Combined with Faster R-CNN, the proposed framework conducts feature alignment in a two-stage manner. Comprehensive results on various cross-domain detection tasks demonstrate that our approach outperforms existing methods by a remarkable margin. Our code is available at https://github.com/ChrisAllenMing/GPA-detection.

Cross-domain Detection via Graph-induced Prototype Alignment: A Detailed Examination

The paper "Cross-domain Detection via Graph-induced Prototype Alignment" addresses the persistent challenge of applying object detectors across domains with divergent visual characteristics. The authors introduce a framework called Graph-induced Prototype Alignment (GPA), which adapts detectors by aligning the source and target domains at the category level through per-class prototype representations.

Core Methodology

The central idea of the paper is to refine proposal features with a graph and then align class prototypes, rather than individual instances, between the source and target domains. Instance-level alignment is fragile because instances of the same category carry distinct modal information and domain shift can pull them far apart; GPA instead aligns category-level representations that summarize many instances at once.

Graph-based Information Propagation: The authors refine instance-level features by propagating information among region proposals. They build a relation graph whose nodes are proposals and whose edge weights are the Intersection over Union (IoU) between proposal boxes, so the adjacency matrix reflects both the relative position and the relative size of overlapping proposals. Aggregating features over this graph yields more precise instance-level features.
Prototype Representation: From these refined features, a prototype is computed for each class as an average embedding over the instances of that class, so it pools the multi-modal information carried by different instances. Prototypes act as stable reference points for cross-domain alignment.
Domain Alignment: A Class-reweighted Contrastive Loss pulls together the source and target prototypes of the same class while keeping prototypes of different classes apart, preserving inter-class separability. The reweighting balances the contribution of each class to the loss, mitigating the class imbalance that is common in multi-class detection datasets.
Two-stage Alignment: The framework is built on Faster R-CNN and performs alignment at two stages: the Region Proposal Network (RPN) and the region-based classification head (RCNN). Alignment therefore progresses from coarse foreground/background prototypes at the RPN stage to category-specific prototypes at the RCNN stage.
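The steps above can be illustrated with a small pure-Python sketch. It is not the authors' implementation (the linked repository holds that), and several details are assumptions: propagation uses a row-normalized IoU adjacency matrix, prototypes are confidence-weighted means where `probs[i][k]` is the predicted probability that proposal i belongs to class k, and the contrastive loss is a generic margin-based form with hypothetical inverse-frequency class weights. The names `iou`, `propagate`, `prototypes`, `class_weights`, `contrastive_loss`, and the `margin` parameter are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); positive area assumed
Vec = List[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def propagate(boxes: Sequence[Box], feats: Sequence[Vec]) -> List[Vec]:
    """One round of message passing: f'_i = sum_j A_ij f_j / sum_j A_ij, A_ij = IoU.

    A_ii = 1, so each proposal keeps part of its own feature. Row normalization
    is an assumption of this sketch, not necessarily the paper's exact choice.
    """
    out = []
    for bi in boxes:
        w = [iou(bi, bj) for bj in boxes]
        total = sum(w)  # >= 1 thanks to the self-loop
        out.append([sum(wj * f[d] for wj, f in zip(w, feats)) / total
                    for d in range(len(feats[0]))])
    return out


def prototypes(feats: Sequence[Vec], probs: Sequence[Vec]) -> List[Vec]:
    """Prototype c_k = sum_i p_ik f_i / sum_i p_ik (confidence-weighted mean)."""
    num_classes, dim = len(probs[0]), len(feats[0])
    protos = []
    for k in range(num_classes):
        mass = sum(p[k] for p in probs)
        protos.append([sum(p[k] * f[d] for p, f in zip(probs, feats)) / max(mass, 1e-12)
                       for d in range(dim)])
    return protos


def class_weights(counts: Sequence[int]) -> List[float]:
    """Hypothetical reweighting: inverse class frequency, rescaled to mean 1."""
    inv = [1.0 / max(c, 1) for c in counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]


def sq_dist(u: Vec, v: Vec) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def contrastive_loss(src: Sequence[Vec], tgt: Sequence[Vec],
                     weights: Sequence[float], margin: float = 1.0) -> float:
    """Pull same-class prototypes together across domains; penalize pairs of
    different-class prototypes (within and across domains) closer than `margin`."""
    k = len(src)
    intra = sum(weights[c] * sq_dist(src[c], tgt[c]) for c in range(k)) / k
    inter, pairs = 0.0, 0
    for a, b in ((src, src), (tgt, tgt), (src, tgt)):
        for i in range(k):
            for j in range(k):
                if i != j:
                    gap = max(0.0, margin - math.sqrt(sq_dist(a[i], b[j])))
                    inter += weights[i] * weights[j] * gap ** 2
                    pairs += 1
    return intra + inter / max(pairs, 1)


if __name__ == "__main__":
    boxes = [(0, 0, 2, 2), (1, 0, 3, 2), (10, 10, 12, 12)]
    refined = propagate(boxes, [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
    print(refined)  # approximately [[0.75, 0.25], [0.25, 0.75], [5.0, 5.0]]
```

In the usage example, the two overlapping boxes (IoU of 1/3) exchange part of their features, while the isolated third box keeps its own. Weighting prototypes by class confidence rather than hard labels lets uncertain target proposals contribute softly, which is one plausible way to build prototypes on an unlabeled domain.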

Experimental Evaluation

The framework is evaluated on several adaptation benchmarks covering normal-to-foggy weather, synthetic-to-real, and cross-camera adaptation. In each setting, the reported results show GPA outperforming the existing cross-domain detection methods it is compared against.

On the Cityscapes to Foggy Cityscapes benchmark, the proposed approach achieved substantial gains in mAP, indicating improved robustness to environmental noise such as fog.
Results on synthetic-to-real transfer suggest that prototype-based alignment can effectively bridge domain gaps caused by differences in visual style.

Implications and Future Directions

The GPA framework advances cross-domain object detection by embedding graph-induced prototype alignment within a widely used detection architecture. The work is relevant to real-world applications in which models trained in controlled environments must remain effective when deployed in diverse, uncontrolled settings.

Future work could extend the graph-based aggregation to incorporate temporal information for video, which may improve detection consistency across frames. Automatic tuning of the hyperparameters of the graph construction and the contrastive loss could also ease deployment across different domain adaptation scenarios.

In summary, the paper offers a comprehensive strategy for handling domain shift in object detection, using graph-based feature refinement to support category-level prototype alignment. It is a significant contribution toward more adaptable and robust detection models across domains.

Authors (5)

Minghao Xu
Hang Wang
Bingbing Ni
Qi Tian
Wenjun Zhang

Citations (224)

GitHub

GitHub - ChrisAllenMing/GPA-detection: Implementation of Cross-domain Detection via Graph-induced Prototype Alignment (CVPR 2020 Oral). (140 stars)