Semi-DETR: Semi-Supervised Object Detection with Detection Transformers (2307.08095v1)

Published 16 Jul 2023 in cs.CV

Abstract: We analyze the DETR-based framework on semi-supervised object detection (SSOD) and observe that (1) the one-to-one assignment strategy generates incorrect matching when the pseudo ground-truth bounding box is inaccurate, leading to training inefficiency; (2) DETR-based detectors lack deterministic correspondence between the input query and its prediction output, which hinders the applicability of the consistency-based regularization widely used in current SSOD methods. We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector, to tackle these problems. Specifically, we propose a Stage-wise Hybrid Matching strategy that combines the one-to-many assignment and one-to-one assignment strategies to improve the training efficiency of the first stage and thus provide high-quality pseudo labels for the training of the second stage. Besides, we introduce a Crossview Query Consistency method to learn the semantic feature invariance of object queries from different views while avoiding the need to find deterministic query correspondence. Furthermore, we propose a Cost-based Pseudo Label Mining module to dynamically mine more pseudo boxes based on the matching cost of pseudo ground truth bounding boxes for consistency training. Extensive experiments on all SSOD settings of both COCO and Pascal VOC benchmark datasets show that our Semi-DETR method outperforms all state-of-the-art methods by clear margins. The PaddlePaddle version code is at https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/semi_det/semi_detr.

Authors (9)
  1. Jiacheng Zhang (52 papers)
  2. Xiangru Lin (10 papers)
  3. Wei Zhang (1489 papers)
  4. Kuo Wang (9 papers)
  5. Xiao Tan (75 papers)
  6. Junyu Han (53 papers)
  7. Errui Ding (156 papers)
  8. Jingdong Wang (236 papers)
  9. Guanbin Li (177 papers)
Citations (28)

Summary

Semi-DETR: Semi-Supervised Object Detection with Detection Transformers

The paper "Semi-DETR: Semi-Supervised Object Detection with Detection Transformers" proposes Semi-DETR, a framework that brings detection transformers to semi-supervised object detection (SSOD). The work targets two challenges inherent to DETR-based detectors in this setting: training inefficiency caused by one-to-one assignment against inaccurate pseudo ground-truth boxes, and the lack of deterministic correspondence between input queries and prediction outputs, which blocks the consistency-based regularization used by most SSOD methods.

Core Contributions

The paper introduces three main components within the Semi-DETR framework:

  1. Stage-wise Hybrid Matching (SHM): This approach combines one-to-many and one-to-one assignment across two training stages. The first stage uses one-to-many assignment to densify supervision and offset the learning inefficiency caused by inaccurate pseudo labels; the second stage switches back to one-to-one assignment, now trained on the higher-quality pseudo labels produced by the first stage, preserving the end-to-end, NMS-free property (see the first sketch after this list).
  2. Cross-view Query Consistency (CQC): This component sidesteps the absence of deterministic query-prediction correspondence in DETR-based detectors. By embedding cross-view object queries, the framework learns the semantic invariance of object queries across differently augmented views without having to match queries between views (see the second sketch below).
  3. Cost-based Pseudo Label Mining (CPM): This method dynamically mines additional pseudo boxes for consistency training by fitting a Gaussian Mixture Model (GMM) to per-box matching costs and keeping boxes in the low-cost, reliable component, filtering out low-quality pseudo boxes (see the third sketch below).
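
Below is a minimal sketch of the stage-wise matching switch. It is not the paper's implementation (that lives in the linked PaddleDetection repo): the cost here is IoU-only, whereas the actual matcher also includes classification and box-regression terms, and `hybrid_assign`, `switch_step`, and `k` are hypothetical names and values chosen for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(preds, gts):
    # preds: (N, 4), gts: (M, 4), boxes as (x1, y1, x2, y2)
    x1 = np.maximum(preds[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(preds[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(preds[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(preds[:, None, 3], gts[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_p = (preds[:, 2] - preds[:, 0]) * (preds[:, 3] - preds[:, 1])
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    union = area_p[:, None] + area_g[None, :] - inter
    return inter / np.clip(union, 1e-6, None)

def hybrid_assign(pred_boxes, gt_boxes, step, switch_step=60_000, k=4):
    """Stage 1 (step < switch_step): one-to-many, top-k queries per pseudo
    box, densifying supervision while pseudo boxes are still noisy.
    Stage 2: one-to-one Hungarian matching, restoring the NMS-free setup.
    Returns (query_indices, gt_indices)."""
    cost = -iou_matrix(pred_boxes, gt_boxes)   # lower cost = better match
    if step < switch_step:                     # stage 1: one-to-many
        topk = np.argsort(cost, axis=0)[:k]    # (k, M): k best queries per box
        return topk.ravel(), np.tile(np.arange(gt_boxes.shape[0]), k)
    return linear_sum_assignment(cost)         # stage 2: one-to-one
```

In the full method the stage switch affects more than the assignment (e.g., which losses are applied); only the assignment change is shown here.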
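The cross-view consistency idea can be illustrated, in heavily simplified form, as a symmetric loss between decoder embeddings of the same pseudo boxes under two augmentations of an image. `crossview_consistency_loss` is a hypothetical helper: the paper's actual mechanism conditions the decoder on shared object embeddings rather than computing a bare MSE.

```python
import torch
import torch.nn.functional as F

def crossview_consistency_loss(emb_view1, emb_view2):
    """emb_view*: (num_objects, dim) decoder embeddings of the same pseudo
    boxes seen under two different augmentations of one image.
    Symmetric MSE on L2-normalized embeddings; each side treats the other
    as a fixed target (stop-gradient)."""
    e1 = F.normalize(emb_view1, dim=-1)
    e2 = F.normalize(emb_view2, dim=-1)
    return F.mse_loss(e1, e2.detach()) + F.mse_loss(e2, e1.detach())
```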
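Cost-based mining can be sketched with scikit-learn's GaussianMixture: fit two components to the per-box matching costs and keep the boxes that fall in the low-cost component. The two-component split follows the summary above; the function name and return convention are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mine_pseudo_boxes(costs, seed=0):
    """costs: (N,) matching costs of candidate pseudo boxes (N >= 2).
    Returns a boolean mask marking boxes in the low-cost (reliable) mode."""
    gmm = GaussianMixture(n_components=2, random_state=seed)
    labels = gmm.fit_predict(np.asarray(costs).reshape(-1, 1))
    reliable = int(np.argmin(gmm.means_.ravel()))  # component with lower mean cost
    return labels == reliable
```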

Performance and Results

Extensive experiments demonstrate that Semi-DETR outperforms state-of-the-art (SOTA) SSOD methods on both the COCO and Pascal VOC benchmarks. On the COCO-Partial setting with 1% labeled data and Deformable DETR, it surpasses Dense Teacher and PseCo by 2.82 mAP and 2.77 mAP, respectively. On the COCO-Full setting, Semi-DETR reaches 47.2 mAP with Deformable DETR, exceeding PseCo by 1.1 mAP, and further gains are reported with stronger DETR variants such as DINO.

Implications and Future Research

The Semi-DETR framework addresses crucial gaps in applying transformer-based models to semi-supervised domains, particularly by enhancing training efficiency and introducing novel consistency regularization schemes. This refined approach suggests several avenues for future research in SSOD, such as exploring more sophisticated pseudo-labeling approaches and enhancing DETR-based consistency methodologies.

While the combination of assignment strategies is shown to be effective, it leaves a tension between detection performance and DETR's end-to-end, NMS-free design, since one-to-many assignment encourages duplicate predictions that one-to-one training must later suppress. Future work might focus on closing this gap without reintroducing post-processing such as Non-Maximum Suppression (NMS).

This research contributes notably to the SSOD field, suggesting that with appropriate handling of assignment strategies and consistency regularization, DETR-based models can achieve impressive results in semi-supervised learning contexts.