Semi-DETR: Semi-Supervised Object Detection with Detection Transformers
In the context of semi-supervised object detection (SSOD), the paper "Semi-DETR: Semi-Supervised Object Detection with Detection Transformers" proposes Semi-DETR, a framework that brings detection transformers into the semi-supervised setting. The work targets two challenges inherent to DETR-based detectors in SSOD: training inefficiency caused by the one-to-one assignment strategy when pseudo-labels are noisy, and the lack of deterministic correspondence between input queries and prediction outputs, which complicates consistency-based regularization.
Core Contributions
The paper introduces three components within the Semi-DETR framework:
- Stage-wise Hybrid Matching (SHM): This approach combines one-to-many and one-to-one assignment strategies. A one-to-many assignment is applied first, providing denser supervision that mitigates the learning inefficiency caused by inaccurate pseudo-labels; one-to-one assignment is then reinstated, leveraging the higher-quality pseudo-labels produced in the first stage (see the first sketch after this list).
- Cross-view Query Consistency (CQC): This component addresses the absence of deterministic query-to-prediction correspondence in DETR-based detectors. Object queries embedded from one augmented view are decoded against another view, and a consistency objective encourages the resulting query embeddings to be semantically invariant across views (see the second sketch after this list).
- Cost-based Pseudo Label Mining (CPM): This method dynamically mines pseudo labels by fitting a Gaussian Mixture Model (GMM) to matching costs, filtering out low-quality pseudo boxes and thereby supplying the consistency training with reliable targets (see the third sketch after this list).
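A minimal sketch of the stage-wise switch is shown below, in PyTorch with SciPy's Hungarian solver. The matching-cost matrix is taken as given, and the top-k value `k` and the `switch_iter` threshold are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of stage-wise hybrid matching over a (num_queries x num_gt) cost matrix.
import torch
from scipy.optimize import linear_sum_assignment

def one_to_one_assign(cost: torch.Tensor):
    """Hungarian matching: each ground-truth box claims exactly one query."""
    q_idx, gt_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return list(zip(q_idx.tolist(), gt_idx.tolist()))

def one_to_many_assign(cost: torch.Tensor, k: int = 4):
    """Match each ground-truth box to its k lowest-cost queries, giving
    denser supervision when pseudo-labels are noisy."""
    pairs = []
    for gt in range(cost.shape[1]):
        topk = torch.topk(cost[:, gt], k=min(k, cost.shape[0]), largest=False).indices
        pairs.extend((int(q), gt) for q in topk)
    return pairs

def stagewise_match(cost: torch.Tensor, cur_iter: int,
                    switch_iter: int = 60_000, k: int = 4):
    """Stage one (one-to-many) densifies supervision; stage two (one-to-one)
    restores the duplicate-free predictions needed for NMS-free inference."""
    if cur_iter < switch_iter:
        return one_to_many_assign(cost, k)
    return one_to_one_assign(cost)
```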
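The consistency term can be sketched as follows, assuming the decoder embeddings of the same injected object queries are available for both augmented views; the cosine-distance form of the loss is an assumption here, as the paper defines its own consistency objective.

```python
import torch
import torch.nn.functional as F

def cross_view_consistency_loss(dec_v1: torch.Tensor,
                                dec_v2: torch.Tensor) -> torch.Tensor:
    """Encourage decoder outputs for the same injected object queries to
    agree across two augmented views.

    dec_v1, dec_v2: (num_objects, embed_dim) embeddings decoded from view 1
    and view 2 for queries built from the same set of (pseudo) boxes, so the
    row-to-row correspondence is known by construction.
    """
    v1 = F.normalize(dec_v1, dim=-1)
    v2 = F.normalize(dec_v2, dim=-1)
    # 1 - cosine similarity per object, averaged; a hypothetical stand-in
    # for the paper's consistency objective.
    return (1.0 - (v1 * v2).sum(dim=-1)).mean()
```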
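Cost-based mining can be sketched with scikit-learn's GMM: fit two components to the per-box matching costs and keep the boxes that fall in the low-cost mode. The helper name and the hard component-assignment rule are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mine_pseudo_boxes(costs: np.ndarray) -> np.ndarray:
    """Return a boolean mask over pseudo boxes, True where the matching cost
    falls in the low-cost (reliable) GMM component."""
    costs = np.asarray(costs, dtype=np.float64).reshape(-1, 1)
    if len(costs) < 2:  # too few boxes to fit two components; keep everything
        return np.ones(len(costs), dtype=bool)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(costs)
    reliable = int(np.argmin(gmm.means_.ravel()))
    return gmm.predict(costs) == reliable
```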
Performance and Results
Extensive experiments demonstrate that Semi-DETR outperforms state-of-the-art (SOTA) SSOD methods on both the COCO and Pascal VOC benchmarks. Under the COCO-Partial protocol with 1% labeled data and a Deformable DETR detector, it surpasses Dense Teacher and PseCo by 2.82 mAP and 2.77 mAP, respectively. Under the COCO-Full setting, Semi-DETR reaches 47.2 mAP with Deformable DETR, exceeding PseCo by 1.1 mAP. Further gains are reported with stronger DETR variants such as DINO.
Implications and Future Research
The Semi-DETR framework closes important gaps in applying transformer-based detectors to semi-supervised learning, chiefly by improving training efficiency and introducing a consistency regularization scheme suited to DETR's query-based design. It also suggests several avenues for future SSOD research, such as more sophisticated pseudo-labeling strategies and stronger DETR-based consistency methods.
While the results confirm the effectiveness of combining assignment strategies, a tension remains: the one-to-many stage produces duplicate predictions that rely on post-processing such as Non-Maximum Suppression (NMS), whereas the appeal of DETR lies in end-to-end, NMS-free detection. Future work might focus on closing this gap without compromising the end-to-end detection capability.
Overall, this research makes a notable contribution to the SSOD field, showing that with careful handling of assignment strategies and consistency regularization, DETR-based models can achieve state-of-the-art results in semi-supervised learning contexts.