Efficient Teacher: Semi-Supervised Object Detection for YOLOv5 (2302.07577v3)

Published 15 Feb 2023 in cs.CV

Abstract: Semi-Supervised Object Detection (SSOD) has been successful in improving the performance of both R-CNN series and anchor-free detectors. However, one-stage anchor-based detectors lack the structure to generate high-quality or flexible pseudo labels, leading to serious inconsistency problems in SSOD. In this paper, we propose the Efficient Teacher framework for scalable and effective one-stage anchor-based SSOD training, consisting of Dense Detector, Pseudo Label Assigner, and Epoch Adaptor. Dense Detector is a baseline model that extends RetinaNet with dense sampling techniques inspired by YOLOv5. The Efficient Teacher framework introduces a novel pseudo label assignment mechanism, named Pseudo Label Assigner, which makes more refined use of pseudo labels from Dense Detector. Epoch Adaptor is a method that enables a stable and efficient end-to-end semi-supervised training schedule for Dense Detector. The Pseudo Label Assigner prevents the occurrence of bias caused by a large number of low-quality pseudo labels that may interfere with the Dense Detector during the student-teacher mutual learning mechanism, and the Epoch Adaptor utilizes domain and distribution adaptation to allow Dense Detector to learn globally distributed consistent features, making the training independent of the proportion of labeled data. Our experiments show that the Efficient Teacher framework achieves state-of-the-art results on VOC, COCO-standard, and COCO-additional using fewer FLOPs than previous methods. To the best of our knowledge, this is the first attempt to apply Semi-Supervised Object Detection to YOLOv5. Code is available: https://github.com/AlibabaResearch/efficientteacher


Efficient Teacher: Advancements in Semi-Supervised Object Detection for YOLOv5

The paper "Efficient Teacher: Semi-Supervised Object Detection for YOLOv5" introduces a framework for Semi-Supervised Object Detection (SSOD) that targets one-stage anchor-based detectors. The Efficient Teacher framework comprises three core components: Dense Detector, Pseudo Label Assigner (PLA), and Epoch Adaptor (EA).

Key Contributions

The research identifies significant obstacles in SSOD, especially the challenge of generating high-quality pseudo labels for one-stage anchor-based detectors. The authors propose the Dense Detector, an extension of RetinaNet incorporating dense sampling techniques from YOLOv5, to enhance pseudo label quality and inference efficiency.

The Pseudo Label Assigner is introduced to manage pseudo label inconsistency by stratifying pseudo labels into reliable and uncertain categories. This differentiation allows for a nuanced approach that mitigates biases from low-quality pseudo labels, thereby enhancing model performance during the semi-supervised training phase.
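
As a rough illustration of the stratification described above, the following Python sketch splits teacher predictions into reliable and uncertain groups by confidence. The two thresholds, their default values, the `PseudoLabel` type, and the function name are all assumptions made for illustration; this summary does not state how the paper draws the boundary between the two categories.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PseudoLabel:
    """Hypothetical container for one teacher prediction."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    class_id: int
    score: float  # teacher confidence in [0, 1]


def stratify_pseudo_labels(
    labels: List[PseudoLabel],
    low: float = 0.1,   # assumed lower threshold, not taken from the paper
    high: float = 0.6,  # assumed upper threshold, not taken from the paper
) -> Tuple[List[PseudoLabel], List[PseudoLabel]]:
    """Split pseudo labels into (reliable, uncertain) lists by score."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    reliable: List[PseudoLabel] = []
    uncertain: List[PseudoLabel] = []
    for label in labels:
        if label.score >= high:
            reliable.append(label)
        elif label.score >= low:
            uncertain.append(label)
        # Scores below `low` are dropped and treated as background.
    return reliable, uncertain
```

In a setup like this, the reliable group could be trained on like ordinary ground truth, while the uncertain group could be given a softer or partial loss so that low-quality boxes do not bias the student.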

To improve training efficiency, the Epoch Adaptor is introduced. It utilizes domain and distribution adaptation techniques to create a stable, end-to-end SSOD training schedule, effectively bridging the gap between labeled and unlabeled domain distributions.
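
One way to picture distribution adaptation is a score threshold that is recomputed so the number of pseudo labels kept on unlabeled images tracks the box density seen in the labeled set. The Python sketch below is a hypothetical illustration under that assumption; the function name, its parameters, and the matching rule are not taken from the paper.

```python
from typing import Sequence


def adaptive_threshold(
    scores: Sequence[float],
    num_unlabeled_images: int,
    boxes_per_labeled_image: float,
) -> float:
    """Return a score cut-off that keeps roughly the target number of boxes.

    The target is the number of boxes we would expect on the unlabeled
    images if they had the same box density as the labeled images.
    """
    if not scores:
        return 1.0  # nothing to keep, so use the strictest threshold
    target = int(round(num_unlabeled_images * boxes_per_labeled_image))
    target = max(1, min(target, len(scores)))  # clamp to a valid rank
    ranked = sorted(scores, reverse=True)
    return ranked[target - 1]  # score of the last box that is kept
```

Recomputing such a threshold periodically would let the cut-off follow the teacher's score distribution rather than relying on a fixed value tuned for one labeled-data ratio.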

Empirical Evaluation

Experiments conducted on the VOC, COCO-standard, and COCO-additional datasets indicate that the Efficient Teacher framework achieves state-of-the-art results with reduced computational requirements, evidenced by lower FLOPs compared to existing methods.

The Dense Detector demonstrated a notable improvement of 5.36 in $AP_{50:95}$ over conventional RetinaNet by leveraging dense sampling.
The Efficient Teacher was applied to YOLOv5, achieving an $AP_{50:95}$ improvement comparable to robust two-stage detection frameworks while maintaining computational efficiency.

Theoretical and Practical Implications

The introduction of a sophisticated label assignment mechanism through PLA addresses a crucial aspect of SSOD by refining pseudo label utilization. This advancement is theoretically significant, offering a model that adapts dynamically to the inconsistencies in pseudo label generation inherent in one-stage anchor-based detectors.

Practically, the Efficient Teacher framework enhances the efficacy of one-stage detectors like YOLOv5, which are commonly deployed in real-world scenarios due to their speed and simplicity. By integrating EA, the framework promotes efficient, rapid convergence, further facilitating deployment in resource-constrained environments.

Future Directions

The framework paves the way for more sophisticated SSOD approaches that leverage dense sampling and adaptive label strategies. Future research may explore the extension of these techniques across various architectures and domains, including real-time applications and other object-detection paradigms.

Moreover, expanding upon the domain and distribution adaptation techniques may offer deeper insights into their impacts across diverse datasets and detector types. The exploration of additional augmentation strategies or enhanced pseudo label scoring systems could further optimize the precision and recall metrics achievable with similar frameworks.

In summary, the Efficient Teacher framework presents a substantial advancement in semi-supervised detection, addressing critical gaps in the application of SSOD to one-stage anchor-based detectors, especially within the context of YOLOv5. This work demonstrates a meaningful stride in improving both the accuracy and efficiency of semi-supervised learning paradigms in object detection tasks.