Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework (2103.11402v1)

Published 21 Mar 2021 in cs.CV and cs.AI

Abstract: Supervised learning based object detection frameworks demand plenty of laborious manual annotations, which may not be practical in real applications. Semi-supervised object detection (SSOD) can effectively leverage unlabeled data to improve the model performance, which is of great significance for the application of object detection models. In this paper, we revisit SSOD and propose Instant-Teaching, a completely end-to-end and effective SSOD framework, which uses instant pseudo labeling with extended weak-strong data augmentations for teaching during each training iteration. To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching$^*$. Extensive experiments on both MS-COCO and PASCAL VOC datasets substantiate the superiority of our framework. Specifically, our method surpasses state-of-the-art methods by 4.2 mAP on MS-COCO when using $2\%$ labeled data. Even with full supervised information of MS-COCO, the proposed method still outperforms state-of-the-art methods by about 1.0 mAP. On PASCAL VOC, we can achieve more than 5 mAP improvement by applying VOC07 as labeled data and VOC12 as unlabeled data.

Insights into Semi-Supervised Object Detection Using Instant-Teaching Framework

The paper "Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework" addresses a core limitation of supervised object detection: its reliance on large, manually annotated datasets. Semi-supervised learning (SSL) can exploit unlabeled data and so reduce that dependency. The paper introduces Instant-Teaching, an end-to-end framework for semi-supervised object detection (SSOD) that generates pseudo annotations instantly, within each training iteration, and combines them with extended weak-strong data augmentations.

Methodology and Framework

The Instant-Teaching framework regenerates pseudo annotations for unlabeled images with the current model at every training iteration, rather than relying on static annotations produced once by a pre-trained teacher model. This addresses a limitation of earlier methods such as STAC, where pseudo annotations are not updated during training and therefore do not improve as the detector improves. The framework pairs this with extended weak-strong data augmentations: pseudo annotations are predicted on a weakly augmented view of an image and used to supervise a strongly augmented view of the same image, which encourages consistent predictions across augmentations.
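The per-iteration pseudo-labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Detection` type, the `detector` callable, the use of a horizontal flip as the weak augmentation, and the `score_threshold` value of 0.9 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Image = Sequence[Sequence[float]]        # row-major grid of pixel values


@dataclass
class Detection:
    box: Box
    label: int
    score: float


def weak_augment(image: Image, flip: bool) -> List[List[float]]:
    """Hypothetical weak augmentation: optionally flip the image horizontally."""
    if flip:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


def flip_box(box: Box, width: float) -> Box:
    """Map a box between the flipped frame and the original frame."""
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)


def instant_pseudo_labels(
    detector: Callable[[List[List[float]]], List[Detection]],
    image: Image,
    flip: bool,
    score_threshold: float = 0.9,  # assumed value, chosen for illustration
) -> List[Detection]:
    """Predict on a weak view with the current model and keep confident boxes.

    Called once per training iteration, so the labels always come from the
    current weights instead of a frozen, pre-trained teacher.
    """
    width = float(len(image[0]))
    view = weak_augment(image, flip)
    kept = []
    for det in detector(view):
        if det.score < score_threshold:
            continue  # drop low-confidence predictions
        # Undo the flip so the box is expressed in the original image frame.
        box = flip_box(det.box, width) if flip else det.box
        kept.append(Detection(box, det.label, det.score))
    return kept
```

In a full training loop, the returned detections would serve as targets for a strongly augmented copy of the same image, with the boxes transformed to match that view.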

Further refinement comes from the Instant-Teaching$^*$ variant, which adds a co-rectify scheme to alleviate confirmation bias and improve the quality of pseudo annotations. Two models with the same architecture but separate parameters are trained simultaneously, and each helps rectify the other's incorrect predictions.
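One way to picture the co-rectify idea is to let a peer model re-score each pseudo annotation before it is accepted. The sketch below assumes a simple fusion rule (averaging the two confidences) and a hypothetical `peer_score` callable and threshold. It illustrates the general mechanism only and is not claimed to match the paper's exact procedure.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Prediction = Tuple[Box, int, float]      # (box, class id, confidence)


def co_rectify(
    predictions: List[Prediction],
    peer_score: Callable[[Box, int], float],
    score_threshold: float = 0.9,  # assumed value, chosen for illustration
) -> List[Prediction]:
    """Re-score one model's predictions with a peer model.

    Each prediction's confidence is averaged with the confidence the peer
    assigns to the same box and class (an assumed fusion rule). Predictions
    whose fused confidence falls below the threshold are discarded.
    """
    rectified = []
    for box, label, score in predictions:
        fused = 0.5 * (score + peer_score(box, label))
        if fused >= score_threshold:
            rectified.append((box, label, fused))
    return rectified
```

Applied symmetrically, model A's predictions are checked by model B and model B's by model A, so a confident mistake by one model is kept only if the other model also finds it plausible.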

Experimental Insights

Experiments on the MS-COCO and PASCAL VOC datasets support the framework's effectiveness. On MS-COCO with only $2\%$ of the data labeled, the method surpasses state-of-the-art methods by 4.2 mAP, which shows that it can make effective use of unlabeled data under limited supervision. Even with the full supervised information of MS-COCO, it still outperforms state-of-the-art methods by about 1.0 mAP.

On PASCAL VOC, using VOC07 as labeled data and VOC12 as unlabeled data, the method improves mAP by more than 5 points. Together with the MS-COCO results, this indicates that the gains are not specific to a single dataset.

Implications and Future Directions

The gains reported for Instant-Teaching suggest that it is useful in practical settings where labeled data is scarce or expensive to obtain. Because the approach is simple and end-to-end, it may also apply to detectors other than the two-stage Faster R-CNN architecture that was tested. Future research could explore more complex data augmentations, pseudo-annotation updates throughout all training phases, and applications to single-stage models such as SSD or FCOS.

By improving detection performance in both low-label and fully labeled settings, Instant-Teaching is a notable step forward for SSOD, with relevance to any field that needs accurate object detection without large annotation budgets. Its per-iteration pseudo-annotation updates and co-rectify scheme also provide a basis for further work on semi-supervised detection.

Authors (5)

Qiang Zhou (124 papers)
Chaohui Yu (29 papers)
Zhibin Wang (53 papers)
Qi Qian (54 papers)
Hao Li (803 papers)

Citations (178)
