
A Simple Semi-Supervised Learning Framework for Object Detection (2005.04757v2)

Published 10 May 2020 in cs.CV

Abstract: Semi-supervised learning (SSL) has a potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, the scope of demonstration in SSL has mainly been on image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentations. We propose experimental protocols to evaluate the performance of semi-supervised object detection using MS-COCO and show the efficacy of STAC on both MS-COCO and VOC07. On VOC07, STAC improves the AP^{0.5} from 76.30 to 79.08; on MS-COCO, STAC demonstrates 2× higher data efficiency by achieving 24.38 mAP using only 5% labeled data than supervised baseline that marks 23.86% using 10% labeled data. The code is available at https://github.com/google-research/ssl_detection/.

A Simple Semi-Supervised Learning Framework for Object Detection

The paper "A Simple Semi-Supervised Learning Framework for Object Detection" presents STAC, a framework that extends Semi-Supervised Learning (SSL), whose successes have so far centered on image classification, to object detection tasks.

Introduction and Background

The authors address a critical issue in computer vision: labeling data for object detection is far more expensive than for image classification, since every object must be localized and categorized. In SSL, utilizing unlabeled data can significantly enhance model performance. The key idea leveraged is consistency-based self-training, which uses data augmentations to enforce that a network's predictions remain stable under input perturbations. Previous successful applications of these methods were primarily limited to image classification.

Proposed Method: STAC

STAC embodies a dual-stage training process. Initially, a teacher model is trained exclusively on available labeled data. This model then generates pseudo labels on the remaining unlabeled data. Subsequently, the model is re-trained with both the labeled data and these pseudo labels, enforcing consistency via strong data augmentations.
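The two stages above can be sketched compactly. This is a minimal, illustrative sketch, not the paper's implementation (which builds on a Faster R-CNN detector); the `Detection` tuple, the threshold `tau`, and the loss weight `lam` are assumed names for exposition, with the combined objective following the usual self-training form L = L_s + λ·L_u:

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, int, float]        # (box, class_id, confidence)

def generate_pseudo_labels(
    teacher: Callable[[object], List[Detection]],
    unlabeled_images: List[object],
    tau: float = 0.9,                     # confidence threshold (assumed value)
) -> List[Tuple[object, List[Detection]]]:
    """Stage 1: run the teacher (trained on labeled data only) over unlabeled
    images and keep only detections whose confidence exceeds tau."""
    pseudo = []
    for img in unlabeled_images:
        kept = [det for det in teacher(img) if det[2] >= tau]
        if kept:
            pseudo.append((img, kept))
    return pseudo

def stac_loss(sup_loss: float, unsup_loss: float, lam: float = 1.0) -> float:
    """Stage 2: combined objective L = L_s + lam * L_u, where L_u is computed
    on strongly augmented images against the (fixed) pseudo labels."""
    return sup_loss + lam * unsup_loss
```

In practice the pseudo labels are generated once from the teacher and held fixed while the model is re-trained on the mixture of labeled and pseudo-labeled data.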

The innovations in STAC include:

  1. High-Confidence Pseudo Labeling: The framework uses a high threshold for confidence-based selection of pseudo labels, inspired by techniques such as FixMatch. This ensures high precision of the predictions retained for training.
  2. Augmentation Strategy: STAC employs a robust augmentation pipeline, including global color transformations and geometric transformations, to enforce prediction consistency. Strategies like Cutout are used to further enhance the model's robustness.
  3. Experimental Protocols: To evaluate STAC’s efficacy, distinct protocols using MS-COCO and PASCAL VOC datasets were developed, notably demonstrating its impact in low-data regimes.
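Two of the augmentations named above can be sketched as follows. This is an illustrative stand-in, not STAC's actual pipeline (which draws on a richer set of color, geometric, and box-level transforms); note that geometric transforms must update the pseudo-label boxes consistently with the image, as `hflip_boxes` shows for a horizontal flip:

```python
import random
import numpy as np

def cutout(image: np.ndarray, size: int, rng: random.Random) -> np.ndarray:
    """Zero out a random square patch of the image (Cutout-style occlusion)."""
    h, w = image.shape[:2]
    out = image.copy()
    y = rng.randrange(max(1, h - size + 1))
    x = rng.randrange(max(1, w - size + 1))
    out[y:y + size, x:x + size] = 0
    return out

def hflip_boxes(boxes, width):
    """Mirror (x1, y1, x2, y2) boxes to match a horizontally flipped image,
    so pseudo labels stay aligned with the augmented input."""
    return [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
```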

Numerical Results

STAC's effectiveness is validated by strong numerical outcomes. On MS-COCO, it achieves 24.38 mAP using only 5% labeled data, outperforming a supervised baseline (23.86) trained with twice as much labeled data, i.e., 2× higher data efficiency. On VOC07, STAC raises AP^{0.5} from 76.30 to 79.08.

Implications and Future Directions

Practically, STAC provides a scalable solution, improving data efficiency in object detection tasks where acquiring labeled data is costly. Theoretically, it opens up future development paths in SSL methodologies applied to more complex vision tasks.

Future research could look into the optimization of pseudo labeling and augmentation strategies, as well as extending STAC's principles to other domains in computer vision. Investigating the interplay between model architecture choice and SSL efficacy could also yield insightful outcomes.

Conclusion

In sum, this paper presents a well-structured approach to extend semi-supervised learning into object detection, delivering compelling results through a blend of self-training and sophisticated augmentations. The STAC framework is a promising direction for researchers focusing on label-efficient learning strategies in AI.

Authors (6)
  1. Kihyuk Sohn (54 papers)
  2. Zizhao Zhang (44 papers)
  3. Chun-Liang Li (60 papers)
  4. Han Zhang (338 papers)
  5. Chen-Yu Lee (48 papers)
  6. Tomas Pfister (89 papers)
Citations (456)