Rethinking Pre-training and Self-training (2006.06882v2)

Published 11 Jun 2020 in cs.CV, cs.LG, and stat.ML

Abstract: Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, show a surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data. Self-training, on the other hand, shows positive improvements from +1.3 to +3.4AP across all dataset sizes. In other words, self-training works well exactly on the same setup that pre-training does not work (using ImageNet to help COCO). On the PASCAL segmentation dataset, which is a much smaller dataset than COCO, though pre-training does help significantly, self-training improves upon the pre-trained model. On COCO object detection, we achieve 54.3AP, an improvement of +1.5AP over the strongest SpineNet model. On PASCAL segmentation, we achieve 90.5 mIOU, an improvement of +1.5% mIOU over the previous state-of-the-art result by DeepLabv3+.

Citations (617)

Summary

  • The paper presents experimental evidence that self-training boosts COCO object detection by +1.3 to +3.4 AP, even when strong augmentations are applied.
  • The paper shows that conventional pre-training can hurt performance, lowering COCO AP by up to 1.0 point under stronger data augmentations.
  • The paper demonstrates that self-training achieves state-of-the-art results, delivering 54.3 AP on COCO and 90.5 mIOU on PASCAL segmentation.

Rethinking Pre-training and Self-training

The paper "Rethinking Pre-training and Self-training" tackles a pressing issue in the domain of computer vision: the effectiveness of pre-training versus self-training paradigms. It challenges the prevailing reliance on ImageNet pre-training, especially in tasks like object detection and segmentation, and investigates the potential of self-training as an alternative approach.

Key Insights

The authors present a detailed experimental analysis contrasting pre-training with self-training, revealing key insights:

  1. Data Augmentation and Pre-training: The analysis indicates that stronger data augmentation and more labeled data reduce the value of pre-training. With the strongest augmentations, pre-training can even hurt, lowering COCO object detection by as much as 1.0 AP (a rough sketch of weaker versus stronger augmentation pipelines follows this list).
  2. Resilience of Self-training: Unlike pre-training, self-training remains beneficial under strong data augmentation and across data regimes, yielding consistent gains of +1.3 to +3.4 AP on COCO at every dataset size, including settings where pre-training hurts accuracy.
  3. Enhanced Performance Metrics: With suitable combinations of architectures and additional data, self-training reaches 54.3 AP on COCO, an improvement of +1.5 AP over the strongest SpineNet model, and 90.5 mIOU on PASCAL segmentation, surpassing the previous state of the art set by DeepLabv3+.
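The snippet below is a rough torchvision sketch of what weaker versus stronger augmentation can look like. The paper's progressively stronger policies combine flips, scale jittering, AutoAugment, and RandAugment inside a detection pipeline, so the transforms, crop sizes, and magnitudes here are illustrative assumptions rather than the authors' settings.

```python
from torchvision import transforms

# Weaker policy: horizontal flips plus mild random resized cropping.
weak_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

# Stronger policy: more aggressive crop-scale jittering plus RandAugment-style
# operations. (The paper's detection pipeline jitters image scale up to 2x,
# which RandomResizedCrop cannot express directly; values here are placeholders.)
strong_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(224, scale=(0.1, 1.0)),
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.ToTensor(),
])
```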

Implications

The findings of this research have several notable implications:

  • Redefining Paradigms: Self-training proves to be a versatile and robust training paradigm, offering potential as a primary approach over traditional pre-training methods.
  • Reduction in Dependency: The decreased reliance on pre-trained models, especially in scenarios with extensive data augmentations or dataset sizes, could lead to more efficient training processes and resource utilization.
  • Task Alignment: The research underscores the importance of task alignment in training paradigms, suggesting that self-training aligns more closely with the target task than conventional pre-training does.

Future Developments

Looking ahead, the paper opens avenues for further investigation in several areas:

  • Extension to Other Domains: Exploring self-training applications beyond computer vision to fields like natural language processing and bioinformatics might uncover more extensive benefits.
  • Hybrid Approaches: Combining pre-training, self-training, and joint-training could yield hybrid models that leverage strengths from each method, offering a holistic training approach.
  • Advanced Model Architectures: The exploration of novel architectures in conjunction with self-training could enhance model robustness and performance across diverse datasets and tasks.

In summary, this paper underscores the promise of self-training as a method that not only complements but can surpass pre-training under certain conditions. It calls on the computer vision research community to revisit its standing paradigms and to embrace self-training as a versatile and scalable route to improved model performance.
