Big Self-Supervised Models are Strong Semi-Supervised Learners (2006.10029v2)

Published 17 Jun 2020 in cs.LG, cs.CV, and stat.ML

Abstract: One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.

The paper "Big Self-Supervised Models are Strong Semi-Supervised Learners" by Chen et al. presents an examination of the semi-supervised learning paradigm, focusing on the application of large-scale self-supervised models to the ImageNet dataset. The authors utilize an unsupervised pretraining phase using SimCLRv2 followed by supervised fine-tuning of a large network, leveraging unlabeled data task-agnostically.

Summary of Methods

The authors propose a semi-supervised learning framework consisting of three crucial stages:

  1. Unsupervised Pretraining: Utilizing SimCLRv2, the framework first learns representations from large, unlabeled datasets using a big ResNet architecture.
  2. Supervised Fine-Tuning: The pretrained network is then fine-tuned on a small fraction of labeled examples (1% or 10% of ImageNet labels), adapting the representations learned during pretraining to the classification task.
  3. Distillation with Unlabeled Examples: The fine-tuned large model acts as a teacher whose predictions on the unlabeled data are used to train a smaller student network, re-using the unlabeled data in a task-specific way to refine and transfer the classification knowledge (a minimal sketch of this step follows the list).
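
To make the distillation step concrete, here is a minimal PyTorch-style sketch: the student is trained to match the teacher's temperature-softened class distribution on unlabeled images. The names `teacher`, `student`, and `unlabeled_loader`, and the hyperparameter values, are illustrative assumptions rather than the paper's released code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Cross-entropy between the teacher's softened class distribution and the
    # student's predictions (the task-specific distillation objective).
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

def distill(teacher, student, unlabeled_loader, epochs=100, lr=0.1, temperature=1.0):
    # The fine-tuned teacher stays frozen; only the smaller student is updated.
    teacher.eval()
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images in unlabeled_loader:              # unlabeled images only
            with torch.no_grad():
                teacher_logits = teacher(images)     # teacher's "pseudo-labels"
            loss = distillation_loss(student(images), teacher_logits, temperature)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

When the student shares the teacher's architecture, the same procedure amounts to self-distillation, which the paper reports also improves the larger models themselves.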

Key Findings

  1. Model Size and Label Efficiency: The empirical results show that bigger models are markedly more label-efficient: the fewer the labeled examples, the larger the accuracy gain from increasing model size. With only 1% of the labels, a ResNet-50 reaches 73.9% top-1 accuracy, a tenfold improvement in label efficiency over the previous state of the art.
  2. Projection Head: SimCLRv2 uses a deeper projection head than SimCLR and fine-tunes from a middle layer of that head rather than discarding it entirely, which improves both linear evaluation and fine-tuning performance; the benefit is largest when few labeled examples are available (see the sketch after this list).
  3. Secondary Utilization of Unlabeled Data: The task-specific distillation phase, closely related to pseudo-labeling and self-training, further improves performance, yielding a state-of-the-art top-1 accuracy of 76.6% on ImageNet with just 1% of the labeled data.
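
The projection-head finding can be illustrated with a short sketch, assuming a 3-layer MLP head and fine-tuning from its first (middle) layer instead of from the bare encoder output. The exact layer composition, dimensions, and module names below are assumptions for illustration, not the released architecture.

```python
import torch.nn as nn

class ProjectionHead(nn.Module):
    # A 3-layer MLP head used during contrastive pretraining (deeper than
    # SimCLR's original 2-layer head).
    def __init__(self, dim=2048, proj_dim=128):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.ReLU())
        self.layer3 = nn.Linear(dim, proj_dim)

    def forward(self, h):
        return self.layer3(self.layer2(self.layer1(h)))

class FineTunedClassifier(nn.Module):
    # Rather than discarding the whole projection head after pretraining,
    # fine-tuning keeps the encoder plus the head's first layer and attaches
    # the task classifier there.
    def __init__(self, encoder, head, num_classes=1000, dim=2048):
        super().__init__()
        self.encoder = encoder       # pretrained big ResNet backbone
        self.middle = head.layer1    # reuse the head up to its middle layer
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x):
        return self.classifier(self.middle(self.encoder(x)))
```

The paper reports that retaining part of a deeper head in this way helps most when labeled examples are scarce.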

Numerical Results

The paper presents compelling numerical results:

  • Using 1% of labeled data, a ResNet-50 model trained with this methodology achieves 73.9% top-1 accuracy, a remarkable improvement from the previous state-of-the-art at 63.0%.
  • With 10% of the labels, the framework reaches 77.5% top-1 accuracy, surpassing fully supervised training of ResNet-50 on the entire labeled dataset.
  • When the fine-tuned model is distilled into smaller architectures, the task-specific gains carry over with little loss in classification accuracy.

Implications

Practical Implications: The efficacy of utilizing large-scale self-supervised learning for semi-supervised tasks has profound practical implications. Industries and research areas concerned with limited labeled data—such as medical imaging, satellite imaging, and other specialized fields—stand to benefit greatly. Efficient use of limited labeled data enables the rapid development of robust models without the prohibitive cost associated with accruing labeled datasets.

Theoretical Implications: From a theoretical standpoint, this research advances the understanding of how network scale and parameter efficiency intersect with unsupervised learning paradigms. The work underscores the importance of model capacity in leveraging unlabeled data for effective representation learning.

Future Directions

Future research avenues highlighted by these findings include:

  • Exploration of more sophisticated task-specific distillation techniques.
  • Improved architecture search for optimizing parameter efficiency alongside model capacity.
  • Application and validation of the discussed techniques across varied large-scale datasets beyond ImageNet, assessing the generalizability and robustness across domains.
  • Investigation into the underlying reasons for the efficacy of large-scale models in semi-supervised learning, potentially informing better regularization techniques.

The detailed analyses and promising results presented by Chen et al. reinforce the utility of large self-supervised models in semi-supervised learning settings, setting a benchmark for future research in the area.

Authors (5)
  1. Ting Chen (148 papers)
  2. Simon Kornblith (53 papers)
  3. Kevin Swersky (51 papers)
  4. Mohammad Norouzi (81 papers)
  5. Geoffrey Hinton (38 papers)
Citations (2,085)