- The paper demonstrates that integrating pre-trained models enables efficient SSL evaluation across multiple domains.
- The benchmark’s 15 diverse tasks in CV, NLP, and audio illustrate the importance of cross-domain evaluation for SSL algorithms.
- The findings reveal significant computational savings and show that gains from unlabeled data over purely supervised baselines are inconsistent.
USB: A Unified Semi-supervised Learning Benchmark for Classification
Overview
The paper introduces USB, a comprehensive benchmark designed to evaluate semi-supervised learning (SSL) methods across diverse domains, including computer vision (CV), natural language processing (NLP), and audio processing. It addresses key limitations of existing SSL evaluation protocols, which typically restrict evaluation to CV tasks and require training deep networks from scratch, an approach that is both computationally costly and environmentally taxing.
Benchmark Design
USB comprises 15 tasks spanning three domains, aimed at providing a rigorous testing ground for SSL methods. Key highlights include:
- Task Diversity and Challenge: The benchmark encompasses a range of complex and varied datasets from CV, NLP, and audio, ensuring comprehensive assessment and fostering generalization across different types of data.
- Pre-trained Models: To mitigate the computational demands of SSL, the benchmark builds on pre-trained state-of-the-art backbones, most notably Vision Transformers for CV tasks, which are fine-tuned rather than trained from scratch. This sharply reduces both the time and the environmental cost of evaluation (see the fine-tuning sketch below).
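To make the reduced-cost setup concrete, the sketch below fine-tunes an ImageNet-pretrained Vision Transformer instead of training a network from scratch. It uses torchvision's pretrained ViT-B/16 as a stand-in for the backbones USB ships with; the number of classes, optimizer settings, and training step are illustrative assumptions rather than the benchmark's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Load an ImageNet-pretrained ViT-B/16 as the backbone (a stand-in for the
# pretrained backbones used in USB; exact checkpoints differ per task).
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Replace the classification head for the target task.
# num_classes=10 is an assumed example, not a USB setting.
num_classes = 10
model.heads = nn.Sequential(nn.Linear(model.hidden_dim, num_classes))

# Fine-tune the whole network with a small learning rate; because the backbone
# is already pretrained, far fewer iterations are needed than training from scratch.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

def supervised_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One labeled-batch update; an SSL method would add an unlabeled loss term."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```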
Numerical Results
The paper reports a substantial reduction in the computational resources needed to evaluate SSL methods with USB. Evaluating the FixMatch algorithm on USB's 15 tasks takes only 39 GPU days, versus roughly 335 GPU days for a limited 5-task, CV-only evaluation under typical prior protocols, a reduction of about 8.6x.
Key Findings
The paper highlights several insights from evaluating 14 SSL algorithms (including FixMatch, whose core objective is sketched after this list) using USB:
- Cross-Domain Evaluation: Introducing diverse tasks across multiple domains is crucial for accurate assessment of SSL algorithms, as performance can vary depending on domain-specific data characteristics.
- Effectiveness of Pre-training: The adoption of pre-trained models not only improves efficiency but also enhances the generalization capability of SSL algorithms. This is particularly evident in the improved performance and faster convergence of models like Vision Transformers (ViTs) compared to training from scratch.
- Inconsistent Gains from Unlabeled Data: SSL does not always outperform purely supervised training, especially when labeled data is scarce. This observation points to further research on robust SSL methods that deliver consistent improvements.
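Since FixMatch is the reference algorithm in the cost comparison above, here is a minimal sketch of its core unlabeled objective: confidence-thresholded pseudo-labels produced on weakly augmented inputs are enforced on strongly augmented views of the same inputs. The threshold and loss weighting shown are common defaults assumed for illustration, not USB's exact settings.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, weak_batch, strong_batch,
                            threshold: float = 0.95) -> torch.Tensor:
    """FixMatch-style consistency loss: pseudo-label weakly augmented inputs,
    keep only confident predictions, and train the model to reproduce them
    on strongly augmented views of the same inputs."""
    with torch.no_grad():
        probs = torch.softmax(model(weak_batch), dim=-1)
        max_probs, pseudo_labels = probs.max(dim=-1)
        mask = (max_probs >= threshold).float()  # keep confident samples only

    logits_strong = model(strong_batch)
    per_sample = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (per_sample * mask).mean()

# Per training step, the total loss combines labeled and unlabeled terms
# (lambda_u = 1.0 is a common default):
# loss = supervised_loss + lambda_u * fixmatch_unlabeled_loss(model, weak, strong)
```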
Implications and Future Directions
The development of USB presents practical implications for researchers by making SSL evaluation more accessible, cost-effective, and environmentally sustainable. The modular and extensible nature of the benchmark's codebase encourages further community-driven innovation and adaptation to include new algorithms and domains.
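To illustrate what such extensibility can look like in practice, the sketch below registers a new SSL method behind a small common interface that a shared training loop could look up by name. The class and function names here are hypothetical illustrations of the design pattern, not the actual API of the USB codebase.

```python
from typing import Dict
import torch
import torch.nn.functional as F

class SSLAlgorithm:
    """Hypothetical base interface a benchmark could expose (not USB's real API)."""
    def compute_loss(self, model, labeled, unlabeled) -> torch.Tensor:
        raise NotImplementedError

ALGORITHMS: Dict[str, type] = {}  # simple name -> class registry

def register(name: str):
    """Decorator that adds an algorithm class to the registry under a string key."""
    def wrap(cls):
        ALGORITHMS[name] = cls
        return cls
    return wrap

@register("pseudo_label")
class PseudoLabel(SSLAlgorithm):
    """Toy pseudo-labeling baseline, registered so a trainer can select it by name."""
    def __init__(self, threshold: float = 0.95, weight: float = 1.0):
        self.threshold, self.weight = threshold, weight

    def compute_loss(self, model, labeled, unlabeled):
        x, y = labeled
        sup = F.cross_entropy(model(x), y)
        with torch.no_grad():
            probs = torch.softmax(model(unlabeled), dim=-1)
            conf, pseudo = probs.max(dim=-1)
            mask = (conf >= self.threshold).float()
        unsup = (F.cross_entropy(model(unlabeled), pseudo, reduction="none") * mask).mean()
        return sup + self.weight * unsup

# A shared trainer could then instantiate algorithms by configuration string:
# algo = ALGORITHMS["pseudo_label"]()
```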
Theoretical implications suggest a re-evaluation of SSL methodologies within the context of diverse domain landscapes. The paper's findings invite exploration into domain-specific challenges and optimization strategies.
Future work may focus on extending USB with emerging SSL research areas such as open-set SSL, semi-supervised regression, and imbalanced SSL. Additionally, exploring robust SSL in scenarios with extremely limited labeled data or heavy class imbalance presents intriguing research opportunities.
Conclusion
USB sets a new standard for evaluating semi-supervised learning algorithms by integrating pre-trained models, supporting cross-domain tasks, and providing infrastructure that keeps computational and environmental costs in check. It fosters an open, collaborative environment for future advances and for broader application of SSL techniques across fields.