Revisiting Weakly Supervised Pre-Training of Visual Perception Models (2201.08371v2)
Abstract: Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de facto standard, recent studies suggest that large-scale weakly supervised pre-training can outperform fully supervised approaches. This paper revisits weakly supervised pre-training of models using hashtag supervision with modern versions of residual networks and the largest-ever dataset of images and corresponding hashtags. We study the performance of the resulting models in various transfer-learning settings, including zero-shot transfer. We also compare our models with those obtained via large-scale self-supervised learning. We find that our weakly supervised models are very competitive across all settings and substantially outperform their self-supervised counterparts. We also investigate whether our models have learned potentially troubling associations or stereotypes. Overall, our results provide a compelling argument for the use of weakly supervised learning in the development of visual recognition systems. Our models, Supervised Weakly through hashtAGs (SWAG), are publicly available.
- Mannat Singh
- Laura Gustafson
- Aaron Adcock
- Vinicius de Freitas Reis
- Bugra Gedik
- Raj Prateek Kosaraju
- Dhruv Mahajan
- Ross Girshick
- Piotr Dollár
- Laurens van der Maaten