Evaluation of Self-Supervised Model Transfer Capabilities
The paper, "How Well Do Self-Supervised Models Transfer?" offers a robust examination of self-supervised visual representation learning models, which have significantly advanced in recent years. The evaluation centres on the transfer performance of 13 prominent self-supervised models across 40 diverse downstream tasks, integrating evaluations in many-shot and few-shot recognition, object detection, and dense prediction. The paper juxtaposes self-supervised models against supervised baselines to ascertain their relative efficacy.
Core Findings
The central finding is that the best-performing self-supervised models surpass their supervised counterpart on most of the evaluated tasks. This aligns with the broader trend of self-supervised learning reaching parity with, or exceeding, traditional supervised pre-training.
A pivotal aspect discussed is how well ImageNet Top-1 accuracy predicts transfer performance. For many-shot recognition tasks, the correlation with ImageNet performance is strong; however, it weakens considerably for few-shot recognition, object detection, and dense prediction. In other words, ImageNet accuracy is a reliable proxy for transfer only within a narrow band of tasks, and further gains on ImageNet do not automatically translate into better few-shot, detection, or dense-prediction performance.
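The correlation analysis itself is straightforward to reproduce in spirit: collect one ImageNet Top-1 score and one downstream score per model, then compute a correlation coefficient. The snippet below uses made-up numbers purely to illustrate the computation; neither the statistic chosen nor the values are those reported in the paper.

```python
# Toy illustration of correlating ImageNet Top-1 accuracy with transfer
# performance across pre-trained models. All numbers are hypothetical.
from scipy.stats import spearmanr, pearsonr

imagenet_top1 = [71.1, 73.4, 75.3, 74.3, 67.5]   # one entry per model
downstream_acc = [82.0, 84.1, 85.0, 84.6, 79.2]  # e.g. many-shot recognition

rho, p_rank = spearmanr(imagenet_top1, downstream_acc)
r, p_lin = pearsonr(imagenet_top1, downstream_acc)
print(f"Spearman rho={rho:.2f} (p={p_rank:.3f}), Pearson r={r:.2f} (p={p_lin:.3f})")
```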
Implications
The analysis reveals that no single self-supervised model excels uniformly across all tasks, indicating that universal pre-training remains an unsolved problem. This suggests that different architectures or training methodologies may be better suited to particular downstream tasks. Moreover, a closer analysis of the learnt features reveals a trade-off: self-supervised models tend to preserve colour information less faithfully than supervised models, yet they exhibit better classifier calibration and less overfitting, highlighting their potential advantages in specific contexts.
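The calibration claim can be made concrete with the standard expected calibration error (ECE), which measures the gap between a classifier's confidence and its actual accuracy (lower is better). The function below is a generic sketch of that metric, not necessarily the paper's exact evaluation procedure.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Standard ECE: bin predictions by confidence and average the
    |accuracy - confidence| gap, weighted by the fraction of samples per bin."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece

# Usage: probs is an (N, C) array of softmax outputs from a downstream
# classifier, labels is an (N,) array of ground-truth class indices.
```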
Discussion and Future Considerations
These findings open several directions for future research. The pursuit of a universal self-supervised model that consistently outperforms supervised learning across all tasks remains an open goal, and it is worth investigating which architectural innovations or training regimes foster better generalization across varying task formats.
The paper also highlights a need to understand which properties of the learnt features drive this behaviour. A more granular exploration of how self-supervised models encode information could help develop methods that retain the observed advantages, such as better calibration and reduced overfitting, while addressing shortcomings such as weaker colour preservation.
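As one illustration of such probing, the snippet below sketches a simple linear probe for colour information: it asks how well each image's mean RGB value can be recovered from frozen features. This is a hypothetical probe chosen for clarity, not the analysis performed in the paper.

```python
# Hypothetical colour probe: regress each image's mean RGB value from its
# frozen features and compare recoverability across pre-training methods.
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

def colour_probe_score(features_train, rgb_train, features_test, rgb_test):
    """Fit a linear map from features to mean-RGB targets; a higher R^2
    means more colour information is linearly recoverable."""
    probe = Ridge(alpha=1.0).fit(features_train, rgb_train)
    return r2_score(rgb_test, probe.predict(features_test))

# features_* are (N, D) arrays from a frozen backbone; rgb_* are (N, 3)
# per-image mean colours in [0, 1].
```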
In conclusion, this paper enriches the discourse on self-supervised learning by providing a comprehensive evaluation of current models across a diverse array of tasks. While it confirms that self-supervised models can outperform supervised counterparts, it also underscores their limitations and frames the challenges that future models must overcome to achieve universality and robust cross-domain performance. As the shift towards self-supervised learning continues, resolving these challenges will be crucial to expanding the applicability and efficacy of such models in increasingly complex and varied real-world settings.