- The paper demonstrates that integrating MLP projectors into supervised pretraining bridges the transferability gap by preserving intra-class variation and reducing feature distribution differences.
- The study reports a +7.2% boost in concept generalization and image classification transfer performance on par with unsupervised pretraining, across backbones such as ResNet and EfficientNet.
- The paper offers empirical and theoretical insights that guide architectural improvements in pretraining strategies for enhanced visual learning.
Revisiting the Transferability of Supervised Pretraining: An MLP Perspective
The paper "Revisiting the Transferability of Supervised Pretraining: an MLP Perspective" addresses a critical insight into the pretrain-finetune paradigm in visual learning. This paper specifically focuses on the transferability gap between supervised and unsupervised pretraining methods by investigating the role of multilayer perceptron (MLP) projectors. It provides significant empirical and theoretical analysis, revealing the positive impact of MLP inclusion on closing the transferability gap between these paradigms.
Key Contributions and Findings
- MLP Projectors Enhance Transferability: Previous research highlighted the superior transfer performance of unsupervised pretraining over supervised counterparts without delving deeply into the architectural features contributing to this trend. This paper identifies the MLP projector as an essential component of unsupervised methods, fundamentally improving transferability by preserving intra-class variation and decreasing feature distribution distance between pretraining and evaluation datasets.
- Comparison of Supervised and Unsupervised Pretraining: Through rigorous empirical testing, the paper demonstrates substantial improvements in transferability when an MLP projector is added to supervised pretraining (SL-MLP). This modification makes supervised pretraining comparable to, or even better than, unsupervised counterparts such as BYOL and MoCo v2. The paper reports improved top-1 accuracy, including a +7.2% boost on concept generalization tasks and comparable image classification performance across various domains (a minimal SL-MLP sketch follows this list).
- General Applicability across Architectures: The positive effect of adding the MLP projector is consistent across backbone architectures, including ResNet-50, ResNet-101, Swin-Tiny, and EfficientNet-B2, indicating broad applicability.
- Stage-wise Evaluation: The research uses stage-wise evaluations to locate where transferability drops across the stages of a supervised network trained without an MLP. When the MLP is incorporated, these drops are avoided, indicating that the projector helps maintain high transferability through the later network stages (a stage-wise probing sketch follows this list).
- Empirical and Theoretical Analysis: The paper provides theoretical insight into why increasing the discriminative ratio on the pretraining dataset beyond a certain threshold can hurt transferability. That threshold depends on the semantic gap between the pretraining and evaluation datasets, which offers guidance for designing pretraining architectures (an illustrative ratio computation follows this list).
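Below is a minimal PyTorch-style sketch of the SL-MLP idea: a standard supervised classifier with a two-layer MLP projector inserted between the backbone and the classification head. The projector widths, BatchNorm/ReLU layout, and ResNet-50 backbone are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torchvision

class SLMLP(nn.Module):
    """Supervised pretraining with an MLP projector between the backbone
    and the classification head (the SL-MLP idea). Widths are assumptions."""

    def __init__(self, num_classes, feat_dim=2048, hidden_dim=4096, proj_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet50()
        backbone.fc = nn.Identity()           # keep the 2048-d pooled features
        self.backbone = backbone
        self.projector = nn.Sequential(       # MLP projector, used only during pretraining
            nn.Linear(feat_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, proj_dim),
        )
        self.classifier = nn.Linear(proj_dim, num_classes)

    def forward(self, x):
        h = self.backbone(x)                  # transferable features are taken from here
        z = self.projector(h)
        return self.classifier(z)

model = SLMLP(num_classes=1000)
logits = model(torch.randn(2, 3, 224, 224))   # trained with standard cross-entropy
```

At transfer time the projector and classifier would be discarded and only the backbone features reused, mirroring how unsupervised methods such as BYOL and MoCo v2 are evaluated.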
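The stage-wise evaluation mentioned above can be approximated by pooling features after each residual stage and scoring each stage with a linear probe on the downstream dataset. The sketch below uses torchvision's feature-extraction utility; the probing protocol implied here is an assumption, not the paper's exact recipe.

```python
import torch
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

# Pooled features after each residual stage of a (pretrained) ResNet-50.
backbone = torchvision.models.resnet50()
stages = {"layer1": "stage1", "layer2": "stage2", "layer3": "stage3", "layer4": "stage4"}
extractor = create_feature_extractor(backbone, return_nodes=stages)

def stage_features(images):
    """Return a dict of globally average-pooled features per stage."""
    with torch.no_grad():
        feats = extractor(images)
    return {name: f.mean(dim=(2, 3)) for name, f in feats.items()}

# Each stage's features would then be scored with a linear probe (e.g. logistic
# regression) on the downstream dataset to trace where transferability drops.
pooled = stage_features(torch.randn(4, 3, 224, 224))
```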
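For the discriminative-ratio analysis, a Fisher-style ratio of between-class scatter to intra-class variation can be computed on extracted features. The definition below is an illustrative stand-in for the paper's formulation, not its exact metric.

```python
import numpy as np

def discriminative_ratio(features, labels):
    """Illustrative ratio of between-class scatter to intra-class variation.

    features: (N, D) array of pooled backbone features
    labels:   (N,)   integer class labels
    """
    classes = np.unique(labels)
    centers = np.stack([features[labels == c].mean(axis=0) for c in classes])
    global_center = features.mean(axis=0)
    # Between-class scatter: spread of class centers around the global mean.
    between = np.mean(np.sum((centers - global_center) ** 2, axis=1))
    # Intra-class variation: average spread of samples around their class center.
    within = np.mean([
        np.sum((features[labels == c] - centers[i]) ** 2, axis=1).mean()
        for i, c in enumerate(classes)
    ])
    return between / within
```

Under the paper's analysis, pushing this kind of ratio too high on the pretraining data (collapsing intra-class variation) is what harms transfer once the downstream semantic gap is large.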
Implications and Future Directions
The implications of this research suggest that architectural design, specifically the inclusion of MLP projectors, can significantly affect the efficacy of transfer learning models. By preserving intra-class variation and reducing the feature distribution distance between pretraining and downstream data, such design modifications can make pretraining frameworks scale better and transfer better across domains.
This insight paves the way for future exploration into additional architectural modifications or objective function designs that could further narrow the gap in transfer learning performance. Researchers could investigate other forms of nonlinear transformation in pretraining architectures or combine MLP projectors with new loss functions.
In summary, the paper makes a substantial contribution to understanding model transferability, emphasizing the architectural elements that drive performance improvements in pretraining paradigms. As a result, it helps refine strategies for advancing visual learning, with potential applications extending into various AI-driven domains.