
Investigating Transfer Learning Capabilities of Vision Transformers and CNNs by Fine-Tuning a Single Trainable Block (2110.05270v1)

Published 11 Oct 2021 in cs.CV and cs.AI

Abstract: Recent developments in Computer Vision have seen a rise in the use of transformer-based architectures. They surpass the state of the art set by CNN architectures in accuracy, but they are computationally very expensive to train from scratch. As these models are quite recent in the Computer Vision field, there is a need to study their transfer learning capabilities and compare them with CNNs, so that we can understand which architecture is better when applied to real-world problems with small data. In this work, we follow a simple yet restrictive method for fine-tuning both CNN and Transformer models pretrained on ImageNet1K on CIFAR-10 and compare them with each other. We unfreeze only the last transformer/encoder block or the last convolutional block of a model, freeze all the layers before it, and add a simple MLP at the end for classification. This simple modification lets us use the raw learned weights of both these neural networks. From our experiments, we find that transformer-based architectures not only achieve higher accuracy than CNNs, but some transformers achieve this with around four times fewer parameters.
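
The setup described in the abstract (freeze everything except the last block, attach a small MLP head) can be illustrated with a minimal sketch. This is not the authors' code: the backbone choice (torchvision ResNet-50), the MLP hidden size, and the optimizer settings are illustrative assumptions; for a transformer backbone the analogous step would unfreeze the final encoder block instead of `layer4`.

```python
# Minimal sketch (assumed, not the paper's code): fine-tune only the last
# convolutional block of an ImageNet1K-pretrained ResNet-50 for CIFAR-10.
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet1K-pretrained backbone.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze all parameters.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the last convolutional block (layer4 in torchvision ResNets).
for param in model.layer4.parameters():
    param.requires_grad = True

# Replace the classifier with a simple MLP head for the 10 CIFAR-10 classes
# (new layers are trainable by default).
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# Optimize only the trainable parameters: the last block and the MLP head.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```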

Citations (5)
