Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks (2310.03843v1)

Published 5 Oct 2023 in cs.CV and cs.LG

Abstract: Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier upon frozen features extracted from the pretrained model. As there may exist significant gaps between pretraining and downstream datasets, one may ask whether all dimensions of the pretrained features are useful for a given downstream task. We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce, or few-shot. For some cases such as 5-way 1-shot tasks, using only 1% of the most important feature dimensions is able to recover the performance achieved by using the full representation. Interestingly, most dimensions are redundant only under few-shot settings and gradually become useful when the number of shots increases, suggesting that feature redundancy may be the key to characterizing the "few-shot" nature of few-shot transfer problems. We give a theoretical understanding of this phenomenon and show how dimensions with high variance and small distance between class centroids can serve as confounding factors that severely disturb classification results under few-shot settings. As an attempt at solving this problem, we find that the redundant features are difficult to identify accurately with a small number of training samples, but we can instead adjust feature magnitude with a soft mask based on estimated feature importance. We show that this method can generally improve few-shot transfer performance across various pretrained models and downstream datasets.
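
As a concrete illustration of the recipe the abstract describes, the sketch below performs linear probing on frozen features with a soft mask derived from an estimated per-dimension importance. This is a minimal sketch, not the authors' exact method: the function name soft_mask_probe, the importance score (ratio of between-class centroid spread to overall feature spread, motivated by the abstract's point that high-variance dimensions with close centroids act as confounders), the temperature parameter, and the use of scikit-learn's LogisticRegression are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def soft_mask_probe(support_x, support_y, query_x, temperature=1.0):
    """Linear probing on frozen features with a soft feature-importance mask.

    Illustrative sketch only: the scoring rule and temperature are assumptions,
    not the paper's specification.
    """
    classes = np.unique(support_y)
    # Class centroids of the (frozen) support features: shape (n_classes, dim).
    centroids = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    # Per-dimension spread of class centroids (separability proxy).
    between = centroids.std(axis=0)
    # Per-dimension overall spread of the support features (confounder proxy).
    scale = support_x.std(axis=0) + 1e-8
    # Dimensions with well-separated centroids relative to their variance score high.
    importance = between / scale
    # Soft mask in [0, 1]; temperature controls how strongly low-importance
    # dimensions are suppressed rather than hard-pruned.
    mask = importance ** temperature
    mask = mask / mask.max()
    # Only the linear classifier is trained; the backbone stays frozen.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(support_x * mask, support_y)
    return clf.predict(query_x * mask)

# Example: a 5-way 5-shot task with 512-dim features (random data for illustration).
rng = np.random.default_rng(0)
support_x = rng.normal(size=(25, 512))
support_y = np.repeat(np.arange(5), 5)
query_x = rng.normal(size=(25, 512))
pred = soft_mask_probe(support_x, support_y, query_x)
```

In a real transfer setting, support_x and query_x would be features extracted once from the frozen pretrained model; replacing the soft mask with a hard top-k selection of dimensions would correspond to the extreme pruning experiment (e.g., keeping only 1% of dimensions) discussed in the abstract.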

Authors (5)
  1. Xu Luo (22 papers)
  2. Difan Zou (71 papers)
  3. Lianli Gao (100 papers)
  4. Zenglin Xu (145 papers)
  5. Jingkuan Song (116 papers)
