What Variables Affect Out-of-Distribution Generalization in Pretrained Models? (2405.15018v3)

Published 23 May 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary widely. We study the factors influencing transferability and out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which is closely related to intermediate neural collapse. This hypothesis suggests that deeper DNN layers compress representations and hinder OOD generalization. Contrary to earlier work, our experiments show this is not a universal phenomenon. We comprehensively investigate the impact of DNN architecture, training data, image resolution, and augmentations on transferability. We identify that training with high-resolution datasets containing many classes greatly reduces representation compression and improves transferability. Our results emphasize the danger of generalizing findings from toy datasets to broader contexts.
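
The tunnel effect described in the abstract is usually diagnosed by attaching linear probes to frozen intermediate-layer embeddings and checking whether transfer accuracy falls in the deeper layers. The Python sketch below illustrates that kind of layer-wise probe under stated assumptions; it is not the authors' code. It assumes a torchvision ResNet-18 pretrained on ImageNet and uses CIFAR-10 as a stand-in downstream task (out-of-distribution relative to the pretraining data); declining probe accuracy toward layer4 would be consistent with representation compression.

# Minimal sketch (not from the paper): layer-wise linear probing of a frozen backbone.
# Assumptions: torchvision ResNet-18 with ImageNet weights; CIFAR-10 as the downstream
# probe task. Any ID/OOD dataset pair could be substituted.
import torch
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).to(device).eval()

# Capture the output of each residual stage with forward hooks.
LAYERS = ["layer1", "layer2", "layer3", "layer4"]
features = {}

def make_hook(name):
    def fn(_module, _inp, out):
        # Global-average-pool the spatial maps into one embedding per image.
        features[name] = out.mean(dim=(2, 3)).detach().cpu()
    return fn

for name in LAYERS:
    getattr(model, name).register_forward_hook(make_hook(name))

tf = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def extract(dataset, n_batches=20, batch_size=128):
    """Run a few batches through the frozen backbone and collect per-layer embeddings."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    embs, labels = {k: [] for k in LAYERS}, []
    with torch.no_grad():
        for i, (x, y) in enumerate(loader):
            if i >= n_batches:
                break
            model(x.to(device))
            for k in LAYERS:
                embs[k].append(features[k])
            labels.append(y)
    return {k: torch.cat(v).numpy() for k, v in embs.items()}, torch.cat(labels).numpy()

train_set = datasets.CIFAR10("data", train=True, download=True, transform=tf)
test_set = datasets.CIFAR10("data", train=False, download=True, transform=tf)

tr_emb, tr_y = extract(train_set)
te_emb, te_y = extract(test_set)

# One linear probe per layer: if accuracy drops sharply in the deeper stages,
# that is the compression pattern the tunnel effect hypothesis predicts.
for layer in LAYERS:
    probe = LogisticRegression(max_iter=2000).fit(tr_emb[layer], tr_y)
    print(layer, "probe accuracy:", probe.score(te_emb[layer], te_y))

Subsampling a few batches keeps the sketch fast; a fuller analysis of the kind the paper describes would probe every block and sweep architectures, pretraining datasets, resolutions, and augmentations.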
