On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training (2305.12224v2)
Abstract: Pre-training datasets are critical for building state-of-the-art machine learning models, motivating rigorous study of their impact on downstream tasks. In this work, we study the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset. Empirically, we find that with the size of the pre-training dataset fixed, the best downstream performance is achieved with a balance between intra- and inter-class diversity. To understand the underlying mechanism, we show theoretically that downstream performance depends monotonically on both types of diversity. Notably, our theory reveals that the optimal class-to-sample ratio (#classes / #samples per class) is invariant to the size of the pre-training dataset, which motivates an application: predicting the optimal number of pre-training classes. We demonstrate the effectiveness of this application with an improvement of around 2 points on downstream tasks when using ImageNet as the pre-training dataset.
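The invariance claim lends itself to a simple back-of-the-envelope calculation. If the optimal ratio r = C / m (C classes, m samples per class) is independent of dataset size, and the total size is N = C · m, then the optimal number of classes for any N is C* = sqrt(r · N). The sketch below illustrates this arithmetic; the ratio value and function name are hypothetical, not taken from the paper.

```python
import math

def predict_optimal_num_classes(optimal_ratio: float, dataset_size: int) -> int:
    """Predict the optimal number of pre-training classes C for a dataset
    of N samples, assuming (as the paper's theory suggests) that the optimal
    class-to-sample ratio r = C / m is invariant to dataset size.
    From N = C * m and r = C / m, it follows that C = sqrt(r * N).
    """
    return round(math.sqrt(optimal_ratio * dataset_size))

# Hypothetical example: suppose a small-scale sweep estimated r = 1.0 as optimal.
# For a dataset of 10,000 samples this predicts sqrt(1.0 * 10000) = 100 classes.
print(predict_optimal_num_classes(1.0, 10_000))
```

Because C* grows only as the square root of N, a ratio fitted on a small pilot dataset can be extrapolated to a much larger pre-training set without re-running the full sweep, which is the practical use the abstract describes.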