Beyond Random Augmentations: Pretraining with Hard Views (2310.03940v5)
Abstract: Many Self-Supervised Learning (SSL) methods aim for model invariance to different image augmentations known as views. To achieve this invariance, conventional approaches make use of random sampling operations within the image augmentation pipeline. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple, yet effective approach is to select hard views that yield a higher loss. In this paper, we present Hard View Pretraining (HVP), a learning-free strategy that builds upon this hypothesis and extends random view generation. HVP exposes the model to harder, more challenging samples during SSL pretraining, which enhances downstream performance. It encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the model being pretrained, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss given the current model state, and 4) run the backward pass with the selected pair. As a result, HVP achieves linear evaluation accuracy improvements of 1% on average on ImageNet for both 100- and 300-epoch pretraining, and similar improvements on transfer tasks across DINO, SimSiam, iBOT, and SimCLR.
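The four iterative steps described in the abstract map directly onto a short training-loop sketch. The following is a minimal PyTorch-style illustration, assuming a generic joint-embedding SSL setup; the names `model`, `ssl_loss`, `augment`, `num_views`, and `hvp_training_step` are illustrative placeholders and not taken from the paper.

```python
import itertools
import torch


def hvp_training_step(model, ssl_loss, augment, image_batch, optimizer, num_views=4):
    """One HVP-style step: sample several views, score all view pairs by their
    SSL loss under the current model, and backpropagate only the hardest pair.
    All names here are placeholders for whatever SSL method is being pretrained."""
    # 1) Randomly sample multiple views and forward each through the model.
    #    The selection pass needs no gradients.
    views = [augment(image_batch) for _ in range(num_views)]
    with torch.no_grad():
        embeddings = [model(v) for v in views]

    # 2) + 3) Form all pairs of views, compute their loss, and adversarially
    #    select the pair with the highest loss under the current model state.
    hardest_pair, hardest_loss = None, float("-inf")
    for i, j in itertools.combinations(range(num_views), 2):
        pair_loss = ssl_loss(embeddings[i], embeddings[j]).item()
        if pair_loss > hardest_loss:
            hardest_pair, hardest_loss = (i, j), pair_loss

    # 4) Re-run the forward pass with gradients on the selected pair only and
    #    take the usual optimizer step.
    i, j = hardest_pair
    loss = ssl_loss(model(views[i]), model(views[j]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because selection reuses the standard augmentation pipeline and loss, this drop-in wrapper keeps the underlying SSL objective (e.g., SimCLR, SimSiam, DINO, or iBOT) unchanged; only the choice of which sampled view pair receives the backward pass differs from conventional random sampling.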
- Food-101 – mining discriminative components with random forests. In Proc. of ECCV’14, 2014.
- Unsupervised learning of visual features by contrasting cluster assignments. In Proc. of NeurIPS’20, 2020.
- Emerging properties in self-supervised vision transformers. In Proc. of ICCV’21, pp. 9630–9640, 2021.
- A simple framework for contrastive learning of visual representations. In Proc. of ICML’20, pp. 1597–1607, 2020a.
- X. Chen and K. He. Exploring simple siamese representation learning. In Proc. of CVPR’21, pp. 15750–15758, 2021.
- Improved baselines with momentum contrastive learning. CoRR, abs/2003.04297, 2020b.
- Autoaugment: Learning augmentation strategies from data. In Proc. of CVPR’19, pp. 113–123, 2019.
- ImageNet: A Large-Scale Hierarchical Image Database. In Proc. of CVPR’09, pp. 248–255, 2009.
- An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929, 2020.
- Whitening for self-supervised representation learning. In Proc. of ICML’21, pp. 3015–3024, 2021.
- The pascal visual object classes (VOC) challenge. In I. J. of Computer Vision (IJCV’10), pp. 303–338, 2010.
- J. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pp. 1189–1232, 2001.
- Bootstrap your own latent: A new approach to self-supervised learning. In Proc. of NeurIPS’20, 2020.
- Dimensionality reduction by learning an invariant mapping. In Proc. of CVPR’06, pp. 1735–1742, 2006.
- Faster autoaugment: Learning augmentation strategies using backpropagation. In Proc. of ECCV’20, pp. 1–16, 2020.
- Deep residual learning for image recognition. In Proc. of CVPR’16, pp. 770–778, 2016.
- Momentum contrast for unsupervised visual representation learning. In Proc. of CVPR’20, pp. 9726–9735, 2020.
- Masked autoencoders are scalable vision learners. In Proc. of CVPR’22, pp. 15979–15988, 2022.
- Population based augmentation: Efficient learning of augmentation policy schedules. In Proc. of ICML’19, pp. 2731–2741, 2019.
- When to learn what: Model-adaptive data augmentation curriculum. CoRR, abs/2309.04747, 2023.
- An efficient approach for assessing hyperparameter importance. In Proc. of ICML’14, pp. 754–762, 2014.
- iNaturalist 2021 competition dataset. https://github.com/visipedia/inat_comp/tree/master/2021, 2021.
- Spatial transformer networks. In Proc. of NeurIPS’15, pp. 2017–2025, 2015.
- A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- Online hyper-parameter learning for auto-augmentation strategy. In Proc. of ICCV’19, pp. 6578–6587, 2019.
- Microsoft COCO: common objects in context. In Proc. of ECCV’14, pp. 740–755, 2014.
- I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In Proc. of ICLR’19, 2019.
- S. Müller and F. Hutter. Trivialaugment: Tuning-free yet state-of-the-art data augmentation. In Proc. of ICCV’21, pp. 774–782, 2021.
- M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Proc. of ICVGIP’08, pp. 722–729, 2008.
- PyTorch: An imperative style, high-performance deep learning library. In Proc. of NeurIPS’19, pp. 8024–8035, 2019.
- S. Purushwalkam and A. Gupta. Demystifying contrastive self-supervised learning: Invariances, augmentations and dataset biases. In Proc. of NeurIPS’20, 2020.
- Faster R-CNN: towards real-time object detection with region proposal networks. In Proc. of NeurIPS’15, pp. 91–99, 2015.
- Adversarial masking for self-supervised learning. In Proc. of ICML’22, volume 162, pp. 20026–20040, 2022.
- Viewmaker networks: Learning views for unsupervised representation learning. In Proc. of ICLR’21, 2021.
- Contrastive multiview coding. In Proc. of ECCV’20, pp. 776–794, 2020a.
- What makes for good views for contrastive learning? In Proc. of NeurIPS’20, 2020b.
- Representation learning with contrastive predictive coding. CoRR, abs/1807.03748, 2018.
- On the importance of hyperparameters and data augmentation for self-supervised learning. International Conference on Machine Learning (ICML) 2022 Pre-Training Workshop, 2022.
- On mutual information in contrastive learning for visual representations. CoRR, abs/2005.13149, 2020.
- Detectron2. https://github.com/facebookresearch/detectron2, 2019.
- Unsupervised feature learning via non-parametric instance-level discrimination. In Proc. of CVPR’18, 2018.
- On the algorithmic stability of adversarial training. In Proc. of NeurIPS’21, pp. 26523–26535, 2021.
- Large batch training of convolutional networks. CoRR, abs/1708.03888, 2017.
- Barlow twins: Self-supervised learning via redundancy reduction. In Proc. of ICML’21, pp. 12310–12320, 2021.
- Adversarial AutoAugment. In Proc. of ICLR’20, 2020.
- iBOT: Image BERT pre-training with online tokenizer. CoRR, abs/2111.07832, 2021.