Affine transformation estimation improves visual self-supervised learning (2402.09071v1)
Abstract: The standard approach to modern self-supervised learning is to generate random views through data augmentations and minimise a loss computed from the representations of these views. This inherently encourages invariance to the transformations that comprise the data augmentation function. In this work, we show that adding a module that constrains the representations to be predictive of an affine transformation improves the performance and efficiency of the learning process. The module is agnostic to the base self-supervised model and manifests as an additional loss term that encourages an aggregation of the encoder representations to be predictive of an affine transformation applied to the input images. We perform experiments with various modern self-supervised models and observe a performance improvement in all cases. Further, we perform an ablation study on the components of the affine transformation to understand which of them affects performance the most, as well as on key architectural design decisions.
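To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how such an auxiliary affine-prediction loss could be wired up in PyTorch. The names `AffinePredictionHead` and `affine_prediction_loss`, the concatenation-based aggregation of the two views' representations, the sampled parameter ranges, and the MSE objective are all illustrative assumptions; the base self-supervised model and its loss are left abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms.functional as TF


class AffinePredictionHead(nn.Module):
    """Small MLP that regresses the affine parameters from an aggregation
    (here: concatenation) of the representations of the two views."""

    def __init__(self, feat_dim: int, n_params: int = 5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, n_params),  # angle, translate x/y, scale, shear
        )

    def forward(self, z_orig: torch.Tensor, z_aff: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([z_orig, z_aff], dim=-1))


def affine_prediction_loss(encoder: nn.Module, images: torch.Tensor,
                           head: AffinePredictionHead) -> torch.Tensor:
    """Apply a random affine transform to each image, encode both views, and
    regress the (normalised) transform parameters with an MSE loss."""
    b = images.size(0)
    # Sample per-image affine parameters (ranges chosen for illustration only).
    angle = torch.empty(b).uniform_(-30.0, 30.0)   # rotation in degrees
    tx = torch.empty(b).uniform_(-8.0, 8.0)        # horizontal shift in pixels
    ty = torch.empty(b).uniform_(-8.0, 8.0)        # vertical shift in pixels
    scale = torch.empty(b).uniform_(0.8, 1.2)      # isotropic scale factor
    shear = torch.empty(b).uniform_(-10.0, 10.0)   # shear in degrees

    transformed = torch.stack([
        TF.affine(images[i], angle=angle[i].item(),
                  translate=[int(tx[i]), int(ty[i])],
                  scale=scale[i].item(), shear=[shear[i].item()])
        for i in range(b)
    ])

    z_orig = encoder(images)       # (b, feat_dim)
    z_aff = encoder(transformed)   # (b, feat_dim)

    # Normalise the targets to a comparable range before regression.
    target = torch.stack([angle / 30.0, tx / 8.0, ty / 8.0,
                          scale - 1.0, shear / 10.0], dim=-1).to(z_orig.device)
    return F.mse_loss(head(z_orig, z_aff), target)
```

In training, this term would simply be added, with a weighting coefficient, to the base self-supervised objective, e.g. `loss = ssl_loss + lam * affine_prediction_loss(encoder, images, head)`, leaving the underlying method (SimCLR, BYOL, Barlow Twins, etc.) unchanged.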