SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer (2312.01187v4)
Abstract: Existing data augmentation techniques in self-supervised learning, while diverse, fail to preserve the inherent structure of natural images. This results in distorted augmented samples with compromised semantic information, ultimately degrading downstream performance. To overcome this limitation, we propose SASSL: Style Augmentations for Self-Supervised Learning, a novel data augmentation technique based on Neural Style Transfer. SASSL decouples semantic and stylistic attributes in images and applies transformations exclusively to style while preserving content, generating diverse samples that better retain semantic information. SASSL boosts top-1 image classification accuracy on ImageNet by up to 2 percentage points over established self-supervised methods such as MoCo, SimCLR, and BYOL, while also achieving superior transfer learning performance across various datasets. Because SASSL can be performed asynchronously as part of the data augmentation pipeline, these gains come with no change in pretraining throughput.
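To make the core idea concrete, the sketch below illustrates one simple way a style-only augmentation can be realized: matching channel-wise statistics of a training image to those of a style reference via adaptive instance normalization (Huang & Belongie, 2017, cited below), then blending with the original image to control augmentation strength. This is a minimal illustration, not SASSL's actual implementation (which applies style transfer in the feature space of a pretrained network and is not detailed in the abstract); the function names (`adain`, `style_augment`) and the `blend` parameter are illustrative assumptions.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Match channel-wise mean/std of `content` to those of `style`
    (adaptive instance normalization). Arrays have shape (C, H, W)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return (content - c_mean) / c_std * s_std + s_mean

def style_augment(image, style_image, blend=0.5):
    """Hypothetical style-only augmentation: re-stylize `image` toward the
    statistics of `style_image`, then blend with the original to control
    augmentation strength. Inputs are (C, H, W) arrays with values in [0, 1]."""
    stylized = adain(image, style_image)
    out = blend * stylized + (1.0 - blend) * image
    return np.clip(out, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.random((3, 224, 224)).astype(np.float32)  # stand-in for a training image
    style = rng.random((3, 224, 224)).astype(np.float32)    # stand-in for a style reference
    view = style_augment(content, style, blend=0.7)
    print(view.shape, view.min(), view.max())
```

Because only color and texture statistics are altered while spatial structure is untouched, the semantic content of the image (object shapes and layout) is preserved, which is the property the abstract attributes to style-based augmentation.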
- TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283, 2016. URL https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf.
- Masked Siamese networks for label-efficient learning. In European Conference on Computer Vision, pp. 456–473. Springer, 2022.
- RSA: Reducing semantic shift from aggressive augmentations for self-supervised learning. Advances in Neural Information Processing Systems, 35:21128–21141, 2022.
- VICReg: Variance-invariance-covariance regularization for self-supervised learning. arXiv preprint arXiv:2105.04906, 2021.
- Unsupervised learning of visual features by contrasting cluster assignments. Advances in neural information processing systems, 33:9912–9924, 2020.
- Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 9650–9660, 2021.
- Artistic style transfer with internal-external learning and contrastive learning. Advances in Neural Information Processing Systems, 34:26561–26573, 2021a.
- A simple framework for contrastive learning of visual representations. In International conference on machine learning, pp. 1597–1607. PMLR, 2020a.
- An empirical study of training self-supervised vision transformers. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9620–9629, Los Alamitos, CA, USA, oct 2021b. IEEE Computer Society. doi: 10.1109/ICCV48922.2021.00950. URL https://doi.ieeecomputersociety.org/10.1109/ICCV48922.2021.00950.
- Exploring simple Siamese representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 15750–15758, 2021.
- Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020b.
- Describing textures in the wild. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014.
- StyTr2: Image style transfer with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 11326–11336, 2022.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- A learned representation for artistic style. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=BJO-BuT1g.
- Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2414–2423, 2016.
- Controlling perceptual factors in neural style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3985–3993, 2017.
- ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231, 2018.
- Bootstrap your own latent: A new approach to self-supervised learning. Advances in neural information processing systems, 33:21271–21284, 2020.
- You only cut once: Boosting data augmentation with a single cut. In International Conference on Machine Learning, pp. 8196–8212. PMLR, 2022.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
- Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9729–9738, 2020.
- Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16000–16009, 2022.
- Pyramid-based texture analysis/synthesis. In Proceedings of the Conference on Computer Graphics and Interactive Techniques, pp. 229–238, 1995.
- A sliced Wasserstein loss for neural texture synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9412–9420, 2021.
- StyleMix: Separating content and style for enhanced data augmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 14862–14870, 2021.
- Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1501–1510, 2017.
- iNaturalist 2021. iNaturalist 2021 competition dataset. https://github.com/visipedia/inat_comp/tree/master/2021, 2021.
- Style augmentation: data augmentation via style randomization. In CVPR workshops, volume 6, pp. 10–11, 2019.
- Neural style transfer: A review. IEEE transactions on visualization and computer graphics, 26(11):3365–3385, 2019.
- Dynamic instance normalization for arbitrary style transfer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 4369–4376, 2020.
- Kaggle and EyePACS. Kaggle Diabetic Retinopathy Detection, July 2015. URL https://www.kaggle.com/c/diabetic-retinopathy-detection/data.
- Wendy Kan. Painter by numbers, 2016. URL https://kaggle.com/competitions/painter-by-numbers.
- Optimal whitening and decorrelation. The American Statistician, 72(4):309–314, 2018.
- Do better ImageNet models transfer better? In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2661–2671, 2019.
- Improving transferability of representations via augmentation-aware self-supervision. Advances in Neural Information Processing Systems, 34:17710–17722, 2021.
- Demystifying neural style transfer. arXiv preprint arXiv:1701.01036, 2017a.
- Universal style transfer via feature transforms. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pp. 385–395, Red Hook, NY, USA, 2017b. Curran Associates Inc. ISBN 9781510860964.
- AdaAttN: Revisit attention mechanism in arbitrary neural style transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6649–6658, 2021.
- A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 40(1):49–70, 2000.
- Demystifying contrastive self-supervised learning: Invariances, augmentations and dataset biases. Advances in Neural Information Processing Systems, 33:3407–3418, 2020.
- SelfAugment: Automatic augmentation policies for self-supervised learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2674–2683, 2021.
- Stable and controllable neural texture synthesis and style transfer using histogram losses. 2017. URL http://arxiv.org/abs/1701.08893.
- Avatar-net: Multi-scale zero-shot style transfer by feature decoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8242–8250, 2018.
- Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826, 2016.
- What makes for good views for contrastive learning? Advances in neural information processing systems, 33:6827–6839, 2020.
- Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
- On the importance of hyperparameters and data augmentation for self-supervised learning. arXiv preprint arXiv:2207.07875, 2022.
- Diversified arbitrary style transfer via deep feature perturbation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7789–7798, 2020.
- Photorealistic style transfer via wavelet transforms. In Proceedings of the IEEE International Conference on Computer Vision, pp. 9036–9045, 2019.
- Large batch training of convolutional networks. arXiv preprint arXiv:1708.03888, 2017.
- CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 6023–6032, 2019.
- Barlow twins: Self-supervised learning via redundancy reduction. In International Conference on Machine Learning, pp. 12310–12320. PMLR, 2021.
- mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
- STaDA: Style transfer as data augmentation. arXiv preprint arXiv:1909.01056, 2019.
- Random erasing data augmentation. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pp. 13001–13008, 2020.
- Exploring texture ensembles by efficient Markov chain Monte Carlo: toward a “trichromacy” theory of texture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(6):554–569, 2000.
- Renan A. Rojas-Gomez
- Karan Singhal
- Ali Etemad
- Alex Bijamov
- Warren R. Morningstar
- Philip Andrew Mansfield