WERank: Towards Rank Degradation Prevention for Self-Supervised Learning Using Weight Regularization (2402.09586v1)
Abstract: A common phenomenon limiting representation quality in Self-Supervised Learning (SSL) is dimensional collapse (also known as rank degeneration), where the learned representations are mapped to a low-dimensional subspace of the representation space. State-of-the-art SSL methods have been shown to suffer from dimensional collapse and fail to maintain full-rank representations. Recent approaches to prevent this problem rely on contrastive losses, regularization techniques, or architectural tricks. We propose WERank, a new regularizer on the weight parameters of the network that prevents rank degeneration at different layers of the network. We provide empirical evidence and mathematical justification demonstrating the effectiveness of the proposed regularization method in preventing dimensional collapse. We verify the impact of WERank on graph SSL, where dimensional collapse is more pronounced due to the lack of proper data augmentation. We empirically demonstrate that WERank helps BYOL achieve higher rank during SSL pre-training and, consequently, higher downstream accuracy during evaluation probing. Ablation studies and experimental analysis shed light on the underlying factors behind the performance gains of the proposed approach.
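The abstract does not give the exact form of the WERank penalty, but the idea of regularizing weight matrices against rank degeneration can be sketched. Below is a minimal illustration, assuming an orthogonality-style penalty that pushes each layer's weight Gram matrix W W^T toward the identity (which keeps the singular values of W away from zero); this form, and the names `werank_style_penalty` and `lam`, are assumptions for illustration, not the authors' exact formulation.

```python
# Minimal sketch of a weight-level anti-collapse regularizer in the spirit of
# WERank. The exact penalty is NOT specified in the abstract; the form below
# (squared Frobenius distance of each weight Gram matrix from the identity)
# is an illustrative assumption.
import torch
import torch.nn as nn


def werank_style_penalty(model: nn.Module) -> torch.Tensor:
    """Sum of ||W W^T - I||_F^2 over every 2-D weight matrix in the model."""
    penalty = torch.zeros(())
    for name, param in model.named_parameters():
        if param.ndim != 2 or "weight" not in name:
            continue
        gram = param @ param.t()  # (out, out) Gram matrix of the weight rows
        eye = torch.eye(gram.shape[0], dtype=param.dtype, device=param.device)
        penalty = penalty + ((gram - eye) ** 2).sum()  # squared Frobenius norm
    return penalty


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
    x = torch.randn(32, 128)
    z = encoder(x)
    ssl_loss = z.pow(2).mean()  # stand-in for a real SSL objective (e.g. BYOL)
    lam = 1e-3                  # regularization strength (hypothetical)
    loss = ssl_loss + lam * werank_style_penalty(encoder)
    loss.backward()
    print(f"total loss: {loss.item():.4f}")
```

One appeal of regularizing weights rather than embeddings, consistent with the abstract's framing, is that the penalty acts at every layer and its cost is independent of batch size.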
- Implicit regularization in deep matrix factorization. CoRR.
- Learning representations by maximizing mutual information across views. In Advances in Neural Information Processing Systems, NeurIPS’19.
- Contrastive and non-contrastive self-supervised learning recover global and local spectral embedding methods. In Advances in Neural Information Processing Systems, NeurIPS’22.
- VICReg: Variance-invariance-covariance regularization for self-supervised learning. In International Conference on Learning Representations, ICLR.
- Graph Barlow Twins: A self-supervised representation learning framework for graphs. Knowledge-Based Systems.
- Deep Gaussian embedding of graphs: Unsupervised inductive learning via ranking. In International Conference on Learning Representations, ICLR.
- Unsupervised learning of visual features by contrasting cluster assignments. In Advances in Neural Information Processing Systems, NeurIPS’20.
- A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML. PMLR.
- Exploring simple Siamese representation learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR.
- Whitening for self-supervised representation learning. In Proceedings of the 38th International Conference on Machine Learning, ICML. PMLR.
- RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank.
- Learning word vectors for 157 languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
- Bootstrap your own latent: A new approach to self-supervised learning. In Advances in Neural Information Processing Systems, NeurIPS’20.
- Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In IEEE International Conference on Computer Vision, ICCV.
- Distilling the knowledge in a neural network. In NeurIPS Deep Learning and Representation Learning Workshop.
- On feature decorrelation in self-supervised learning. CoRR.
- Unsupervised deep learning by neighbourhood discovery. In Proceedings of the International Conference on Machine Learning, ICML. PMLR.
- Understanding dimensional collapse in contrastive self-supervised learning. In International Conference on Learning Representations, ICLR.
- Adam: A method for stochastic optimization. In International Conference on Learning Representations, ICLR.
- Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, ICLR.
- Augmentation-free self-supervised learning on graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, AAAI.
- Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR. ACM.
- Efficient estimation of word representations in vector space. In International Conference on Learning Representations, ICLR.
- GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP. ACL.
- A critical look at the evaluation of GNNs under heterophily: Are we really making progress? In International Conference on Learning Representations, ICLR.
- Collective classification in network data. AI Magazine.
- An overview of Microsoft Academic Service (MAS) and applications. In Proceedings of the 24th International Conference on World Wide Web, WWW. ACM.
- Large-scale representation learning on graphs via bootstrapping. In International Conference on Learning Representations, ICLR.
- Bootstrapped representation learning on graphs. CoRR.
- Understanding self-supervised learning dynamics without contrastive pairs.
- Deep Graph Infomax. CoRR.
- Single-pass contrastive learning can work for both homophilic and heterophilic graph.
- Barlow Twins: Self-supervised learning via redundancy reduction. In International Conference on Machine Learning, ICML. PMLR.
- Deep graph contrastive representation learning.
- Predicting multicellular function through multi-layer tissue networks. Bioinformatics.