LDReg: Local Dimensionality Regularized Self-Supervised Learning (2401.10474v2)
Abstract: Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse, also known as the "underfilling" phenomenon, is one of the major causes of degraded performance on downstream tasks. Previous work has investigated the dimensional collapse problem of SSL at a global level. In this paper, we demonstrate that representations can span a high-dimensional space globally, yet collapse locally. To address this, we propose a method called $\textit{local dimensionality regularization (LDReg)}$. Our formulation is based on a derivation of the Fisher-Rao metric, used to compare and optimize local distance distributions at an asymptotically small radius around each data point. Through a range of experiments, we demonstrate that increasing the local intrinsic dimensionality with LDReg improves the representation quality of SSL. The results also show that LDReg can regularize dimensionality at both local and global levels.
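To make the regularization idea concrete, below is a minimal sketch of a local-dimensionality penalty. It assumes that per-sample local intrinsic dimensionality (LID) is estimated with a standard maximum-likelihood (Hill-type) estimator from k-nearest-neighbor distances in representation space, and that "increasing LID" is implemented as a negative mean-log-LID term added to the SSL loss. The function names, neighborhood size `k`, and weighting coefficient `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def knn_distances(z: torch.Tensor, k: int) -> torch.Tensor:
    """Distances from each sample to its k nearest neighbors (self excluded).

    z: (N, D) batch of representations. Returns: (N, k), ascending order.
    """
    n = z.shape[0]
    dists = torch.cdist(z, z)                                   # (N, N) Euclidean distances
    mask = torch.eye(n, dtype=torch.bool, device=z.device)
    dists = dists.masked_fill(mask, float("inf"))               # exclude self-distance
    nn_dists, _ = dists.topk(k, dim=1, largest=False)           # (N, k), smallest first
    return nn_dists


def lid_mle(nn_dists: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Maximum-likelihood (Hill-type) LID estimate per sample.

    LID(x) ≈ -1 / mean_i log(r_i / r_k), where r_k is the largest neighbor distance.
    """
    r_k = nn_dists[:, -1:].clamp_min(eps)
    log_ratios = torch.log(nn_dists.clamp_min(eps) / r_k)       # (N, k), all <= 0
    return -1.0 / log_ratios.mean(dim=1).clamp_max(-eps)        # (N,), positive


def ldreg_penalty(z: torch.Tensor, k: int = 64) -> torch.Tensor:
    """Illustrative local-dimensionality penalty: negative mean log-LID.

    Minimizing this term pushes the geometric mean of the per-sample LIDs
    upward, i.e. it discourages locally collapsed neighborhoods.
    """
    lid = lid_mle(knn_distances(z, k))
    return -torch.log(lid).mean()


# Hypothetical usage inside an SSL training step:
#   total_loss = ssl_loss + beta * ldreg_penalty(representations)
```

Working with the log of the LID (equivalently, its geometric mean over the batch) is a natural choice here because, under the local power-law distance model, differences between LID values are compared on a logarithmic scale; the exact loss and weighting used by LDReg are given in the paper itself.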