Non-negative Contrastive Learning (2403.12459v3)
Abstract: Deep representations have shown promising performance when transferred to downstream tasks in a black-box manner. Yet, their inherent lack of interpretability remains a significant challenge, as these features are often opaque to human understanding. In this paper, we propose Non-negative Contrastive Learning (NCL), a renaissance of Non-negative Matrix Factorization (NMF) aimed at deriving interpretable features. The power of NCL lies in its enforcement of non-negativity constraints on features, reminiscent of NMF's ability to extract features that align closely with sample clusters. NCL not only aligns mathematically well with an NMF objective but also preserves NMF's interpretability, yielding sparser and more disentangled representations than standard contrastive learning (CL). Theoretically, we establish guarantees on the identifiability and downstream generalization of NCL. Empirically, we show that these advantages enable NCL to significantly outperform CL on feature disentanglement, feature selection, and downstream classification. Finally, we show that NCL extends readily to other learning scenarios and benefits supervised learning as well. Code is available at https://github.com/PKU-ML/non_neg.
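To make the core idea concrete, the sketch below shows one plausible way the non-negativity constraint could be wired into a standard contrastive pipeline: the encoder (projector) output is passed through a ReLU so features are element-wise non-negative, and the usual InfoNCE loss is applied on top. This is a minimal illustration under those assumptions, not the authors' implementation (see the linked repository for the actual code); the `encoder`, `info_nce`, and `temperature` names are hypothetical.

```python
# Minimal sketch of non-negative contrastive learning (illustrative only).
# Assumption: non-negativity is imposed by a ReLU on the encoder output
# before a standard InfoNCE contrastive loss.
import torch
import torch.nn.functional as F


def info_nce(z1, z2, temperature=0.5):
    """Standard InfoNCE loss between two batches of augmented views (N x d)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # pairwise similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # matching pairs on the diagonal
    return F.cross_entropy(logits, labels)


def ncl_loss(encoder, x1, x2, temperature=0.5):
    """Contrastive loss on non-negative features (the NCL idea, sketched)."""
    z1 = F.relu(encoder(x1))  # non-negativity constraint on features
    z2 = F.relu(encoder(x2))
    return info_nce(z1, z2, temperature)
```

Under this reading, the non-negative features play the role of NMF's non-negative factors, which is what ties the learned dimensions to sample clusters and yields the sparsity and disentanglement described above.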