
Non-negative Contrastive Learning (2403.12459v3)

Published 19 Mar 2024 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: Deep representations have shown promising performance when transferred to downstream tasks in a black-box manner. Yet, their inherent lack of interpretability remains a significant challenge, as these features are often opaque to human understanding. In this paper, we propose Non-negative Contrastive Learning (NCL), a renaissance of Non-negative Matrix Factorization (NMF) aimed at deriving interpretable features. The power of NCL lies in its enforcement of non-negativity constraints on features, reminiscent of NMF's capability to extract features that align closely with sample clusters. NCL not only aligns mathematically well with an NMF objective but also preserves NMF's interpretability attributes, resulting in a more sparse and disentangled representation compared to standard contrastive learning (CL). Theoretically, we establish guarantees on the identifiability and downstream generalization of NCL. Empirically, we show that these advantages enable NCL to outperform CL significantly on feature disentanglement, feature selection, as well as downstream classification tasks. At last, we show that NCL can be easily extended to other learning scenarios and benefit supervised learning as well. Code is available at https://github.com/PKU-ML/non_neg.


Summary

  • The paper presents Non-negative Contrastive Learning, which integrates non-negativity constraints into contrastive frameworks to enhance representation interpretability.
  • It employs a methodology inspired by NMF to enforce sparsity and orthogonality, ensuring feature identifiability and unique representations.
  • The enhanced semantic consistency and potential for optimal downstream performance underscore its practical advantages over traditional CL methods.

Non-negative Contrastive Learning: Enhancing Interpretability of Self-supervised Representations

Introduction

Recent advances in self-supervised learning, especially Contrastive Learning (CL), have pushed the boundaries of unsupervised representation learning. Yet despite impressive downstream task performance, CL typically yields representations that are opaque to human understanding. This limits our ability to understand and trust a model's decisions, which matters most in applications demanding reliability and transparency.

Addressing this gap, we introduce Non-negative Contrastive Learning (NCL), a novel approach that infuses non-negativity constraints into the CL framework, inspired by the well-established Non-negative Matrix Factorization (NMF) technique. NCL not only preserves the robust representation learning capability of CL but also significantly enhances feature interpretability, sparsity, and orthogonality. In this post, we delve into the theoretical underpinning of NCL, its empirical benefits, and its broader implications.

Non-negative Contrastive Learning

NCL emerges from the same philosophy as NMF, whose part-based representations are renowned for their interpretability. By imposing non-negativity constraints on the feature representations, mirroring the constraint at the heart of NMF, NCL ensures that the learned features are strictly non-negative. This constraint induces sparsity and orthogonality in the extracted features, thereby enhancing their interpretability. NCL can thus be viewed as an extension of standard CL that additionally requires the feature vectors to be non-negative.
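As a minimal sketch of this idea (not the paper's official implementation, which lives at the linked repository), one can take a standard InfoNCE-style contrastive loss and simply constrain the encoder's outputs to the non-negative orthant, here via a ReLU, before computing similarities. The `encoder` and `temperature` below are placeholders.

```python
import torch
import torch.nn.functional as F

def ncl_loss(encoder, x1, x2, temperature=0.5):
    """Sketch of non-negative contrastive learning: InfoNCE over
    features constrained to be non-negative before the loss."""
    # Non-negativity constraint: clamp features into the positive orthant.
    z1 = F.relu(encoder(x1))
    z2 = F.relu(encoder(x2))
    # L2-normalize, as in standard contrastive setups.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature       # (N, N) pairwise similarities
    labels = torch.arange(z1.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, labels)
```

Relative to plain CL, the only change is the non-negativity on `z1` and `z2`; everything downstream (similarity matrix, softmax over negatives) is unchanged.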

Empirically, we demonstrate that, compared to CL, NCL significantly improves semantic consistency, where top-activated examples along each feature dimension exhibit better semantic alignment. Furthermore, NCL features display a higher degree of sparsity and orthogonality, attributes conducive to interpretability.
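The sparsity and orthogonality claims can be quantified with simple diagnostics; the two metrics below are illustrative definitions of our own choosing (fraction of near-zero activations, and mean off-diagonal cosine similarity between feature dimensions), not necessarily the exact measures used in the paper.

```python
import torch

def feature_sparsity(z, eps=1e-5):
    """Fraction of activations in the feature matrix z (N, d)
    that are (near) zero; higher means sparser features."""
    return (z.abs() < eps).float().mean().item()

def feature_orthogonality(z):
    """Mean absolute off-diagonal cosine similarity between feature
    dimensions of z (N, d); lower means more orthogonal features."""
    zn = z / (z.norm(dim=0, keepdim=True) + 1e-8)  # normalize each dimension
    gram = zn.T @ zn                               # (d, d) cosine similarities
    off = gram - torch.diag(torch.diag(gram))      # zero out the diagonal
    d = gram.size(0)
    return off.abs().sum().item() / (d * (d - 1))
```

On a perfectly disentangled one-hot feature matrix, sparsity is maximal and the off-diagonal similarity is zero; denser, correlated CL features score worse on both.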

Theoretical Insights into NCL

The theoretical foundation of NCL rests on an equivalence between its objective function and that of NMF, tailored to the CL setting. This adaptation yields several desirable properties:

  • Identifiability: Under mild conditions, NCL enjoys feature identifiability, ensuring the learned representations are unique up to permutation and scaling. This property is crucial for disentangling the learned features, aligning them closer to the inherent data structure.
  • Optimality and Sparsity: We establish that under certain assumptions, the optimal solution for NCL aligns with the latent class probabilities, thus ensuring the learned representations are not only interpretable but also sparse.
  • Downstream Generalization: Theoretical analysis demonstrates that NCL, with its identifiable and sparse features, can potentially achieve Bayes-optimal error in downstream tasks, underlining its efficacy beyond just interpretability.
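One way to make the NMF connection concrete is through the spectral view of contrastive learning, used here as a stand-in for the CL objective (the paper's exact formulation may differ). The spectral contrastive loss

```latex
\mathcal{L}(f) \;=\; -2\,\mathbb{E}_{(x,x^+)}\!\left[f(x)^\top f(x^+)\right]
\;+\; \mathbb{E}_{x,x'}\!\left[\big(f(x)^\top f(x')\big)^2\right]
```

is minimized by low-rank factorizations $A \approx F F^\top$ of the augmentation co-occurrence matrix $A$, where the rows of $F$ stack the features $f(x)$. Adding the constraint $f(x) \ge 0$ turns this factorization into a symmetric NMF, which is where the identifiability and sparsity guarantees above originate.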

Practical Implications and Applications

Leveraging these theoretical advantages, NCL proves useful across a range of tasks. On feature selection and disentanglement, NCL outperforms CL by a large margin, extracting more meaningful and robust representations. On standard downstream classification, NCL matches or exceeds CL, underlining its practical relevance.
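For the feature-selection use case, sparse non-negative features concentrate activation mass on a few dimensions, so a simple ranking rule already works well. The helper below is a hypothetical illustration (the importance score and its name are our own), not the paper's selection procedure.

```python
import torch

def select_top_features(z, k):
    """Hypothetical feature-selection rule: keep the k dimensions of
    z (N, d) with the largest mean activation. With non-negative,
    sparse features this favors the most consistently active parts."""
    scores = z.mean(dim=0)                  # per-dimension importance
    idx = torch.topk(scores, k).indices     # indices of the top-k dimensions
    return z[:, idx], idx
```

Downstream classifiers can then be trained on the reduced matrix `z[:, idx]`, trading a small amount of information for a much more compact, interpretable input.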

Future Directions

NCL opens up new avenues for enhancing interpretability in learning representations. Its extension to supervised learning scenarios and applicability across different domains signify its versatility. Further exploration on integrating NCL with broader learning paradigms presents an exciting research trajectory, potentially leading to more interpretable, trustworthy, and high-performing models.

In conclusion, Non-negative Contrastive Learning is a significant stride toward closing the gap between performance and interpretability in representation learning. By bringing the spirit of NMF into CL, NCL paves the way for a new generation of interpretable, high-performing models applicable across a wide range of domains and tasks.