Understanding Representation Learnability of Nonlinear Self-Supervised Learning (2401.03214v1)

Published 6 Jan 2024 in cs.LG and cs.AI

Abstract: Self-supervised learning (SSL) has empirically demonstrated its ability to learn data representations that transfer to many downstream tasks. Only a few theoretical works study this representation learnability, and most of them focus on the final representation, treating the nonlinear neural network as a "black box". However, accurately characterizing what the network learns is crucial for describing the data-distribution features captured by SSL models. This paper is the first to accurately analyze the learning results of a nonlinear SSL model. We consider a toy data distribution that contains two features: a label-related feature and a hidden feature. Unlike previous work in the linear setting, which relies on closed-form solutions, we train a one-layer nonlinear SSL model with gradient descent from a certain initialization region and prove that the model converges to a local minimum. Furthermore, instead of a complex iterative analysis, we propose a new analysis that uses the exact version of the Inverse Function Theorem to accurately describe the features learned at this local minimum. With this local minimum, we prove that the nonlinear SSL model captures the label-related feature and the hidden feature simultaneously, whereas the nonlinear supervised learning (SL) model learns only the label-related feature. We also present the learning processes and results of the nonlinear SSL and SL models via simulation experiments.
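
The setting sketched in the abstract (a two-feature toy distribution, a one-layer nonlinear model, and gradient-descent training under SSL versus SL objectives) can be illustrated with a small simulation. The sketch below is not the paper's construction: the spectral-contrastive-style SSL loss, the noise augmentations, the tanh activation, and all hyperparameters are assumptions made for illustration. It only shows how one might compare which input directions (label-related vs. hidden) each objective concentrates weight on.

```python
# Minimal simulation sketch under simplifying assumptions: 2-D toy data with a
# label-related coordinate and a hidden coordinate, a one-layer tanh network
# trained by plain gradient descent, and a check of which input directions the
# supervised (SL) and self-supervised (SSL) objectives pick up. The losses,
# augmentations, and hyperparameters are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
n, k, dim = 256, 4, 2                # samples, hidden units, input dimension

# Toy distribution: x = (label-related feature, hidden feature) + small noise.
y = rng.choice([-1.0, 1.0], size=n)                  # labels
h = rng.choice([-1.0, 1.0], size=n)                  # hidden, label-independent
X = np.stack([2.0 * y, 1.0 * h], axis=1) + 0.1 * rng.standard_normal((n, dim))

# Two fixed "augmented" views of each sample (small additive noise).
X1 = X + 0.1 * rng.standard_normal((n, dim))
X2 = X + 0.1 * rng.standard_normal((n, dim))

def rep(W, inputs):
    """One-layer nonlinear representation z = tanh(W x)."""
    return np.tanh(inputs @ W.T)                     # shape (n, k)

def sl_loss(w_flat):
    """Supervised squared loss with a fixed mean readout over the k units."""
    W = w_flat.reshape(k, dim)
    pred = rep(W, X).mean(axis=1)
    return np.mean((pred - y) ** 2)

def ssl_loss(w_flat):
    """Spectral-contrastive-style SSL surrogate: pull paired views together,
    push independent pairs apart (an assumption, not the paper's loss)."""
    W = w_flat.reshape(k, dim)
    Z1, Z2 = rep(W, X1), rep(W, X2)
    pos = np.sum(Z1 * Z2, axis=1).mean()             # agreement of paired views
    neg = np.mean((Z1 @ Z2.T) ** 2)                  # all-pairs repulsion
    return -2.0 * pos + neg

def num_grad(loss, w_flat, eps=1e-5):
    """Central-difference gradient; adequate for this tiny parameter vector."""
    g = np.zeros_like(w_flat)
    for i in range(w_flat.size):
        e = np.zeros_like(w_flat)
        e[i] = eps
        g[i] = (loss(w_flat + e) - loss(w_flat - e)) / (2.0 * eps)
    return g

def train(loss, steps=800, lr=0.2):
    """Plain gradient descent from a small random initialization."""
    w_flat = 0.1 * rng.standard_normal(k * dim)
    for _ in range(steps):
        w_flat -= lr * num_grad(loss, w_flat)
    return w_flat.reshape(k, dim)

def axis_energy(W):
    """Fraction of squared weight mass on (label axis, hidden axis)."""
    energy = (W ** 2).sum(axis=0)
    return energy / energy.sum()

for name, loss in [("SL ", sl_loss), ("SSL", ssl_loss)]:
    W = train(loss)
    print(name, "weight energy on (label, hidden) axes:", axis_energy(W))
```

Qualitatively, the paper's result suggests the SSL run should place non-trivial weight energy on both axes while the SL run concentrates on the label axis; the numbers produced by this sketch are illustrative only and depend on the assumed loss and augmentations.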

Authors (4)
  1. Ruofeng Yang (2 papers)
  2. Xiangyuan Li (3 papers)
  3. Bo Jiang (235 papers)
  4. Shuai Li (295 papers)
Citations (1)
