
On Linear Separation Capacity of Self-Supervised Representation Learning (2310.19041v2)

Published 29 Oct 2023 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: Recent advances in self-supervised learning have highlighted the efficacy of data augmentation in learning data representation from unlabeled data. Training a linear model atop these enhanced representations can yield an adept classifier. Despite the remarkable empirical performance, the underlying mechanisms that enable data augmentation to unravel nonlinear data structures into linearly separable representations remain elusive. This paper seeks to bridge this gap by investigating under what conditions learned representations can linearly separate manifolds when data is drawn from a multi-manifold model. Our investigation reveals that data augmentation offers additional information beyond observed data and can thus improve the information-theoretic optimal rate of linear separation capacity. In particular, we show that self-supervised learning can linearly separate manifolds with a smaller distance than unsupervised learning, underscoring the additional benefits of data augmentation. Our theoretical analysis further underscores that the performance of downstream linear classifiers primarily hinges on the linear separability of data representations rather than the size of the labeled data set, reaffirming the viability of constructing efficient classifiers with limited labeled data amid an expansive unlabeled data set.
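As a rough illustration of the downstream setting the abstract describes (a linear classifier trained on frozen representations with very few labels, where accuracy is governed by linear separability rather than labeled-sample size), the sketch below uses a synthetic two-manifold data set and a hypothetical stand-in encoder. The `two_circles` generator and the `encode` function are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch (not the paper's algorithm): train a linear probe on frozen
# representations using only a handful of labels. The point it illustrates is
# that downstream accuracy hinges on the linear separability of the
# representation, not on how many labeled examples are available.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def two_circles(n):
    """Two concentric circles: a toy instance of a multi-manifold model."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    radius = np.where(rng.random(n) < 0.5, 1.0, 2.0)   # manifold id sets the radius
    x = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
    x += 0.05 * rng.standard_normal(x.shape)            # small off-manifold noise
    y = (radius > 1.5).astype(int)                       # label = manifold membership
    return x, y

def encode(x):
    """Hypothetical frozen encoder; appending the squared norm makes the two
    circles linearly separable in representation space."""
    return np.c_[x, (x ** 2).sum(axis=1)]

x, y = two_circles(2000)
z = encode(x)                                            # representations of all (unlabeled) data
labeled = rng.choice(len(z), size=20, replace=False)     # only 20 labeled examples

probe = LogisticRegression().fit(z[labeled], y[labeled])
print("linear-probe accuracy:", probe.score(z, y))       # near 1.0 despite few labels

# For contrast, the same tiny labeled set in the raw input space is not
# linearly separable, so a linear classifier there performs poorly.
raw = LogisticRegression().fit(x[labeled], y[labeled])
print("raw-input accuracy:", raw.score(x, y))            # close to chance (~0.5)
```

The gap between the two printed accuracies mirrors the abstract's claim: once the representation renders the manifolds linearly separable, a small labeled set suffices for the downstream linear classifier.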

