Memorization in Self-Supervised Learning Improves Downstream Generalization (2401.12233v3)

Published 19 Jan 2024 in cs.LG

Abstract: Self-supervised learning (SSL) has recently received significant attention due to its ability to train high-performance encoders purely on unlabeled data, often scraped from the internet. This data can still be sensitive, and empirical evidence suggests that SSL encoders memorize private information from their training data and can disclose it at inference time. Since existing theoretical definitions of memorization from supervised learning rely on labels, they do not transfer to SSL. To address this gap, we propose SSLMem, a framework for defining memorization within SSL. Our definition compares the difference in alignment of representations for data points and their augmented views returned by encoders that were trained on these data points and encoders that were not. Through comprehensive empirical analysis on diverse encoder architectures and datasets, we highlight that even though SSL relies on large datasets and strong augmentations, both known in supervised learning as regularization techniques that reduce overfitting, significant fractions of training data points still experience high memorization. Our empirical results show that this memorization is essential for encoders to achieve higher generalization performance on different downstream tasks.


Summary

  • The paper presents SSLMem, a novel framework defining and quantifying memorization in self-supervised learning.
  • It demonstrates that encoder memorization, especially of atypical samples, correlates with enhanced performance on various downstream tasks.
  • The study reveals that techniques reducing memorization, such as differential privacy, may inadvertently lower downstream task effectiveness.

Introduction

Self-supervised learning (SSL) has emerged as an influential paradigm in recent years, offering an alternative to supervised learning that avoids the cost of labeling by training on unlabeled data. Until now, the implications of data memorization in SSL have remained unclear, because existing definitions of memorization were developed for supervised learning and hinge on labels, which SSL does not use. To close this gap, the paper introduces SSLMem, a framework that captures memorization in the SSL setting.

The SSLMem Framework

SSLMem defines memorization for an individual training point as the difference in alignment, that is, the similarity between the representations of a data point and its augmented views, between encoders trained on that point and encoders trained without it. The definition accounts for the fact that SSL has no labels and that optimization objectives vary across SSL methods: augmentations and representation alignment are the unifying elements shared by all major SSL approaches, so the resulting memorization measure is label-agnostic and method-independent.
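To make the definition concrete, the sketch below computes an SSLMem-style memorization score for a single data point. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the `encoder` and `augment` interfaces, the number of views, and the use of cosine distance are choices made for the example.

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus cosine similarity between two representation vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def alignment_loss(encoder, x, augment, n_views=8):
    """Average pairwise distance between representations of augmented views
    of x. Lower values mean the encoder aligns views of x more tightly.
    `encoder` maps an input to a vector; `augment` draws a random view of x.
    Both interfaces are assumptions for this sketch."""
    views = [encoder(augment(x)) for _ in range(n_views)]
    dists = [cosine_distance(views[i], views[j])
             for i in range(n_views)
             for j in range(i + 1, n_views)]
    return float(np.mean(dists))

def memorization_score(enc_with_x, enc_without_x, x, augment):
    """SSLMem-style score: alignment loss of an encoder that never saw x,
    minus that of an encoder trained on x. A large positive score means the
    encoder trained on x aligns its augmented views far better, i.e. x is
    memorized."""
    return (alignment_loss(enc_without_x, x, augment)
            - alignment_loss(enc_with_x, x, augment))
```

In practice such a score would be averaged over several independently trained encoder pairs to reduce the variance introduced by random initialization and augmentation sampling.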

Empirical Analysis and Findings

The empirical analysis with SSLMem covers multiple encoder architectures and datasets. It shows that SSL encoders, despite their reliance on large datasets and aggressive augmentations, which act as regularizers, still exhibit substantial memorization of training data. Atypical samples in particular receive the highest memorization scores, paralleling trends observed in supervised learning. The paper's central finding is that encoder memorization is necessary for achieving strong generalization performance across a range of downstream tasks and distributions.
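Building on the hypothetical `memorization_score` helper from the previous sketch, ranking training points by their scores is one simple way to surface the atypical, highly memorized samples the analysis describes; the helper below is an illustrative assumption, not part of the paper's tooling.

```python
def most_memorized(points, enc_with, enc_without, augment, k=10):
    """Return the k training points with the highest SSLMem-style scores,
    i.e. the candidates most likely to be atypical, memorized samples.
    Assumes enc_with was trained on `points` and enc_without was not;
    reuses memorization_score from the sketch above."""
    scored = [(memorization_score(enc_with, enc_without, x, augment), x)
              for x in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```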

Downstream Impact and Conclusion

The evaluation extends to several downstream applications, from semantic segmentation to classification. The results consistently point to memorization as a key driver of downstream generalization in SSL. The paper further observes that interventions that curtail memorization in order to improve data privacy, such as training with differential privacy, also degrade downstream task performance, underscoring the trade-off between privacy and model utility in the SSL setting. This inquiry into SSL memorization lays the groundwork for future explorations and firmly establishes the role of memorization in the generalization behavior of SSL models.
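As a rough illustration of how downstream generalization of a frozen encoder is commonly measured, the snippet below fits a linear probe on top of encoder representations. This is a generic scikit-learn sketch of the standard linear-evaluation protocol, not the paper's exact experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(encoder, X_train, y_train, X_test, y_test):
    """Train a linear classifier on frozen representations and report test
    accuracy, a common proxy for an encoder's downstream generalization."""
    Z_train = np.stack([encoder(x) for x in X_train])  # frozen features
    Z_test = np.stack([encoder(x) for x in X_test])
    probe = LogisticRegression(max_iter=1000)
    probe.fit(Z_train, y_train)
    return probe.score(Z_test, y_test)
```

Comparing this accuracy for encoders trained with and without memorization-reducing interventions (for example, differential privacy) makes the privacy-utility trade-off discussed above directly measurable.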