
A Multi-Perspective Analysis of Memorization in Large Language Models (2405.11577v4)

Published 19 May 2024 in cs.CL and cs.AI

Abstract: LLMs, trained on massive corpora with billions of parameters, show unprecedented performance in various fields. Beyond their strong performance, researchers have also noticed special behaviors in these models. One such behavior is memorization, in which LLMs can generate the same content used to train them. Although previous research has discussed memorization, the memorization of LLMs still lacks explanation, especially its cause and the dynamics of generating memorized content. In this research, we discuss memorization comprehensively from various perspectives and extend the scope of discussion beyond memorized content to less memorized and unmemorized content as well. Through various studies, we found that: (1) experiments reveal how memorization relates to model size, continuation size, and context size, and show how unmemorized sentences transition to memorized ones; (2) embedding analysis shows the distribution and decoding dynamics across model sizes in embedding space for sentences with different memorization scores; (3) an analysis of n-gram statistics and entropy decoding dynamics reveals a boundary effect when the model starts to generate memorized or unmemorized sentences; (4) a Transformer model trained to predict the memorization of different models shows that memorization can be predicted from context.
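The memorization scores mentioned in the abstract are typically computed by prompting a model with a context from its training data and comparing the generated continuation, token by token, with the original continuation. A minimal sketch of such a position-wise match score (the function name and this exact definition are assumptions for illustration, not the paper's precise metric):

```python
def memorization_score(generated, reference):
    """Fraction of continuation tokens that exactly match the
    training-set continuation, compared position by position.
    A score of 1.0 indicates verbatim memorization; lower scores
    indicate partially memorized or unmemorized continuations."""
    if not reference:
        return 0.0
    matches = sum(g == r for g, r in zip(generated, reference))
    return matches / len(reference)

# Example: the model reproduces 3 of 4 continuation tokens.
score = memorization_score(["the", "cat", "sat", "down"],
                           ["the", "cat", "sat", "here"])
print(score)  # 0.75
```

In practice the generated tokens would come from greedy decoding of an LLM given the training-set context; sweeping the context and continuation lengths is how the relation between memorization and context/continuation size can be measured.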

Authors (3)
  1. Bowen Chen (50 papers)
  2. Namgi Han (6 papers)
  3. Yusuke Miyao (34 papers)