
Measuring Forgetting of Memorized Training Examples (2207.00099v2)

Published 30 Jun 2022 in cs.LG

Abstract: Machine learning models exhibit two seemingly contradictory phenomena: training data memorization, and various forms of forgetting. In memorization, models overfit specific training examples and become susceptible to privacy attacks. In forgetting, examples which appeared early in training are forgotten by the end. In this work, we connect these phenomena. We propose a technique to measure to what extent models "forget" the specifics of training examples, becoming less susceptible to privacy attacks on examples they have not seen recently. We show that, while non-convex models can memorize data forever in the worst case, standard image, speech, and language models empirically do forget examples over time. We identify nondeterminism as a potential explanation, showing that deterministically trained models do not forget. Our results suggest that examples seen early when training with extremely large datasets - for instance those examples used to pre-train a model - may enjoy privacy benefits at the expense of examples seen later.
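The abstract's notion of "susceptibility to privacy attacks" is typically operationalized with membership inference. The paper's exact measurement protocol is not reproduced here, but the following is a minimal, self-contained sketch of a loss-threshold membership-inference advantage score of the kind such measurements build on: if the advantage on examples seen early in training drops over successive checkpoints, those examples are being "forgotten" in the privacy sense. All function and variable names are illustrative, not from the paper.

```python
import numpy as np

def attack_advantage(member_losses, nonmember_losses):
    """Loss-threshold membership inference.

    Sweeps a threshold over the model's per-example losses and
    returns the best achievable advantage (TPR - FPR), where an
    example is predicted to be a training member if its loss is
    at or below the threshold (members tend to have lower loss).
    An advantage near 1 indicates strong memorization; an
    advantage near 0 indicates the attack cannot distinguish
    members from non-members.
    """
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)
    thresholds = np.concatenate([member_losses, nonmember_losses])
    best = 0.0
    for t in thresholds:
        tpr = np.mean(member_losses <= t)     # members flagged correctly
        fpr = np.mean(nonmember_losses <= t)  # non-members flagged wrongly
        best = max(best, tpr - fpr)
    return best

# Illustrative use: losses drawn from hypothetical checkpoints.
# Early checkpoint: member losses are clearly lower -> strong attack.
rng = np.random.default_rng(0)
early_members = rng.normal(0.5, 0.1, size=1000)
nonmembers = rng.normal(2.0, 0.1, size=1000)
early_adv = attack_advantage(early_members, nonmembers)

# Later checkpoint: member losses have drifted toward the
# non-member distribution -> the attack advantage shrinks,
# which is the "forgetting" signal the paper measures.
late_members = rng.normal(1.8, 0.2, size=1000)
late_adv = attack_advantage(late_members, nonmembers)
```

In this sketch the drop from `early_adv` to `late_adv` across checkpoints is the forgetting signal; the paper additionally considers stronger attacks and canary-style injected examples, which this toy score does not capture.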

Authors (11)
  1. Matthew Jagielski (51 papers)
  2. Om Thakkar (25 papers)
  3. Florian Tramèr (87 papers)
  4. Daphne Ippolito (47 papers)
  5. Katherine Lee (34 papers)
  6. Nicholas Carlini (101 papers)
  7. Eric Wallace (42 papers)
  8. Shuang Song (54 papers)
  9. Abhradeep Thakurta (55 papers)
  10. Nicolas Papernot (123 papers)
  11. Chiyuan Zhang (57 papers)
Citations (88)
