Forgetting Measure (FM) Overview
- Forgetting Measure (FM) is a diagnostic metric that quantifies information loss over time or updates across domains like language models, federated learning, and human memory.
- It employs diverse mathematical formulations—ranging from sample-wise and class-wise to probabilistic assessments—to capture transitions in accuracy and inferential strength.
- FM guides practical strategies for retention, privacy, and theory pruning, thereby influencing adaptive rehearsal protocols and model adjustment in both AI and cognitive science.
A forgetting measure (FM) quantifies the degree to which an agent, model, or information system loses previously acquired information as a result of learning, updating, or passage of time. It formalizes the notion of information loss, either at the granular (sample-wise, token-wise, class-wise) or structural (theoretical, inferential, model-theoretic) level. Across domains—LLMs, federated learning, knowledge representation, human memory—forgetting measures serve both as diagnostic metrics and as benchmarks guiding the development of retention and rehearsal strategies.
1. Mathematical Formulations of Forgetting Measures
Forgetting measures span a wide array of formal definitions, each grounded in the specifics of the domain and the questions of interest.
Sample-wise Forgetting in LLM Post-Training
The framework of "Mapping Post-Training Forgetting in LLMs at Scale" defines FM as the fraction of items for which a model transitions from correct (pre-training) to incorrect (post-training):

$$\mathrm{FM} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left[c_i^{\text{pre}} = 1 \wedge c_i^{\text{post}} = 0\right],$$

where $c_i \in \{0,1\}$ marks item-level correctness. A companion metric, backward transfer (BT), analogously tracks $0 \to 1$ transitions. Chance-adjusted variants correct for random guessing by subtracting analytically derived baselines (Harmon et al., 20 Oct 2025).
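The sample-wise bookkeeping can be sketched as follows; the chance correction shown is a simple linear adjustment, whereas the paper derives its baselines analytically:

```python
import numpy as np

def forgetting_and_backward_transfer(pre_correct, post_correct):
    """Sample-wise FM (correct -> incorrect transitions) and
    BT (incorrect -> correct transitions) over item-level correctness."""
    pre = np.asarray(pre_correct, dtype=bool)
    post = np.asarray(post_correct, dtype=bool)
    fm = float(np.mean(pre & ~post))   # 1 -> 0 transitions
    bt = float(np.mean(~pre & post))   # 0 -> 1 transitions
    return fm, bt

def chance_adjusted(rate, p_chance):
    """Illustrative chance adjustment: rescale after subtracting the rate
    expected from random guessing (an assumption; the paper's baseline
    is derived analytically)."""
    return max(0.0, (rate - p_chance) / (1.0 - p_chance))

pre = [1, 1, 1, 0, 0, 1]
post = [1, 0, 0, 0, 1, 1]
fm, bt = forgetting_and_backward_transfer(pre, post)  # fm = 2/6, bt = 1/6
```

Reporting `fm` and `bt` separately is what keeps knowledge loss from being masked by new gains in an aggregate accuracy number.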
Federated Learning: Round-wise Class-Granular FM
Flashback introduces a per-round, loss-based FM capturing negative changes in per-class accuracy:

$$\mathrm{FM}_t = \sum_{c} \max\left(0,\; a_c^{(t-1)} - a_c^{(t)}\right),$$

where $a_c^{(t)}$ denotes the accuracy on class $c$ at round $t$. This strictly aggregates losses and ignores gains, ensuring that net knowledge loss remains visible even when overall accuracy improves in some classes (Aljahdali et al., 8 Feb 2024).
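The loss-only aggregation above can be written in a few lines; the exact normalization used by Flashback may differ from this sketch:

```python
def round_forgetting(prev_acc, curr_acc):
    """Per-round FM: sum only the per-class accuracy drops, ignoring
    gains, so net knowledge loss stays visible."""
    return sum(max(0.0, prev_acc[c] - curr_acc[c]) for c in prev_acc)

# Per-class accuracies before and after an aggregation round
prev = {"cat": 0.90, "dog": 0.70, "bird": 0.50}
curr = {"cat": 0.95, "dog": 0.55, "bird": 0.40}
fm = round_forgetting(prev, curr)  # 0.15 + 0.10 = 0.25; the "cat" gain is ignored
```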
Privacy-oriented Example-wise FM in Supervised ML
In the context of privacy attacks, FM is mechanism-agnostic and attack-calibrated. Let $\mathrm{Succ}_A(x; k)$ denote the success rate of an adversary $A$ in distinguishing whether example $x$ was included in training, measured after $k$ further training steps; $x$ is $(A, \alpha, k)$-forgotten if $\mathrm{Succ}_A(x; k) \le \alpha$ (Jagielski et al., 2022).
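The definition reduces to a simple check once attack success rates have been measured at several step counts; the dict-based interface below is hypothetical:

```python
def is_forgotten(attack_success, alpha, k):
    """Check (A, alpha, k)-forgetting for one example: from k further
    training steps onward, attack A's measured success rate stays at or
    below alpha. `attack_success` maps step count -> success rate
    (e.g. membership-inference advantage); this interface is illustrative."""
    return all(s <= alpha for step, s in attack_success.items() if step >= k)

# Measured attack success decaying as training continues past insertion
succ = {0: 0.90, 100: 0.60, 500: 0.52, 1000: 0.51}
forgotten = is_forgotten(succ, alpha=0.55, k=500)  # True
```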
Information-structural FM via Model Counting and Probability
In probabilistic logic and knowledge representation, Doherty and Szałas define three interlocked loss functions that quantify the drop in inferential strength after variable forgetting, each computed by comparing the original theory with its forgotten counterpart via model counting. Their probabilistic analogues replace raw model counts with probability measures over models (Doherty et al., 3 Apr 2024).
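A brute-force propositional sketch makes the model-counting view concrete; the paper's three loss functions are not reproduced here, and comparing raw counts as below is an illustrative assumption:

```python
from itertools import product

def model_count(formula, variables):
    """Count satisfying assignments of `formula` over `variables`."""
    return sum(1 for vals in product([False, True], repeat=len(variables))
               if formula(dict(zip(variables, vals))))

def forget(formula, var):
    """Standard propositional forgetting: T[var/True] OR T[var/False]."""
    return lambda env: (formula({**env, var: True})
                        or formula({**env, var: False}))

# T = (p AND q) OR r over {p, q, r}
T = lambda e: (e["p"] and e["q"]) or e["r"]
vs = ["p", "q", "r"]
before = model_count(T, vs)               # 5 models of T
after = model_count(forget(T, "p"), vs)   # 6 models of forget(T; p), i.e. q OR r
```

Forgetting only ever adds models (weakens the theory), so count-based losses are monotone in the forgotten variable set, matching the monotonicity guarantee cited below.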
Human Memory: Fractional Dynamics and Power-law FM
In chunk-memory models, forgetting is parameterized by the tail exponent $\beta$ of a power-law retention function,

$$R(t) \propto t^{-\beta}.$$

The FM here is precisely $\beta$, which can also be read off as the slope of the retention curve on log-log axes (Lubashevsky et al., 2014).
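Estimating the exponent from data reduces to linear regression in log-log space (the simple estimator also mentioned in Section 4; maximum-likelihood variants exist):

```python
import numpy as np

def fit_powerlaw_exponent(t, retention):
    """Estimate the tail exponent beta in R(t) ~ t**(-beta) by linear
    regression of log(R) on log(t); beta is the negated slope."""
    slope, _intercept = np.polyfit(np.log(t), np.log(retention), 1)
    return -slope

t = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
r = t ** -0.8                      # synthetic noiseless retention data
beta = fit_powerlaw_exponent(t, r)  # recovers 0.8
```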
Retroactive Interference Models
Here, the FM is derived from the retention function $R(t)$, which is computed analytically from the interference model; the exponent of its power-law tail is fit from empirical memory data (Georgiou et al., 2019).
Long-context LM Memorization: Forgetting Curve Gap
The FM for memorization length is operationalized as the gap

$$\Delta(l) = \mathrm{Acc}_{\text{copy}}(l) - \mathrm{Acc}_{\text{LM}},$$

where $\mathrm{Acc}_{\text{copy}}(l)$ gives token-wise copy accuracy at context length $l$ and $\mathrm{Acc}_{\text{LM}}$ is the baseline LM accuracy under unrelated contexts. Fine and coarse memory lengths are derived from where this gap closes (Liu et al., 7 Oct 2024).
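Given measured accuracies, a memory length can be read off as the longest context at which the gap is still open; the threshold criterion below is an assumption, not the paper's exact rule:

```python
def memory_length(copy_acc, lm_acc, eps=0.01):
    """Longest context length (tokens) at which token-wise copy accuracy
    still exceeds the unrelated-context LM baseline by more than eps.
    `copy_acc` maps length -> copy accuracy; eps is an assumed threshold."""
    good = [length for length, acc in copy_acc.items() if acc - lm_acc > eps]
    return max(good) if good else 0

# Typical shape: plateau, steep decay, then amnesia at the baseline
copy_acc = {512: 0.99, 1024: 0.98, 2048: 0.80, 4096: 0.41, 8192: 0.40}
mem_len = memory_length(copy_acc, lm_acc=0.40)  # 2048
```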
2. Domains of Application
Forgetting measures have been developed and deployed in several key areas:
- LLMs: quantifying knowledge loss and backward transfer during post-training stages (RL, SFT) and model merging (Harmon et al., 20 Oct 2025; Liao et al., 22 Oct 2024; Liu et al., 7 Oct 2024).
- Federated learning: monitoring loss of information in highly heterogeneous data aggregation across rounds and classes (Aljahdali et al., 8 Feb 2024).
- Privacy and data-removal: empirically bounding remaining attack surface after data is passively "forgotten" in large models (Jagielski et al., 2022).
- Knowledge representation: comparing various logical forgetting operators by inferential strength loss (Doherty et al., 3 Apr 2024).
- Human and animal memory: fitting recognition curves and characterizing age-stabilization via power-law decay exponents (Lubashevsky et al., 2014, Georgiou et al., 2019).
- Semantic desktops: continuous relevance estimation via exponentially decaying Memory Buoyancy (Jilek et al., 2018).
3. Theoretical Insights and Foundations
Forgetting is not a defect but an adaptive, information-theoretic process. The measure in "Forgetting is Everywhere" quantifies the divergence between the predictive futures induced by the baseline learner and the mixture of futures simulated after an update. Full self-consistency (zero divergence) is achieved only by ideal Bayesian learners; practical algorithms generally violate it. Moderate forgetting is not universally harmful: it often correlates with accelerated adaptation in nonstationary or class-incremental settings (Sanati et al., 6 Nov 2025).
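A minimal sketch of the self-consistency idea, assuming discrete predictive distributions and plain KL divergence (the paper's estimator and divergence choice may differ):

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions p and q (q > 0)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def forgetting_divergence(baseline, post_update):
    """Average per-input divergence between the baseline learner's
    predictive distributions and the simulated post-update ones; it is
    zero exactly when the two agree (full self-consistency)."""
    return float(np.mean([kl(p, q) for p, q in zip(baseline, post_update)]))

base = [[0.7, 0.3], [0.5, 0.5]]
post = [[0.6, 0.4], [0.5, 0.5]]
d = forgetting_divergence(base, post)  # > 0: the update forgot something
```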
FM in memory models exposes the independence between learning and forgetting exponents, supporting dissociation and flexible tuning of retention versus learning speed (Lubashevsky et al., 2014). Retroactive interference models explain age-dependent stabilization quantitatively (Georgiou et al., 2019).
Model-counting FM establishes that inferential loss is additive for disjoint theories and monotonic in the forgotten variable set, giving formal guarantees for reasoning system modularity and robustness (Doherty et al., 3 Apr 2024).
4. Practical Computation and Benchmarking
Forgetting measures typically require fine-grained bookkeeping:
- Sample-wise FM: comparison of correctness states pre- and post-update; counting transitions.
- Class-wise FM: computation of negative deltas per class accuracy; aggregation over rounds or stages.
- Privacy FM: parallel training runs with and without probe examples (canaries); measurement of attack success rates such as membership inference or canary exposure.
- Information-structural FM: model counting or probability assignment via logic-program translation (ProbLog), quantifier elimination, and explicit counting/valuation queries (Doherty et al., 3 Apr 2024).
- Power-law FM: fitting exponents to the empirical decay or retention curve, via log-log regression or closed-form maximum-likelihood estimation (Lubashevsky et al., 2014, Georgiou et al., 2019).
- Semantic Desktop MB: event-triggered update loop with exponentially decaying "buoyancy" plus contextual fusion (Jilek et al., 2018).
- Long-context LM FM: iterative measurement of copy vs. LM accuracy at increasing context lengths, automated sample extraction (Liu et al., 7 Oct 2024).
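The Semantic Desktop update loop in the list above can be sketched as an event-triggered decay-plus-bump rule; the half-life and bump size here are illustrative assumptions, not the system's actual parameters:

```python
import math

def memory_buoyancy(mb, dt, half_life=30.0, accessed=False, bump=0.2):
    """Event-triggered Memory Buoyancy update: exponential decay over the
    elapsed time dt (days) plus an activation bump on access. The
    half-life and bump values are assumptions for illustration."""
    mb = mb * math.exp(-math.log(2) * dt / half_life)
    if accessed:
        mb = min(1.0, mb + bump)
    return mb

mb = memory_buoyancy(0.8, dt=30.0)               # one half-life: decays to 0.4
mb = memory_buoyancy(mb, dt=0.0, accessed=True)  # access bump: back up to 0.6
```

Graduated forgetting actions (hiding, condensing, archiving) would then trigger as `mb` falls below configured thresholds.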
5. Empirical Observations Across Learning Settings
Multiple studies confirm key behaviors:
- Low-to-moderate forgetting and backward transfer are typical in large-scale LLM post-training; larger model scales consistently mitigate these effects (Harmon et al., 20 Oct 2025).
- Roundwise FM in federated learning reveals that knowledge acquired by clients is readily lost during aggregation, especially under data heterogeneity; techniques such as dynamic distillation (Flashback) markedly reduce FM and accelerate convergence (Aljahdali et al., 8 Feb 2024).
- Privacy FM shows that deterministic training leaves all injected probes vulnerable indefinitely, while stochastic SGD enables gradual forgetting; examples injected early in training are forgotten faster (Jagielski et al., 2022).
- Model-counting FM supports rapid, automated comparison of logic-reduction policies with formal additivity and monotonicity (Doherty et al., 3 Apr 2024).
- Power-law FM exponents in human and animal memory settle robustly around the value observed in recognition data (approximately 0.8 for the n = 5-dimensional valence models) (Georgiou et al., 2019).
- Semantic Desktop MB scores decay as expected, triggering graduated forgetting actions; local/global/group MB layers respect stability under context switches (Jilek et al., 2018).
- Long-context LM FM curves reveal plateau, steep decay, and amnesia phases, with fine memory length under 1–4K tokens for transformers and RNN/SSM architectures dropping rapidly below transformer baselines (Liu et al., 7 Oct 2024).
6. Research Impact and Future Directions
Forgetting measures enable controlled, systematic evaluation of memory loss, offering actionable diagnostics and tuning criteria:
- In language modeling, FM highlights not only what is forgotten but also what backward transfer (new gains) occurs; reporting FM and BT separately avoids the conflation typical of accuracy averages (Harmon et al., 20 Oct 2025).
- FM frameworks are vital for privacy compliance, allowing empirical confirmation of data removal and attack resilience (Jagielski et al., 2022).
- Knowledge representation leverages FM for theory abstraction, rule-base pruning, and constraint manipulation, providing a unified loss-calculation interface (Doherty et al., 3 Apr 2024).
- Federated and continual learning benefit from real-time FM diagnosis and targeted rehearsal or distillation protocols (Aljahdali et al., 8 Feb 2024, Sanati et al., 6 Nov 2025).
- Future work aims to integrate forgetting penalties within training objectives, enforce retention bursts via synthetic rehearsal, leverage external retrieval systems to offset in-weight knowledge loss, and extend forgetting measures to more complex, relational fact and semantic memory domains (Harmon et al., 20 Oct 2025, Liao et al., 22 Oct 2024).
7. Comparative Summary Table
| Domain/Metric | FM Definition/Formula | Key Use |
|---|---|---|
| LM Post-Training (Harmon et al., 20 Oct 2025) | Fraction of correct→incorrect transitions (BT: incorrect→correct) | Knowledge loss/BT diagnostics |
| Federated Learning (Aljahdali et al., 8 Feb 2024) | Per-round sum of per-class accuracy drops | Roundwise loss tracking |
| Privacy (Jagielski et al., 2022) | Attack success $\le \alpha$ after $k$ further steps | Data removal, attack decay |
| Knowledge Represent. (Doherty et al., 3 Apr 2024) | Model-count/probability loss after variable forgetting | Theory inferential loss |
| Human memory (Lubashevsky et al., 2014; Georgiou et al., 2019) | Power-law tail exponent $\beta$ of retention curve | Recognition, stabilization |
| Semantic Desktop (Jilek et al., 2018) | Memory Buoyancy: exponential decay + activation bump | Info relevance, auto-forgetting |
| Long-context LM (Liu et al., 7 Oct 2024) | Copy-accuracy vs. LM-baseline gap over context length | Memory length quantification |
Forgetting measures—by focusing on granular, interpretive, domain-adjusted quantification—have become central instruments in the analysis and control of information retention in both artificial and biological learning systems.