Forgetting Measure (FM) Overview

Updated 19 December 2025
  • Forgetting Measure (FM) is a diagnostic metric that quantifies information loss over time or updates across domains like language models, federated learning, and human memory.
  • It employs diverse mathematical formulations—ranging from sample-wise and class-wise to probabilistic assessments—to capture transitions in accuracy and inferential strength.
  • FM guides practical strategies for retention, privacy, and theory pruning, thereby influencing adaptive rehearsal protocols and model adjustment in both AI and cognitive science.

A forgetting measure (FM) quantifies the degree to which an agent, model, or information system loses previously acquired information as a result of learning, updating, or passage of time. It formalizes the notion of information loss, either at the granular (sample-wise, token-wise, class-wise) or structural (theoretical, inferential, model-theoretic) level. Across domains—LLMs, federated learning, knowledge representation, human memory—forgetting measures serve both as diagnostic metrics and as benchmarks guiding the development of retention and rehearsal strategies.

1. Mathematical Formulations of Forgetting Measures

Forgetting measures span a wide array of formal definitions, each grounded in the specifics of the domain and the questions of interest.

Sample-wise Forgetting in LLM Post-Training

The framework of "Mapping Post-Training Forgetting in LLMs at Scale" defines FM as the fraction of items on which the model transitions from correct (before post-training) to incorrect (after post-training):

F = \frac{1}{N} \sum_{i=1}^N \mathbf{1}\left[a_i^{pre} = 1 \wedge a_i^{post} = 0\right]

A companion metric, backward transfer (BT), tracks 0→1 transitions. Chance-adjusted variants correct for random guessing by subtracting analytically derived baselines (Harmon et al., 20 Oct 2025):

F_{true} = \max(F - F_{chance}, 0)

BT_{true} = \max(BT - BT_{chance}, 0)
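
As an illustration, the minimal sketch below computes $F$, $BT$, and chance-adjusted variants from per-item correctness flags. The guess-rate baseline used here (a single probability `p_chance`) is a simplifying assumption for this sketch, not the analytically derived baseline of the paper.

```python
import numpy as np

def forgetting_and_backward_transfer(pre_correct, post_correct, p_chance=0.0):
    """Sample-wise forgetting (1->0) and backward transfer (0->1) rates,
    optionally adjusted by a simple guess-rate baseline (an assumption here)."""
    pre = np.asarray(pre_correct, bool)
    post = np.asarray(post_correct, bool)
    f = np.mean(pre & ~post)    # correct before, incorrect after post-training
    bt = np.mean(~pre & post)   # incorrect before, correct after post-training
    # Illustrative chance baselines: expected flip rates if the post-training
    # model simply guessed correctly with probability p_chance.
    f_chance = np.mean(pre) * (1.0 - p_chance)
    bt_chance = np.mean(~pre) * p_chance
    return max(f - f_chance, 0.0), max(bt - bt_chance, 0.0), f, bt

pre = [1, 1, 0, 1, 0, 1, 0, 0]
post = [1, 0, 1, 1, 0, 0, 1, 0]
f_true, bt_true, f, bt = forgetting_and_backward_transfer(pre, post, p_chance=0.25)
print(f"F={f:.2f}  BT={bt:.2f}  F_true={f_true:.2f}  BT_true={bt_true:.2f}")
```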

Federated Learning: Round-wise Class-Granular FM

Flashback introduces a per-round loss-based FM capturing negative changes in per-class accuracy:

\mathcal{F}_t = -\frac{1}{C}\sum_{c=1}^C \min\left(0,\, A^c_t - A^c_{t-1}\right)

The measure aggregates only losses and ignores gains, so knowledge loss remains visible even when accuracy improves in other classes (Aljahdali et al., 8 Feb 2024).
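
A minimal sketch of the round-wise computation, assuming a hypothetical matrix of per-class accuracies indexed by round:

```python
import numpy as np

def roundwise_forgetting(per_class_acc):
    """Per-round forgetting from a (rounds x classes) accuracy matrix:
    average magnitude of negative per-class accuracy changes, gains ignored."""
    acc = np.asarray(per_class_acc, float)
    deltas = np.diff(acc, axis=0)                  # A_t^c - A_{t-1}^c
    return -np.minimum(0.0, deltas).mean(axis=1)   # one FM value per round t >= 1

# Hypothetical per-class accuracies over 4 rounds and 3 classes.
acc = [[0.60, 0.50, 0.40],
       [0.70, 0.30, 0.45],
       [0.65, 0.55, 0.20],
       [0.80, 0.60, 0.50]]
print(roundwise_forgetting(acc))  # round 2 reflects the drop in class 3
```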

Privacy-oriented Example-wise FM in Supervised ML

In the context of privacy attacks, FM is mechanism-agnostic and attack-calibrated:

FM_z(k) := S_{\mathcal{A}}\left(\theta^{+}_{t_0 + k},\; \theta^{-}_{t_0 + k};\; z\right)

Here $S_{\mathcal{A}}$ is the adversarial success rate in distinguishing whether training included $z$ after $k$ further training steps; $z$ is $(\mathcal{A}, \alpha, k)$-forgotten if $FM_z(k) \leq \alpha$ (Jagielski et al., 2022).
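
One way to instantiate $S_{\mathcal{A}}$ is a threshold attack on a scalar score such as the loss on $z$. The sketch below computes the best-threshold balanced accuracy over hypothetical score samples from paired runs with and without $z$; the score distributions are simulated, not produced by any real attack.

```python
import numpy as np

def attack_success_rate(scores_with_z, scores_without_z):
    """Best-threshold balanced accuracy of an attacker that sees a scalar score
    (e.g. the loss on z) and guesses whether z was in the training set."""
    s_in = np.asarray(scores_with_z, float)
    s_out = np.asarray(scores_without_z, float)
    best = 0.5
    for thr in np.unique(np.concatenate([s_in, s_out])):
        # Convention: lower loss on z suggests z was trained on.
        tpr = np.mean(s_in <= thr)
        tnr = np.mean(s_out > thr)
        best = max(best, 0.5 * (tpr + tnr))
    return best

# Hypothetical losses on z after k extra steps, from paired runs with/without z.
rng = np.random.default_rng(0)
fm_z_k = attack_success_rate(rng.normal(0.8, 0.3, 200), rng.normal(1.2, 0.3, 200))
print(f"FM_z(k) ~= {fm_z_k:.2f}; z counts as (A, alpha, k)-forgotten if this <= alpha")
```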

Information-structural FM via Model Counting and Probability

In probabilistic logic and knowledge representation, Doherty and Szałas define three interlocking FM loss functions quantifying the drop in inferential strength after forgetting variables:

\begin{aligned}
loss_{m}^{NC}(\Sigma,\bar p) &= \#\left[F^{NC}(\Sigma;\bar p)\right] - \#\left[\Sigma\right] \\
loss_{m}^{SC}(\Sigma,\bar p) &= \#\left[\Sigma\right] - \#\left[F^{SC}(\Sigma;\bar p)\right] \\
loss_{m}^{T}(\Sigma,\bar p) &= \#\left[F^{NC}(\Sigma;\bar p)\right] - \#\left[F^{SC}(\Sigma;\bar p)\right]
\end{aligned}

Their probabilistic analogues replace model counts with measures $\mu$ (Doherty et al., 3 Apr 2024).
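
The brute-force sketch below illustrates the model-counting losses on a toy propositional theory, under the assumption that $F^{NC}$ is the existential (strongest necessary) projection and $F^{SC}$ the universal (weakest sufficient) projection, with all counts taken over the full vocabulary so they are comparable.

```python
from itertools import product

def models(theory, variables):
    """Enumerate all assignments over `variables` that satisfy `theory`."""
    return [dict(zip(variables, vals))
            for vals in product([False, True], repeat=len(variables))
            if theory(dict(zip(variables, vals)))]

def forget_nc(theory, p):
    # Necessary condition: theory holds for SOME value of p (existential forgetting).
    return lambda m: theory({**m, p: False}) or theory({**m, p: True})

def forget_sc(theory, p):
    # Sufficient condition: theory holds for EVERY value of p (universal forgetting).
    return lambda m: theory({**m, p: False}) and theory({**m, p: True})

variables = ["p", "q", "r"]
sigma = lambda m: (m["p"] or m["q"]) and (not m["p"] or m["r"])  # toy theory

n_sigma = len(models(sigma, variables))
n_nc = len(models(forget_nc(sigma, "p"), variables))
n_sc = len(models(forget_sc(sigma, "p"), variables))

print("loss_m^NC =", n_nc - n_sigma)   # strength lost by weakening to F^NC
print("loss_m^SC =", n_sigma - n_sc)   # strength gained by strengthening to F^SC
print("loss_m^T  =", n_nc - n_sc)      # total gap between the two projections
```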

Human Memory: Fractional Dynamics and Power-law FM

In chunk-memory models, forgetting is parameterized by the tail exponent $d$ in

F(t) \sim t^{-d}

The FM here is precisely $d$, also expressible as $FM = 1 - (\text{order of the Caputo derivative})$ (Lubashevsky et al., 2014).
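
Estimating the exponent is typically a log-log regression. The sketch below fits $d$ to synthetic recall data generated from an assumed power law; the data and parameters are purely illustrative.

```python
import numpy as np

# Hypothetical recall measurements at increasing delays (arbitrary units).
rng = np.random.default_rng(0)
t = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
recall = 0.9 * t ** -0.8 * (1 + 0.05 * rng.standard_normal(t.size))

# Fit F(t) ~ t^{-d} by linear regression in log-log space; FM is the slope magnitude d.
slope, intercept = np.polyfit(np.log(t), np.log(recall), 1)
d = -slope
print(f"estimated forgetting exponent d = {d:.3f}")
```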

Retroactive Interference Models

Here $FM(t) = 1 - R_n(t)$, where $R_n(t)$ is the retention function computed analytically, and the exponent $k$ of the power law is fit from empirical memory data (Georgiou et al., 2019).

Long-context LM Memorization: Forgetting Curve Gap

The FM for memorization length is operationalized as

F(k) = R_{copy}(k) - R_{LM}(k)

where $R_{copy}$ gives token-wise copy accuracy and $R_{LM}$ the baseline LM accuracy under unrelated contexts; $L_{fine}$ and $L_{coarse}$ are memory lengths derived from the curve (Liu et al., 7 Oct 2024).
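
A sketch of the curve computation on hypothetical accuracies follows. The thresholds used here to read off $L_{fine}$ and $L_{coarse}$ are illustrative assumptions, not the paper's exact criteria.

```python
import numpy as np

def forgetting_curve(r_copy, r_lm):
    """Gap between token-wise copy accuracy and the unrelated-context LM baseline,
    evaluated at increasing context lengths k."""
    return np.asarray(r_copy, float) - np.asarray(r_lm, float)

# Hypothetical accuracies at context lengths 1K..32K tokens.
lengths = np.array([1_000, 2_000, 4_000, 8_000, 16_000, 32_000])
r_copy = np.array([0.99, 0.98, 0.95, 0.70, 0.45, 0.40])
r_lm = np.array([0.40, 0.40, 0.41, 0.40, 0.39, 0.40])
f = forgetting_curve(r_copy, r_lm)

# One possible reading of the derived lengths (an assumption of this sketch):
# L_fine = longest k where the gap stays near its plateau,
# L_coarse = longest k where the gap is still clearly above zero.
l_fine = lengths[f >= 0.9 * f.max()].max()
l_coarse = lengths[f > 0.05].max()
print(dict(zip(lengths.tolist(), np.round(f, 2).tolist())), l_fine, l_coarse)
```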

2. Domains of Application

Forgetting measures have been developed and deployed in several key areas:

  • LLM post-training and long-context memorization (Harmon et al., 20 Oct 2025; Liu et al., 7 Oct 2024)
  • Federated and continual learning (Aljahdali et al., 8 Feb 2024; Sanati et al., 6 Nov 2025)
  • Privacy, data removal, and machine unlearning (Jagielski et al., 2022)
  • Knowledge representation and probabilistic logic (Doherty et al., 3 Apr 2024)
  • Human memory modeling (Lubashevsky et al., 2014; Georgiou et al., 2019)
  • Semantic desktop information management (Jilek et al., 2018)

3. Theoretical Insights and Foundations

A recurring theoretical claim is that forgetting is not a defect but an adaptive, information-theoretic process. The measure $\Gamma_k(t)$ in "Forgetting is Everywhere" quantifies the divergence between the learner's baseline distribution over future experience and a simulated post-update mixture:

\Gamma_k(t) = D\left( q\left(H^{t+k:\infty} \mid Z_{t-1}, H_{0:t-1}\right) \,\middle\|\, q_k^*\left(H^{t+k:\infty} \mid Z_{t-1}, H_{0:t-1}\right) \right)

Here, full self-consistency ($\Gamma_k = 0$) is achieved only by ideal Bayesian learners; practical algorithms generally violate it. Moderate forgetting is not universally harmful; it often correlates with accelerated adaptation in nonstationary or class-incremental settings (Sanati et al., 6 Nov 2025).
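
As a toy illustration, $\Gamma_k$ can be approximated by a divergence (KL is used here) between a baseline predictive distribution and a mixture of post-update predictive distributions. All distributions below are hypothetical placeholders, not outputs of any learner from the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Baseline predictive distribution over future observations (hypothetical).
q_baseline = np.array([0.5, 0.3, 0.2])

# Predictive distributions after updating on several simulated batches (hypothetical).
q_after_updates = np.array([
    [0.55, 0.30, 0.15],
    [0.45, 0.35, 0.20],
    [0.60, 0.25, 0.15],
])
q_mixture = q_after_updates.mean(axis=0)

gamma_k = kl_divergence(q_baseline, q_mixture)
print(f"Gamma_k = {gamma_k:.4f}  (0 would indicate a self-consistent learner)")
```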

FM in memory models shows that learning and forgetting exponents are independent, supporting their dissociation and the flexible tuning of retention versus learning speed (Lubashevsky et al., 2014). Retroactive interference models quantitatively explain age-dependent memory stabilization (Georgiou et al., 2019).

Model-counting FM establishes that inferential loss is additive for disjoint theories and monotonic in the forgotten variable set, giving formal guarantees for reasoning system modularity and robustness (Doherty et al., 3 Apr 2024).

4. Practical Computation and Benchmarking

Forgetting measures typically require fine-grained bookkeeping:

  • Sample-wise FM: comparison of correctness states pre- and post-update; counting transitions.
  • Class-wise FM: computation of negative deltas per class accuracy; aggregation over rounds or stages.
  • Privacy FM: parallel training runs with and without probe examples (canaries); measurement of attack success rates such as membership inference or canary exposure.
  • Information-structural FM: model counting or probability assignment via logic-program translation (ProbLog), quantifier elimination, and explicit counting/valuation queries (Doherty et al., 3 Apr 2024).
  • Power-law FM: fitting exponents to the empirical decay or retention curve, via log-log regression or closed-form maximum-likelihood estimation (Lubashevsky et al., 2014, Georgiou et al., 2019).
  • Semantic Desktop MB: event-triggered update loop with exponentially decaying "buoyancy" plus contextual fusion (see the sketch after this list) (Jilek et al., 2018).
  • Long-context LM FM: iterative measurement of copy vs. LM accuracy at increasing context lengths, automated sample extraction (Liu et al., 7 Oct 2024).
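
The sketch below, referenced from the Semantic Desktop item above, shows one way an exponentially decaying "buoyancy" score with an access bump could be implemented; class and parameter names and values are illustrative, not those of the cited system.

```python
import time

class MemoryBuoyancy:
    """Toy relevance score that decays exponentially over time and receives an
    activation bump on each access event (an illustrative sketch only)."""

    def __init__(self, half_life_days=30.0, bump=0.3):
        self.half_life = half_life_days * 86400.0  # half-life in seconds
        self.bump = bump
        self.score = 0.0
        self.last_update = time.time()

    def _decay(self, now):
        dt = now - self.last_update
        self.score *= 0.5 ** (dt / self.half_life)
        self.last_update = now

    def on_access(self, now=None):
        """Event-triggered update: decay to 'now', then add the activation bump."""
        self._decay(now or time.time())
        self.score = min(1.0, self.score + self.bump)

    def value(self, now=None):
        self._decay(now or time.time())
        return self.score

mb = MemoryBuoyancy()
mb.on_access()
print(f"current buoyancy: {mb.value():.3f}")
```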

5. Empirical Observations Across Learning Settings

Multiple studies confirm key behaviors:

  • Low-to-moderate forgetting and backward transfer are typical in large-scale LLM post-training; larger model scales consistently mitigate these effects (Harmon et al., 20 Oct 2025).
  • Roundwise FM in federated learning reveals that knowledge acquired by clients is readily lost during aggregation, especially under data heterogeneity; techniques such as dynamic distillation (Flashback) markedly reduce FM and accelerate convergence (Aljahdali et al., 8 Feb 2024).
  • Privacy FM shows that deterministic training leaves all injected probes vulnerable indefinitely, while stochastic SGD enables gradual forgetting; examples injected early in training are forgotten faster (Jagielski et al., 2022).
  • Model-counting FM supports rapid, automated comparison of logic-reduction policies with formal additivity and monotonicity (Doherty et al., 3 Apr 2024).
  • Power-law FM exponents in human and animal memory settle robustly around that observed in recognition data (approx. 0.8 for n=5-dimensional valence models) (Georgiou et al., 2019).
  • Semantic Desktop MB scores decay as expected, triggering graduated forgetting actions; local/global/group MB layers respect stability under context switches (Jilek et al., 2018).
  • Long-context LM FM curves reveal plateau, steep-decay, and amnesia phases, with fine memory lengths in the roughly 1–4K token range for transformers, while RNN/SSM architectures drop rapidly below transformer baselines (Liu et al., 7 Oct 2024).

6. Research Impact and Future Directions

Forgetting measures enable controlled, systematic evaluation of memory loss, offering actionable diagnostics and tuning criteria:

  • In language modeling, FM highlights not only what is forgotten but also what backward transfer (new gains) occurs; reporting FM and BT separately avoids the conflation typical of accuracy averages (Harmon et al., 20 Oct 2025).
  • FM frameworks are vital for privacy compliance, allowing empirical confirmation of data removal and attack resilience (Jagielski et al., 2022).
  • Knowledge representation leverages FM for theory abstraction, rule-base pruning, and constraint manipulation, providing a unified loss-calculation interface (Doherty et al., 3 Apr 2024).
  • Federated and continual learning benefit from real-time FM diagnosis and targeted rehearsal or distillation protocols (Aljahdali et al., 8 Feb 2024, Sanati et al., 6 Nov 2025).
  • Future work aims to integrate forgetting penalties within training objectives, enforce retention bursts via synthetic rehearsal, leverage external retrieval systems to offset in-weight knowledge loss, and extend forgetting measures to more complex, relational fact and semantic memory domains (Harmon et al., 20 Oct 2025, Liao et al., 22 Oct 2024).

7. Comparative Summary Table

| Domain/Metric | FM Definition/Formula | Key Use |
| --- | --- | --- |
| LM post-training (Harmon et al., 20 Oct 2025) | $F$ = fraction of $1\to 0$ transitions; $F_{true} = F - F_{chance}$ | Knowledge loss / BT diagnostics |
| Federated learning (Aljahdali et al., 8 Feb 2024) | $\mathcal{F}_t = -\frac{1}{C}\sum_c \min(0,\, A^c_t - A^c_{t-1})$ | Round-wise loss tracking |
| Privacy (Jagielski et al., 2022) | $FM_z(k) = S_{\mathcal{A}}(\theta^+, \theta^-; z)$ | Data removal, attack decay |
| Knowledge representation (Doherty et al., 3 Apr 2024) | $loss_m^{NC/SC/T}$; $loss_p$ (model-count / probability loss) | Theory inferential loss |
| Human memory (Lubashevsky et al., 2014; Georgiou et al., 2019) | $F(t) \sim t^{-d}$, $FM = d$; $FM(t) = 1 - R_n(t)$ | Recognition, stabilization |
| Semantic Desktop (Jilek et al., 2018) | $MB^{decay}(t)$, $MB_r(t)$ (exponential decay + activation bump) | Info relevance, auto-forgetting |
| Long-context LM (Liu et al., 7 Oct 2024) | $F(k) = R_{copy}(k) - R_{LM}(k)$; $L_{fine}$, $L_{coarse}$ | Memory length quantification |

Forgetting measures—by focusing on granular, interpretive, domain-adjusted quantification—have become central instruments in the analysis and control of information retention in both artificial and biological learning systems.
