Relevance-Optimized Forgetting

Updated 2 May 2026

Relevance-Optimized Forgetting is a strategy that prioritizes memory retention using dynamic signals like access patterns, semantic alignment, and interpretability scores.
It enhances continual learning by maintaining core knowledge while selectively suppressing obsolete or irrelevant information to balance stability and plasticity.
Implementations in LLMs and neural networks include circuit-level edits and parameter masking, achieving precise fact updating and reduced catastrophic forgetting.

A relevance-optimized forgetting strategy is a class of methods that explicitly prioritize, modulate, or schedule the decay, suppression, or removal of information within a memory or parameter system based on an item- or component-specific estimate of its current or anticipated future relevance. Unlike uniform or time-based forgetting, these methods leverage mechanistic or statistical signals drawn from interpretability, access patterns, semantic alignment, or user feedback to selectively retain core information while degrading or suppressing content deemed obsolete or of marginal value. Relevance-optimized forgetting is foundational to mitigating catastrophic forgetting in continual learning, balancing stability and plasticity in dialog/agent memory, scaling knowledge editing in LLMs, and managing retrieval noise or memory costs in information retrieval systems.

1. Theoretical Foundations and Motivation

The conceptual motivation for relevance-optimized forgetting derives from human memory, where forgetting is neither uniform nor purely time-driven but reflects use-dependent decay, interference management, and retrievability conditioned on context and relevance (Yu et al., 2018). Efficient memory systems must balance:

Retention of high-value items to support stable inference or recall;
Suppression or deletion of stale, unused, or disruptive information to minimize interference and free capacity;
Plasticity so that updates to core knowledge or adaptation to new environments are possible without destabilizing prior structure.

This perspective is formalized in computational models such as Memory Buoyancy, where each item's value is a dynamically updated function of stimuli, network effects, and decay (Jilek et al., 2018), and in power-law or exponential forgetting models where optimized schedules of rehearsal, access, or suppression yield minimal average retrieval cost and match classic forgetting curves (Yu et al., 2018, Kang et al., 24 Mar 2025).

2. Mechanistic Implementations in LLMs and Neural Networks

A core family of strategies applies relevance-optimized forgetting to editing, continual learning, or memory management in large neural architectures:

A. Modular Circuit Suppression in LLMs

The "surgical" unlearn-then-learn pipeline targets factual updates or deletions in LLMs by first localizing the minimal set of internal parameters or modules (e.g., attention heads, MLPs) causally implicated in the production of a targeted fact (Ngugi, 9 Aug 2025):

Circuit Localization: Salience scores per module combine activation magnitude, causal logit drop, and gradient-norm sensitivity, normalized and aggregated to rank modules: $\mathrm{salience}(m) = \alpha\,\mathrm{Score}_1(m) + \beta\,\mathrm{Score}_2(m) + \gamma\,\mathrm{Score}_3(m)$.
Unlearning: For the selected modules, $(IA)^3$ (Infused Adapter by Inhibiting and Amplifying Inner Activations) scaling vectors are trained to suppress the activations that carry the target fact, achieving a 96% forget rate for the old fact.
Relearning: With the previous fact suppressed, the new fact is injected by retraining adapters at the same loci, achieving 98.5% accuracy on the new knowledge, with collateral forgetting on control facts substantially reduced (F_control of 72% vs. 20–40% for baselines).
Soft Forgetting: Suppressed knowledge is not destructively erased but becomes latent, recoverable only by specialized probing.

This method demonstrates that precise, relevance-driven intervention at the circuit level enables high editability and preservation of unrelated model competence.
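The salience aggregation used for circuit localization can be illustrated with a minimal sketch. It assumes min-max normalization of each score, equal default weights, and made-up module names and score values; none of these details are taken from the cited work, which only specifies the weighted-sum form.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; the normalization scheme is an assumption."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def rank_modules(
    activation: Dict[str, float],
    logit_drop: Dict[str, float],
    grad_norm: Dict[str, float],
    alpha: float = 1 / 3,
    beta: float = 1 / 3,
    gamma: float = 1 / 3,
    top_k: int = 2,
) -> List[str]:
    """Return the top_k modules by salience = alpha*S1 + beta*S2 + gamma*S3."""
    s1 = min_max_normalize(activation)
    s2 = min_max_normalize(logit_drop)
    s3 = min_max_normalize(grad_norm)
    salience = {
        m: alpha * s1[m] + beta * s2[m] + gamma * s3[m] for m in activation
    }
    return sorted(salience, key=salience.get, reverse=True)[:top_k]


# Hypothetical module names and scores, for illustration only.
ranked = rank_modules(
    activation={"attn.3.h5": 0.9, "mlp.7": 0.5, "attn.1.h0": 0.1},
    logit_drop={"attn.3.h5": 0.8, "mlp.7": 0.9, "attn.1.h0": 0.0},
    grad_norm={"attn.3.h5": 1.0, "mlp.7": 0.2, "attn.1.h0": 0.1},
)
print(ranked)
```

In this sketch only the top-ranked modules would receive adapter edits, which is what keeps the intervention local.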

B. Relevance Masks and Parameter Partitioning

Relevance Mapping Networks (RMNs) introduce binary relevance-masks over network weights, assigning each task a subset of the parameter space commensurate with its relevance profile (Kaushik et al., 2021). Task-specific masks are learned to minimize overlap unless overlap is beneficial, enabling:

Minimal catastrophic forgetting (overwriting of previous task structure),
Resistance to catastrophic remembering (misclassification of new conditions with old patterns),
Single unified networks that flexibly partition and repurpose internal representations.

The RMN approach outperforms replay-based methods on standard continual learning benchmarks by explicitly aligning parameter allocation with current task relevance.
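The effect of per-task binary masks can be sketched with a toy weight vector. This is not the RMN training procedure: the masks here are fixed by hand rather than learned, and the freezing rule for weights claimed by earlier tasks is an assumption made for illustration.

```python
from typing import List


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Effective weights seen by a task: masked-out entries become zero."""
    return [w * m for w, m in zip(weights, mask)]


def mask_overlap(mask_a: List[int], mask_b: List[int]) -> int:
    """Number of parameters shared by two task masks."""
    return sum(a & b for a, b in zip(mask_a, mask_b))


def masked_update(
    weights: List[float],
    grads: List[float],
    mask: List[int],
    frozen: List[int],
    lr: float = 0.1,
) -> List[float]:
    """Gradient step restricted to the task's mask, skipping frozen weights."""
    return [
        w - lr * g if (m and not f) else w
        for w, g, m, f in zip(weights, grads, mask, frozen)
    ]


weights = [1.0, 1.0, 1.0, 1.0]
mask_a = [1, 1, 0, 0]  # parameters assigned to task A (hand-picked)
mask_b = [0, 1, 1, 1]  # parameters assigned to task B (hand-picked)

overlap = mask_overlap(mask_a, mask_b)

# Train task B while the weights already used by task A stay frozen.
updated = masked_update(weights, grads=[1.0] * 4, mask=mask_b, frozen=mask_a)

print(overlap, updated, apply_mask(updated, mask_a))
```

Because task A's effective weights are unchanged after the task B update, its behavior is preserved, while the shared parameter remains available to both tasks.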

3. Relevance Estimation and Adaptive Memory in Agents

Recent work in autonomous agents and lifelong learning robots operationalizes relevance-optimized forgetting as an online prioritization and budgeting problem.

A. Budgeted Forgetting with Multidimensional Relevance Scores

In memory-dense environments, where unbounded accumulation of observations or actions is impractical, relevance-optimized forgetting frameworks assign each memory item an importance function integrating recency, frequency, and semantic alignment:

$I(m_i, t) = \alpha\,R(m_i,t) + \beta\,F(m_i) + \gamma\,S(m_i, q_t)$

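$R(m_i, t)$: exponentially decaying recency of item $m_i$ at time $t$
$F(m_i)$: normalized access frequency
$S(m_i, q_t)$: cosine similarity between the item and the current context or query $q_t$
$\alpha, \beta, \gamma$: weighting coefficients that set the relative contribution of each term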

At each step, the memory is pruned to a fixed budget by greedily retaining the items with the highest $I(m_i, t)$, ensuring that reasoning performance and retrieval consistency are maintained even under tight storage constraints (Fofadiya et al., 2 Apr 2026, Wei et al., 26 Jan 2026).
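The importance function and budgeted pruning step can be sketched as follows. The `MemoryItem` fields, the weight values, the decay rate, and the normalization of frequency by the maximum access count are all assumptions chosen for illustration, not details from the cited papers.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MemoryItem:
    text: str
    embedding: Sequence[float]
    last_access: float  # timestamp of the most recent access
    access_count: int


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def importance(item: MemoryItem, query: Sequence[float], now: float,
               max_count: int, alpha: float = 0.4, beta: float = 0.2,
               gamma: float = 0.4, decay: float = 0.1) -> float:
    """I(m, t) = alpha*R + beta*F + gamma*S with assumed weights."""
    recency = math.exp(-decay * (now - item.last_access))
    frequency = item.access_count / max_count if max_count else 0.0
    similarity = cosine(item.embedding, query)
    return alpha * recency + beta * frequency + gamma * similarity


def prune(memory: List[MemoryItem], query: Sequence[float],
          now: float, budget: int) -> List[MemoryItem]:
    """Keep the `budget` items with the highest importance."""
    max_count = max((m.access_count for m in memory), default=0)
    return heapq.nlargest(
        budget, memory, key=lambda m: importance(m, query, now, max_count)
    )


memory = [
    MemoryItem("a", [1.0, 0.0], last_access=9.0, access_count=5),
    MemoryItem("b", [0.0, 1.0], last_access=1.0, access_count=1),
    MemoryItem("c", [1.0, 1.0], last_access=8.0, access_count=2),
]
kept = prune(memory, query=[1.0, 0.0], now=10.0, budget=2)
print([m.text for m in kept])
```

Item "b" is dropped here because it is old, rarely accessed, and orthogonal to the query, so all three terms score it low.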

B. Hierarchical, Feedback-Driven Relevance in Episodic Memory

The H²-EMV system for robot episodic memory organizes observations in a multi-level tree and sets individual lifetimes per node as a function of default decay and LLM-estimated relevance. When a node expires, an LLM predicts its current utility using a rule set updated based on user feedback. This enables dynamic, user-tailored memory retention with demonstrated improvements in question-answering accuracy and memory efficiency (Bärmann et al., 13 Apr 2026).

4. Relevance-Driven Scheduling in Learning and Replay

Forgetting-curve-inspired methods align rehearsal or memory replay with the predicted retention probability of each item, the principle behind spaced repetition.

The view-batch model in continual learning lengthens the recall interval for each sample, balancing longer intervals (slower decay) with self-supervised consolidation on each presentation to optimize memory decay rates (Kang et al., 24 Mar 2025).
Memory replay strategies for LLMs (e.g., FOREVER) align replay triggers to the model's internal parameter evolution, not raw iteration count, and schedule replay strength adaptively to the model's stability, using metrics derived from the magnitude of running updates (Feng et al., 7 Jan 2026).
Distilled Replay keeps a small, highly informative synthetic buffer selected to maximize coverage of the loss gradients of prior tasks, so as to best preserve relevant knowledge (Rosasco et al., 2021).
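The spaced-repetition idea underlying these schedules can be illustrated with a generic sketch. It assumes an exponential forgetting curve $R(t) = e^{-t/s}$ with a stability parameter $s$ that grows by a fixed factor after each replay. It does not reproduce the view-batch model, FOREVER, or Distilled Replay, and all parameter values are hypothetical.

```python
import math
from typing import List


def retention(elapsed: float, stability: float) -> float:
    """Assumed exponential forgetting curve R(t) = exp(-t / s)."""
    return math.exp(-elapsed / stability)


def next_interval(stability: float, target: float = 0.9) -> float:
    """Time until predicted retention decays to `target`."""
    return -stability * math.log(target)


def schedule(initial_stability: float, growth: float,
             reviews: int, target: float = 0.9) -> List[float]:
    """Replay times when each replay multiplies stability by `growth`."""
    t, s, times = 0.0, initial_stability, []
    for _ in range(reviews):
        t += next_interval(s, target)
        times.append(t)
        s *= growth  # consolidation: slower decay after each replay
    return times


times = schedule(initial_stability=1.0, growth=2.0, reviews=3)
print(times)
```

With a growth factor above one, each gap between replays is longer than the previous one, so items that have been consolidated more often are rehearsed less frequently.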

The following table summarizes the implementation variants discussed above:

Method/Domain	Strategy	Relevance Signal
Unlearn-then-Learn (LLMs)	Circuit localization + localized adapter edit	Activation, logit-patching, gradients
RMN (CL networks)	Parameter subspace masking	Task-wise, overlap-penalized
Budgeted forgetting	Top-B heap of memory items	Decayed recency, frequency, alignment
H²-EMV (robot)	LLM prompt over memory tree + human rules	Contextual LLM estimate + feedback
View-batch CL	Spaced replay and consolidation	Power-law fit, per-sample recall curve
Distilled replay	Buffer of synthetic exemplars	Gradient-match to loss on old tasks

5. Safety, Efficiency, and Practical Impact

Relevance-optimized forgetting enables:

Precise factual editing and retraction: Suppression of specific facts with controlled scope, as opposed to global fine-tuning, with a quantifiable risk of undesired collateral forgetting (Ngugi, 9 Aug 2025).
Bounded memory and retrieval cost: Enforcement of exact storage budgets with provable selection optimality and empirical evidence for reduced hallucination and clutter in multi-turn dialog agents (Fofadiya et al., 2 Apr 2026, Wei et al., 26 Jan 2026).
Personalization and auditability: Integration of user feedback or empirical access patterns as external relevance signals, facilitating user-specific adaptation and the auditing of latent knowledge via probing or intervention (Bärmann et al., 13 Apr 2026).
Generalization and robustness: Iterative "forget-and-relearn" and masking strategies drive models to concentrate parameters on robust, invariant, and task-relevant structure, yielding systematic improvements in cross-task retention and deployment stability (Zhou et al., 2022, Peng et al., 2021).

6. Limitations and Open Challenges

The construction of relevance signals remains labor- and compute-intensive in many domains, relying on interpretability tools (e.g., circuit tracing, attention attribution), trained LLMs, or finely tuned access statistics (Ngugi, 9 Aug 2025).
Scalability to large numbers of concurrent edits, memory items, or rapidly changing environments exposes limitations in buffer management, interference, and relevance estimation accuracy (Ngugi, 9 Aug 2025, Fofadiya et al., 2 Apr 2026).
Multi-agent and distributed contexts require consensus or negotiation over shared vs. private relevance, motivating investigation into federated or submodular extensions (Fofadiya et al., 2 Apr 2026).

7. Connections to Human Cognition and Future Directions

Relevance-optimized forgetting formalizes in silico many elements of cognitive memory: prioritization under interference, context-conditional retrieval, and adaptive, feedback-driven suppression. Ongoing research explores:

The use of exponential or power-law decay to model context weighting in LLMs, directly paralleling human forgetting curves (Tran et al., 28 Dec 2025, Feng et al., 7 Jan 2026).
Hierarchical and graph-based memory structures reflecting compositional and contextual aspects of human episodic memory (Bärmann et al., 13 Apr 2026, Jilek et al., 2018).
Reinforcement-, meta-, or self-supervised learning of relevance signals, enabling autonomous adjustment aligned with downstream utility.

In sum, relevance-optimized forgetting has become a recurring component of scalable, robust, and adaptive neural information systems, with theoretical, algorithmic, and applied results reported across continual learning, LLM editing, robotics, and interactive agents (Ngugi, 9 Aug 2025, Jilek et al., 2018, Kang et al., 24 Mar 2025, Kaushik et al., 2021, Tran et al., 28 Dec 2025, Bärmann et al., 13 Apr 2026, Wei et al., 26 Jan 2026, Feng et al., 7 Jan 2026, Peng et al., 2021, Zhou et al., 2022, Rosasco et al., 2021, Fofadiya et al., 2 Apr 2026).