
FadeMem: Adaptive Memory for LLM Agents

Updated 2 February 2026
  • FadeMem is a biologically-inspired memory system that integrates a dual-layer hierarchy and adaptive decay to enhance memory retention in LLM agents.
  • It utilizes semantic relevance, recency, and frequency modulation to balance long-term retention with context-driven forgetting.
  • LLM-guided fusion and conflict resolution efficiently consolidate memories while reducing storage overhead and preserving critical context.

FadeMem denotes a biologically-inspired memory architecture for LLM agents that addresses long-term retention, adaptive forgetting, and conflict resolution in agent memory systems. The architecture integrates differential decay, semantic relevance, recency, and frequency modulation within a dual-layer memory hierarchy, supported by LLM-guided memory fusion and contradiction handling. Unlike fixed retention or binary memory protocols, FadeMem operationalizes nuanced, context-driven forgetting, mirroring aspects of human cognitive memory mechanisms (Wei et al., 26 Jan 2026).

1. Dual-Layer Memory Hierarchy and Adaptive Exponential Decay

FadeMem uses a tiered structure consisting of a short-term memory layer (SML) and a long-term memory layer (LML). Each memory slot $m_n$ at time $t$ is a tuple

$$m_n(t) = (c_n, s_n, v_n(t), \tau_n, f_n)$$

where $c_n \in \mathbb{R}^d$ is the embedding, $s_n$ is the raw text, $v_n(t) \in [0,1]$ is the memory strength, $\tau_n$ the creation timestamp, and $f_n$ the raw access count.
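A minimal sketch of this slot structure, assuming illustrative field names (the paper defines only the tuple, not an implementation):

```python
from dataclasses import dataclass, field

@dataclass
class MemorySlot:
    """One memory slot m_n(t), mirroring the tuple (c_n, s_n, v_n(t), tau_n, f_n)."""
    embedding: list[float]        # c_n: semantic embedding in R^d
    text: str                     # s_n: raw text content
    strength: float = 1.0         # v_n(t) in [0, 1]; v_n(0) = 1.0 assumed
    created_at: float = 0.0       # tau_n: creation timestamp
    access_times: list[float] = field(default_factory=list)  # history behind f_n
```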

Importance for each memory is continuously evaluated:

$$I_n(t) = \alpha\,\mathrm{rel}(c_n, Q_t) + \beta\,\frac{\bar{f}_n(t)}{1 + \bar{f}_n(t)} + \gamma\,\exp[-\delta(t - \tau_n)]$$

where $Q_t$ is the current context embedding, $\mathrm{rel}(\cdot)$ is cosine similarity (or attention), and $\bar{f}_n(t)$ is the exponentially decayed access frequency. The thresholds $\theta_\mathrm{promote}$ and $\theta_\mathrm{demote}$, with hysteresis, determine migration of memories between the SML and LML:
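A sketch of the importance score, with assumed weights ($\alpha=0.5$, $\beta=0.3$, $\gamma=0.2$, $\delta=0.1$ are illustrative, not from the paper):

```python
import math

def cosine(a, b):
    """rel(c_n, Q_t): cosine similarity between a memory embedding and the context."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def importance(c_n, q_t, f_bar, t, tau_n,
               alpha=0.5, beta=0.3, gamma=0.2, delta=0.1):
    """I_n(t) = alpha*rel + beta*f_bar/(1+f_bar) + gamma*exp(-delta*(t - tau_n))."""
    relevance = cosine(c_n, q_t)
    frequency = f_bar / (1.0 + f_bar)          # saturating frequency term
    recency = math.exp(-delta * (t - tau_n))   # exponential recency term
    return alpha * relevance + beta * frequency + gamma * recency
```

For a never-accessed, just-created memory perfectly aligned with the context, the score reduces to $\alpha + \gamma$.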

$$\mathrm{Layer}(m_n) = \begin{cases} \mathrm{LML} & \text{if } I_n(t) \ge \theta_\mathrm{promote} \\ \mathrm{SML} & \text{if } I_n(t) < \theta_\mathrm{demote} \end{cases}$$
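The hysteresis band between the two thresholds prevents a memory near the boundary from oscillating between layers. A sketch, with assumed threshold values:

```python
def migrate(current_layer, importance, theta_promote=0.6, theta_demote=0.3):
    """Two-threshold hysteresis: promote above theta_promote, demote below
    theta_demote, and keep the current layer inside the band."""
    if importance >= theta_promote:
        return "LML"
    if importance < theta_demote:
        return "SML"
    return current_layer  # inside the hysteresis band: no migration
```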

Forgetting is governed by a parameterized exponential decay:

$$v_n(t) = v_n(0)\,\exp\left[-\lambda_n (t - \tau_n)^{\beta_n}\right]$$

with $\lambda_n = \lambda_{\text{base}}\,\exp(-\mu I_n(t))$ and

$$\beta_n = \begin{cases} 0.8, & m_n \in \mathrm{LML} \quad (\text{sub-linear decay}) \\ 1.2, & m_n \in \mathrm{SML} \quad (\text{super-linear decay}) \end{cases}$$

Strengthening on recall, $v_n(t^+)$, incorporates diminishing returns from repeated accesses. Pruning occurs if $v_n(t) < \epsilon_{\text{prune}}$ or upon prolonged dormancy. The architecture yields half-lives of approximately $11.25$ days (LML) and $5.02$ days (SML) at $I_n = 0$.
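A sketch of the decay curve and its half-life, solving $v(t)/v(0) = 1/2$ for $t = (\ln 2 / \lambda_n)^{1/\beta_n}$. The value $\lambda_{\text{base}} = 0.1$/day is an assumption chosen because it reproduces the reported half-lives; $\mu = 1.0$ is likewise illustrative:

```python
import math

LAMBDA_BASE = 0.1   # per day; assumed value consistent with the reported half-lives
MU = 1.0            # assumed sensitivity of decay rate to importance

def strength(v0, t_days, importance, layer):
    """v_n(t) = v_n(0) * exp(-lambda_n * t^beta_n), with layer-dependent beta_n."""
    lam = LAMBDA_BASE * math.exp(-MU * importance)
    beta = 0.8 if layer == "LML" else 1.2
    return v0 * math.exp(-lam * t_days ** beta)

def half_life(importance, layer):
    """Time for strength to halve: t = (ln 2 / lambda_n)^(1 / beta_n)."""
    lam = LAMBDA_BASE * math.exp(-MU * importance)
    beta = 0.8 if layer == "LML" else 1.2
    return (math.log(2) / lam) ** (1.0 / beta)
```

With these assumed constants, `half_life(0.0, "LML")` is about 11.25 days and `half_life(0.0, "SML")` about 5.02 days, matching the figures above; higher importance shrinks $\lambda_n$ and lengthens both.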

2. Modulation by Semantic Relevance, Frequency, and Temporal Patterns

Semantic relevance is quantitatively assessed via $\mathrm{rel}(c_n, Q_t)$, using cosine similarity or LLM attention scores. Access frequency replaces raw counts with the decayed frequency $\bar{f}_n(t) = \sum_j \exp[-\kappa(t - t_j)]$, emphasizing recency and dampening extreme values via the saturation $\bar{f}/(1+\bar{f})$. Recency is incorporated as an exponential term in $I_n(t)$. The importance-weighted decay rate $\lambda_n$ slows forgetting for crucial memories, with $\mu$ controlling the sensitivity of this adaptation.
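The decayed frequency can be sketched directly from its definition ($\kappa = 0.05$ is an assumed rate; each $t_j$ is an access timestamp):

```python
import math

def decayed_frequency(access_times, t, kappa=0.05):
    """f_bar_n(t) = sum_j exp(-kappa * (t - t_j)): recent accesses weigh more."""
    return sum(math.exp(-kappa * (t - t_j)) for t_j in access_times if t_j <= t)
```

An access at time $t$ contributes exactly 1; older accesses contribute exponentially less, so $\bar{f}_n(t)$ behaves like a recency-weighted count.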

3. LLM-Guided Conflict Resolution and Adaptive Memory Fusion

Incoming memories $m_{\text{new}}$ trigger content-based retrieval of potentially conflicting or related slots $\mathcal{S}$, defined by similarity above $\theta_\mathrm{sim}$. For each pair $(s_{\text{new}}, s_i)$, a compact LLM (GPT-4o-mini) classifies the relationship as compatible, contradictory, subsumes, or subsumed.

  • Compatible: redundant importance is penalized via $I_i \leftarrow I_i\,(1 - \omega\,\mathrm{sim}(c_{\text{new}}, c_i))$.
  • Contradictory: the older memory is suppressed in proportion to the time gap: $v_i(t) \leftarrow v_i(t)\,\exp[-\rho\,\mathrm{clip}((\tau_{\text{new}} - \tau_i)/W_{\mathrm{age}},\,0,\,1)]$.
  • Subsumes/Subsumed: the LLM fuses or consolidates the memory contents.
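The per-pair update rules can be sketched as follows; the weights $\omega$, $\rho$, and $W_{\mathrm{age}}$ are assumed values, and the relation label is taken as the LLM classifier's output:

```python
import math

def resolve_pair(relation, sim, importance_i, strength_i, tau_new, tau_i,
                 omega=0.5, rho=1.0, w_age=30.0):
    """Apply the update for one (m_new, m_i) pair given the LLM's relation label."""
    if relation == "compatible":
        # Penalize redundant importance: I_i <- I_i * (1 - omega * sim)
        importance_i *= (1.0 - omega * sim)
    elif relation == "contradictory":
        # Suppress the older memory in proportion to the clipped age gap
        age = min(max((tau_new - tau_i) / w_age, 0.0), 1.0)
        strength_i *= math.exp(-rho * age)
    # "subsumes"/"subsumed" pairs are instead handed to LLM-guided fusion
    return importance_i, strength_i
```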

Clusters exceeding a similarity threshold within a temporal window are merged by an LLM, with fused memory strength $v_{\text{fused}}(0) = \max_{i \in \mathcal{C}} v_i(t) + \varepsilon\,\operatorname{Var}(\{v_i\})$ and an adjusted decay constant:

$$\lambda_{\text{fused}} = \frac{\lambda_{\text{base}}}{1 + \log |\mathcal{C}|}$$

A subsequent LLM validation ensures no loss of distinct high-information facts above $\theta_{\text{preserve}}$.
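The fusion arithmetic is small enough to sketch directly ($\varepsilon = 0.1$ is an assumed value, and population variance is assumed for $\operatorname{Var}$):

```python
import math

def fuse(strengths, epsilon=0.1, lambda_base=0.1):
    """Fused strength and decay constant for a cluster C with strengths {v_i}."""
    mean = sum(strengths) / len(strengths)
    var = sum((v - mean) ** 2 for v in strengths) / len(strengths)
    v_fused = max(strengths) + epsilon * var          # max plus a variance bonus
    lam_fused = lambda_base / (1.0 + math.log(len(strengths)))  # big clusters decay slower
    return v_fused, lam_fused
```

Larger clusters get a smaller $\lambda_{\text{fused}}$, so consolidated memories persist longer than any single constituent would.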

4. Empirical Evaluation and Benchmarks

FadeMem was evaluated across Multi-Session Chat (MSC), LoCoMo, and LTI-Bench. Baselines include fixed-window (4K–16K tokens), retrieval-augmented generation (RAG), Mem0 (unified memory), and MemGPT (hierarchical).

| System | Retention (Critical/Context) | Storage (%) | RP@10 / TCS (MSC) | F1 (LoCoMo) | FCR (LoCoMo) | SRR |
|---|---|---|---|---|---|---|
| FadeMem | 82.1% / 71.0% | 55.0 | 77.2% / 0.82 | 29.43 | 85.9% | 0.45 |
| Mem0 | 78.4% / 69.1% | 100 | 74.8% / 0.79 | 28.37 | 83.6% | 0 |
| MemGPT | 75.6% / 62.8% | 85.3 | – | – | – | – |
| Fixed 16K window | ~50.2% / 44.8% | 100 | – | – | – | – |

Conflict resolution (LTI-Bench, 4,075 injected conflicts):

  • FadeMem: macro-averaged accuracy 68.9%, consistency 80.4%.
  • Mem0: accuracy 64.2%, consistency 75.8%.
  • MemGPT: accuracy 62.4%, consistency 74.6%.

FadeMem reduces memory usage by approximately 45% while matching or surpassing contemporary baselines on retrieval and reasoning metrics.

5. Ablation Studies and Limitations

Ablation on LoCoMo (multi-hop F1):

  • Full model: $29.43$
  • Without dual-layer hierarchy: $19.45$ ($-33.9\%$)
  • Without fusion: $13.63$ ($-53.7\%$)
  • Without conflict resolution: $22.88$ ($-22.4\%$)

Omission of key modules most strongly impacts temporal reasoning and open-domain scenarios. Limitations include:

  • Dependence on LLM inference quality and resulting latency for fusion and conflict adjudication.
  • Fixed schedules for key decay parameters ($\mu$, $\lambda_{\text{base}}$); meta-learning for these is suggested as future work.
  • Over-compression during memory fusion can eliminate fine-grained causal relationships when $\theta_{\text{preserve}}$ is not well tuned.

6. Context Within Memory-Efficient AI and Future Perspectives

FadeMem operates in the domain of efficient agent memory management, complementing and improving upon fixed-window and hierarchical memory protocols by embedding cognitive principles of selective forgetting (Wei et al., 26 Jan 2026). Its adoption of dual-layer hierarchy, adaptive decay, and LLM-mediated consolidation reflects emerging trends toward bio-inspired, information-centric memory systems for foundation model agents. Prospective work includes dynamic meta-learning of decay schedules and more granular memory clustering for enhanced preservation of complex, temporally distributed knowledge.

7. Comparison: "FadeMem" in Edge Training Contexts

It is notable that the term "FadeMem" is also referenced in the context of federated adversarial decoupled learning (FADE) for edge device training (Tang et al., 2022). In this paradigm, "FadeMem" characterizes aggressive memory savings achieved by modularizing model training; clients only load one small model module at a time rather than the entire network, yielding a 40–75% RAM reduction and enabling adversarial training where it was previously infeasible due to memory constraints. This separate usage highlights a distinct application domain (model training on heterogeneous hardware), but converges with agent memory FadeMem on the theme of resource-efficient retention and selective processing.

FadeMem, across both domains, is defined by differentiated, importance-aware management of memory—whether in model weights for federated training or in symbolic/content memory for LLM agents—using principled architectural and algorithmic mechanisms to optimize capability under strict resource bounds.
