
FadeMem: Adaptive Memory for LLM Agents

Updated 2 February 2026
  • FadeMem is a biologically-inspired memory system that integrates a dual-layer hierarchy and adaptive decay to enhance memory retention in LLM agents.
  • It utilizes semantic relevance, recency, and frequency modulation to balance long-term retention with context-driven forgetting.
  • LLM-guided fusion and conflict resolution efficiently consolidate memories while reducing storage overhead and preserving critical context.

FadeMem denotes a biologically-inspired memory architecture for LLM agents that addresses long-term retention, adaptive forgetting, and conflict resolution in agent memory systems. The architecture integrates differential decay, semantic relevance, recency, and frequency modulation within a dual-layer memory hierarchy, supported by LLM-guided memory fusion and contradiction handling. Unlike fixed retention or binary memory protocols, FadeMem operationalizes nuanced, context-driven forgetting, mirroring aspects of human cognitive memory mechanisms (Wei et al., 26 Jan 2026).

1. Dual-Layer Memory Hierarchy and Adaptive Exponential Decay

FadeMem uses a tiered structure consisting of a short-term memory layer (SML) and a long-term memory layer (LML). Each memory slot $m_n$ at time $t$ is a tuple

$$m_n(t) = (c_n, s_n, v_n(t), \tau_n, f_n)$$

where $c_n \in \mathbb{R}^d$ is the embedding, $s_n$ is the raw text, $v_n(t) \in [0,1]$ is the memory strength, $\tau_n$ the creation timestamp, and $f_n$ the raw access count.
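A minimal sketch of this slot structure, assuming illustrative field names (the paper defines only the tuple, not an implementation):

```python
from dataclasses import dataclass, field

@dataclass
class MemorySlot:
    """One memory slot m_n(t), mirroring the tuple (c_n, s_n, v_n(t), tau_n, f_n)."""
    embedding: list[float]        # c_n: semantic embedding in R^d
    text: str                     # s_n: raw text content
    strength: float = 1.0         # v_n(t) in [0, 1]; v_n(0) = 1.0 assumed
    created_at: float = 0.0       # tau_n: creation timestamp
    access_times: list[float] = field(default_factory=list)  # history behind f_n
```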

Importance for each memory is continuously evaluated:

$$I_n(t) = \alpha\,\mathrm{rel}(c_n, Q_t) + \beta\,\frac{\bar{f}_n(t)}{1 + \bar{f}_n(t)} + \gamma\,\exp[-\delta(t - \tau_n)]$$

where $Q_t$ is the current context embedding, $\mathrm{rel}(\cdot)$ is cosine similarity (or attention), and $\bar{f}_n(t)$ is the exponentially decayed access frequency. The thresholds $\theta_\mathrm{promote}$ and $\theta_\mathrm{demote}$, with hysteresis, determine migration of memories between the SML and LML:
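A sketch of the importance score, with assumed weights ($\alpha=0.5$, $\beta=0.3$, $\gamma=0.2$, $\delta=0.1$ are illustrative, not from the paper):

```python
import math

def cosine(a, b):
    """rel(c_n, Q_t): cosine similarity between a memory embedding and the context."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def importance(c_n, q_t, f_bar, t, tau_n,
               alpha=0.5, beta=0.3, gamma=0.2, delta=0.1):
    """I_n(t) = alpha*rel + beta*f_bar/(1+f_bar) + gamma*exp(-delta*(t - tau_n))."""
    relevance = cosine(c_n, q_t)
    frequency = f_bar / (1.0 + f_bar)          # saturating frequency term
    recency = math.exp(-delta * (t - tau_n))   # exponential recency term
    return alpha * relevance + beta * frequency + gamma * recency
```

For a never-accessed, just-created memory perfectly aligned with the context, the score reduces to $\alpha + \gamma$.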

$$\mathrm{Layer}(m_n) = \begin{cases} \mathrm{LML} & \text{if } I_n(t) \ge \theta_\mathrm{promote} \\ \mathrm{SML} & \text{if } I_n(t) < \theta_\mathrm{demote} \end{cases}$$
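The hysteresis band between the two thresholds prevents a memory near the boundary from oscillating between layers. A sketch, with assumed threshold values:

```python
def migrate(current_layer, importance, theta_promote=0.6, theta_demote=0.3):
    """Two-threshold hysteresis: promote above theta_promote, demote below
    theta_demote, and keep the current layer inside the band."""
    if importance >= theta_promote:
        return "LML"
    if importance < theta_demote:
        return "SML"
    return current_layer  # inside the hysteresis band: no migration
```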

Forgetting is governed by a parameterized exponential decay:

$$v_n(t) = v_n(0)\,\exp\left[-\lambda_n (t - \tau_n)^{\beta_n}\right]$$

with $\lambda_n = \lambda_{\text{base}}\,\exp(-\mu I_n(t))$ and

$$\beta_n = \begin{cases} 0.8, & m_n \in \mathrm{LML} \quad (\text{sub-linear decay}) \\ 1.2, & m_n \in \mathrm{SML} \quad (\text{super-linear decay}) \end{cases}$$

Strengthening on recall, $v_n(t^+)$, incorporates diminishing returns from repeated accesses. Pruning occurs if $v_n(t) < \epsilon_{\text{prune}}$ or upon prolonged dormancy. The architecture yields half-lives of approximately $11.25$ days (LML) and $5.02$ days (SML) at $I_n = 0$.
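A sketch of the decay curve and its half-life, solving $v(t)/v(0) = 1/2$ for $t = (\ln 2 / \lambda_n)^{1/\beta_n}$. The value $\lambda_{\text{base}} = 0.1$/day is an assumption chosen because it reproduces the reported half-lives; $\mu = 1.0$ is likewise illustrative:

```python
import math

LAMBDA_BASE = 0.1   # per day; assumed value consistent with the reported half-lives
MU = 1.0            # assumed sensitivity of decay rate to importance

def strength(v0, t_days, importance, layer):
    """v_n(t) = v_n(0) * exp(-lambda_n * t^beta_n), with layer-dependent beta_n."""
    lam = LAMBDA_BASE * math.exp(-MU * importance)
    beta = 0.8 if layer == "LML" else 1.2
    return v0 * math.exp(-lam * t_days ** beta)

def half_life(importance, layer):
    """Time for strength to halve: t = (ln 2 / lambda_n)^(1 / beta_n)."""
    lam = LAMBDA_BASE * math.exp(-MU * importance)
    beta = 0.8 if layer == "LML" else 1.2
    return (math.log(2) / lam) ** (1.0 / beta)
```

With these assumed constants, `half_life(0.0, "LML")` is about 11.25 days and `half_life(0.0, "SML")` about 5.02 days, matching the figures above; higher importance shrinks $\lambda_n$ and lengthens both.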

2. Modulation by Semantic Relevance, Frequency, and Temporal Patterns

Semantic relevance is quantitatively assessed via $\mathrm{rel}(c_n, Q_t)$, using cosine similarity or LLM attention scores. Access frequency replaces raw counts with the decayed frequency $\bar{f}_n(t) = \sum_j \exp[-\kappa(t - t_j)]$, emphasizing recency and dampening extreme values via the saturation $\bar{f}/(1+\bar{f})$. Recency is incorporated as an exponential term in $I_n(t)$. The importance-weighted decay rate $\lambda_n$ slows forgetting for crucial memories, with $\mu$ controlling the sensitivity of this adaptation.
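The decayed frequency can be sketched directly from its definition ($\kappa = 0.05$ is an assumed rate; each $t_j$ is an access timestamp):

```python
import math

def decayed_frequency(access_times, t, kappa=0.05):
    """f_bar_n(t) = sum_j exp(-kappa * (t - t_j)): recent accesses weigh more."""
    return sum(math.exp(-kappa * (t - t_j)) for t_j in access_times if t_j <= t)
```

An access at time $t$ contributes exactly 1; older accesses contribute exponentially less, so $\bar{f}_n(t)$ behaves like a recency-weighted count.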

3. LLM-Guided Conflict Resolution and Adaptive Memory Fusion

Incoming memories $m_{\text{new}}$ trigger content-based retrieval of potentially conflicting or related slots $\mathcal{S}$, defined by similarity above $\theta_\mathrm{sim}$. For each pair $(s_{\text{new}}, s_i)$, a compact LLM (GPT-4o-mini) classifies the relationship as compatible, contradictory, subsumes, or subsumed.

  • Compatible: redundant importance is penalized via $I_i \leftarrow I_i\,(1 - \omega\,\mathrm{sim}(c_{\text{new}}, c_i))$.
  • Contradictory: the older memory is suppressed in proportion to the time gap: $v_i(t) \leftarrow v_i(t)\,\exp[-\rho\,\mathrm{clip}((\tau_{\text{new}} - \tau_i)/W_{\mathrm{age}},\,0,\,1)]$.
  • Subsumes/Subsumed: the LLM fuses or consolidates the memory contents.
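The per-pair update rules can be sketched as follows; the weights $\omega$, $\rho$, and $W_{\mathrm{age}}$ are assumed values, and the relation label is taken as the LLM classifier's output:

```python
import math

def resolve_pair(relation, sim, importance_i, strength_i, tau_new, tau_i,
                 omega=0.5, rho=1.0, w_age=30.0):
    """Apply the update for one (m_new, m_i) pair given the LLM's relation label."""
    if relation == "compatible":
        # Penalize redundant importance: I_i <- I_i * (1 - omega * sim)
        importance_i *= (1.0 - omega * sim)
    elif relation == "contradictory":
        # Suppress the older memory in proportion to the clipped age gap
        age = min(max((tau_new - tau_i) / w_age, 0.0), 1.0)
        strength_i *= math.exp(-rho * age)
    # "subsumes"/"subsumed" pairs are instead handed to LLM-guided fusion
    return importance_i, strength_i
```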

Clusters exceeding a similarity threshold within a temporal window are merged by an LLM, with fused memory strength $v_{\text{fused}}(0) = \max_{i \in \mathcal{C}} v_i(t) + \varepsilon\,\operatorname{Var}(\{v_i\})$ and an adjusted decay constant:

$$\lambda_{\text{fused}} = \frac{\lambda_{\text{base}}}{1 + \log |\mathcal{C}|}$$

A subsequent LLM validation ensures no loss of distinct high-information facts above $\theta_{\text{preserve}}$.
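The fusion arithmetic is small enough to sketch directly ($\varepsilon = 0.1$ is an assumed value, and population variance is assumed for $\operatorname{Var}$):

```python
import math

def fuse(strengths, epsilon=0.1, lambda_base=0.1):
    """Fused strength and decay constant for a cluster C with strengths {v_i}."""
    mean = sum(strengths) / len(strengths)
    var = sum((v - mean) ** 2 for v in strengths) / len(strengths)
    v_fused = max(strengths) + epsilon * var          # max plus a variance bonus
    lam_fused = lambda_base / (1.0 + math.log(len(strengths)))  # big clusters decay slower
    return v_fused, lam_fused
```

Larger clusters get a smaller $\lambda_{\text{fused}}$, so consolidated memories persist longer than any single constituent would.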

4. Empirical Evaluation and Benchmarks

FadeMem was evaluated across Multi-Session Chat (MSC), LoCoMo, and LTI-Bench. Baselines include fixed-window (4K–16K tokens), retrieval-augmented generation (RAG), Mem0 (unified memory), and MemGPT (hierarchical).

| System | Retention (Critical/Context) | Storage (%) | RP@10 / TCS (MSC) | F1 (LoCoMo) | FCR (LoCoMo) | SRR |
|---|---|---|---|---|---|---|
| FadeMem | 82.1% / 71.0% | 55.0 | 77.2% / 0.82 | 29.43 | 85.9% | 0.45 |
| Mem0 | 78.4% / 69.1% | 100 | 74.8% / 0.79 | 28.37 | 83.6% | 0 |
| MemGPT | 75.6% / 62.8% | 85.3 | – | – | – | – |
| Fixed 16K window | ~50.2% / 44.8% | 100 | – | – | – | – |

Conflict resolution (LTI-Bench, 4,075 injected conflicts):

  • FadeMem: macro-averaged accuracy 68.9%, consistency 80.4%.
  • Mem0: accuracy 64.2%, consistency 75.8%.
  • MemGPT: accuracy 62.4%, consistency 74.6%.

FadeMem reduces memory usage by approximately 45% while matching or surpassing contemporary baselines on retrieval and reasoning metrics.

5. Ablation Studies and Limitations

Ablation on LoCoMo (multi-hop F1):

  • Full model: $29.43$
  • Without dual-layer hierarchy: $19.45$ ($-33.9\%$)
  • Without fusion: $13.63$ ($-53.7\%$)
  • Without conflict resolution: $22.88$ ($-22.4\%$)

Omission of key modules most strongly impacts temporal reasoning and open-domain scenarios. Limitations include:

  • Dependence on LLM inference quality and resulting latency for fusion and conflict adjudication.
  • Fixed schedules for key decay parameters ($\mu$, $\lambda_{\text{base}}$); meta-learning for these is suggested as future work.
  • Over-compression during memory fusion can eliminate fine-grained causal relationships when $\theta_{\text{preserve}}$ is not well tuned.

6. Context Within Memory-Efficient AI and Future Perspectives

FadeMem operates in the domain of efficient agent memory management, complementing and improving upon fixed-window and hierarchical memory protocols by embedding cognitive principles of selective forgetting (Wei et al., 26 Jan 2026). Its adoption of dual-layer hierarchy, adaptive decay, and LLM-mediated consolidation reflects emerging trends toward bio-inspired, information-centric memory systems for foundation model agents. Prospective work includes dynamic meta-learning of decay schedules and more granular memory clustering for enhanced preservation of complex, temporally distributed knowledge.

7. Comparison: "FadeMem" in Edge Training Contexts

It is notable that the term "FadeMem" is also referenced in the context of federated adversarial decoupled learning (FADE) for edge device training (Tang et al., 2022). In this paradigm, "FadeMem" characterizes aggressive memory savings achieved by modularizing model training; clients only load one small model module at a time rather than the entire network, yielding a 40–75% RAM reduction and enabling adversarial training where it was previously infeasible due to memory constraints. This separate usage highlights a distinct application domain (model training on heterogeneous hardware), but converges with agent memory FadeMem on the theme of resource-efficient retention and selective processing.

FadeMem, across both domains, is defined by differentiated, importance-aware management of memory—whether in model weights for federated training or in symbolic/content memory for LLM agents—using principled architectural and algorithmic mechanisms to optimize capability under strict resource bounds.
