FadeMem: Adaptive Memory for LLM Agents
- FadeMem is a biologically-inspired memory system that integrates a dual-layer hierarchy and adaptive decay to enhance memory retention in LLM agents.
- It utilizes semantic relevance, recency, and frequency modulation to balance long-term retention with context-driven forgetting.
- LLM-guided fusion and conflict resolution efficiently consolidate memories while reducing storage overhead and preserving critical context.
FadeMem denotes a biologically-inspired memory architecture for LLM agents that addresses long-term retention, adaptive forgetting, and conflict resolution in agent memory systems. The architecture integrates differential decay, semantic relevance, recency, and frequency modulation within a dual-layer memory hierarchy, supported by LLM-guided memory fusion and contradiction handling. Unlike fixed retention or binary memory protocols, FadeMem operationalizes nuanced, context-driven forgetting, mirroring aspects of human cognitive memory mechanisms (Wei et al., 26 Jan 2026).
1. Dual-Layer Memory Hierarchy and Adaptive Exponential Decay
FadeMem uses a tiered structure consisting of a short-term memory layer (SML) and a long-term memory layer (LML). Each memory slot at time $t$ is a tuple $m_i = (e_i, x_i, s_i, t_i, c_i)$, where $e_i$ is the embedding, $x_i$ the raw text, $s_i$ the memory strength, $t_i$ the creation timestamp, and $c_i$ the raw access count.
Importance $I_i$ for each memory is continuously evaluated from the semantic similarity $\mathrm{sim}(e_i, q)$, where $q$ is the current context embedding and $\mathrm{sim}$ is cosine similarity (or an attention score), together with the exponentially decayed access frequency $\hat f_i$. Two thresholds, applied with hysteresis, determine migration of memories between SML and LML.
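The two-threshold migration rule can be sketched as follows; the threshold values, parameter names, and the exact promote/demote logic are illustrative assumptions rather than the paper's configuration:

```python
# Hypothetical sketch of threshold-based tier migration with hysteresis.
# THETA_UP / THETA_DOWN and their values are assumed for illustration.

THETA_UP = 0.7    # promote SML -> LML when importance rises above this
THETA_DOWN = 0.4  # demote LML -> SML only when importance falls below this

def migrate(tier: str, importance: float) -> str:
    """Return the memory's tier after applying hysteresis thresholds.

    Separating the two thresholds (THETA_DOWN < THETA_UP) keeps a memory
    whose importance hovers near a single cutoff from oscillating between
    layers on every evaluation.
    """
    if tier == "SML" and importance >= THETA_UP:
        return "LML"
    if tier == "LML" and importance <= THETA_DOWN:
        return "SML"
    return tier  # importance in the dead band: stay put
```

The dead band between the thresholds is what "hysteresis" buys: migration decisions depend on the current tier as well as the current importance.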
Forgetting is governed by a parameterized exponential decay of the strength $s_i$, with a layer-specific base rate that is slowed for high-importance memories:
$$s_i(t) = s_i(t_a)\, e^{-\lambda_i (t - t_a)}, \qquad \lambda_i = \frac{\lambda_{\text{layer}}}{1 + \gamma I_i},$$
where $t_a$ is the last access time and $\gamma$ controls the sensitivity of the adaptation. Strengthening on recall incorporates diminishing returns from repeated accesses. Pruning occurs when $s_i$ falls below a minimum threshold or upon prolonged dormancy. The architecture yields half-lives of approximately $11.25$ days (LML) and $5.02$ days (SML) at the reported baseline parameters.
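A minimal numerical sketch of the layer-specific decay, with rates back-solved from the half-lives quoted above via $t_{1/2} = \ln 2 / \lambda$ (the rates are therefore derived here for illustration, not taken from the paper's configuration):

```python
import math

# Layer decay rates (per day) implied by the reported half-lives:
# half-life = ln(2) / lambda.
LAMBDA = {
    "LML": math.log(2) / 11.25,  # half-life ~11.25 days
    "SML": math.log(2) / 5.02,   # half-life ~5.02 days
}

def strength(s0: float, layer: str, days_since_access: float) -> float:
    """Exponentially decayed memory strength after `days_since_access` days."""
    return s0 * math.exp(-LAMBDA[layer] * days_since_access)
```

By construction, a memory's strength halves over one half-life; the slower LML rate means the same item retains more strength in long-term storage than it would in short-term storage after the same dormancy.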
2. Modulation by Semantic Relevance, Frequency, and Temporal Patterns
Semantic relevance is quantitatively assessed via $\mathrm{sim}(e_i, q)$, using cosine similarity or LLM attention scores. Access frequency replaces raw counts with the decayed frequency $\hat f_i$, emphasizing recency and dampening extreme values. Recency is incorporated as an exponential term in the importance score. The importance-weighted decay rate slows forgetting for crucial memories, with a sensitivity parameter $\gamma$ controlling the strength of this adaptation.
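One way the three modulating signals could be composed is sketched below. The multiplicative form, the logarithmic dampening, and the parameter names (`mu`, `rho`) are assumptions for illustration; the paper's exact formula may differ:

```python
import math

def decayed_frequency(f_prev: float, days_since_access: float,
                      mu: float = 0.1) -> float:
    """Decay the running access-frequency estimate, then record one access.

    This replaces a raw access count with a recency-weighted estimate:
    old accesses contribute less the longer ago they occurred.
    """
    return f_prev * math.exp(-mu * days_since_access) + 1.0

def importance(sim: float, f_hat: float, days_since_access: float,
               rho: float = 0.05) -> float:
    """Combine semantic relevance, dampened frequency, and recency.

    log1p dampens extreme frequency values; the exponential recency term
    discounts memories that have not been touched recently.
    """
    recency = math.exp(-rho * days_since_access)
    return sim * math.log1p(f_hat) * recency
```

Under this form, a memory stays important only if it is relevant to the current context, accessed often, and accessed recently; any one factor collapsing to zero drives the whole score down.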
3. LLM-Guided Conflict Resolution and Adaptive Memory Fusion
Incoming memories trigger content-based retrieval of potentially conflicting or related slots, defined by similarity above a threshold. For each candidate pair, a compact LLM (GPT-4o-mini) classifies the relationship as compatible, contradictory, subsumes, or subsumed.
- Compatible: the importance of redundant content is penalized by a multiplicative factor.
- Contradictory: older memories are suppressed in proportion to the time difference between the conflicting entries.
- Subsumes/Subsumed: the LLM fuses or consolidates the memory contents.
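The per-relation handling above can be sketched as a dispatch over the classifier's label. The classifier itself is stubbed out here, and the penalty and suppression constants are illustrative assumptions:

```python
# Hypothetical dispatch for conflict handling. `relation` is the label a
# compact LLM (GPT-4o-mini in the paper) would assign to the pair; the
# numeric constants are assumed values, not the paper's.

def resolve(new_mem: dict, old_mem: dict, relation: str,
            redundancy_penalty: float = 0.8,
            suppress_per_day: float = 0.05) -> None:
    """Adjust memory strengths in place according to the labeled relation."""
    if relation == "compatible":
        # Redundant content: mildly penalize the older slot.
        old_mem["strength"] *= redundancy_penalty
    elif relation == "contradictory":
        # Suppress the older memory more strongly the larger the time gap.
        gap_days = new_mem["t"] - old_mem["t"]
        old_mem["strength"] *= max(0.0, 1.0 - suppress_per_day * gap_days)
    elif relation in ("subsumes", "subsumed"):
        # Content fusion/consolidation would be delegated back to the LLM.
        old_mem["fuse_with"] = new_mem["text"]
```

Note that only the subsumption branch requires a second LLM call; the compatible and contradictory branches are cheap arithmetic on stored strengths.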
Clusters exceeding both a similarity threshold and a temporal window are merged by an LLM, producing a fused memory strength and an adjusted decay constant. A subsequent LLM validation pass ensures no loss of distinct, high-information facts above a salience threshold.
4. Empirical Evaluation and Benchmarks
FadeMem was evaluated across Multi-Session Chat (MSC), LoCoMo, and LTI-Bench. Baselines include fixed-window (4K–16K tokens), retrieval-augmented generation (RAG), Mem0 (unified memory), and MemGPT (hierarchical).
| System | Retention (Critical/Context) | Storage (%) | RP@10/TCS (MSC) | F1 (LoCoMo) | FCR (LoCoMo) | SRR |
|---|---|---|---|---|---|---|
| FadeMem | 82.1 % / 71.0 % | 55.0 % | 77.2 % / 0.82 | 29.43 | 85.9 % | 0.45 |
| Mem0 | 78.4 % / 69.1 % | 100 % | 74.8 % / 0.79 | 28.37 | 83.6 % | 0 |
| MemGPT | 75.6 % / 62.8 % | 85.3 % | - | - | - | - |
| Fixed 16K window | ~50.2 % / 44.8 % | 100 % | - | - | - | - |
Conflict resolution (LTI-Bench, 4,075 injected conflicts): FadeMem outperforms both Mem0 and MemGPT in macro-averaged accuracy and in consistency.
FadeMem reduces memory usage by approximately 45 % while achieving retrieval and reasoning metrics that meet or surpass contemporary baselines.
5. Ablation Studies and Limitations
Ablation on LoCoMo (multi-hop F1):
- Full model: $29.43$
- Without dual-layer hierarchy: $19.45$ ($-33.9\%$)
- Without fusion: $13.63$ ($-53.7\%$)
- Without conflict resolution: $22.88$ ($-22.3\%$)
Omission of key modules most strongly impacts temporal reasoning and open-domain scenarios. Limitations include:
- Dependence on LLM inference quality and resulting latency for fusion and conflict adjudication.
- Fixed schedules for the key decay parameters (the layer decay rates and the importance-sensitivity coefficient); meta-learning of these is suggested as future work.
- Instances of over-compression in memory fusion can occur, potentially eliminating fine-grained causal relationships when the fusion similarity threshold is not optimally set.
6. Context Within Memory-Efficient AI and Future Perspectives
FadeMem operates in the domain of efficient agent memory management, complementing and improving upon fixed-window and hierarchical memory protocols by embedding cognitive principles of selective forgetting (Wei et al., 26 Jan 2026). Its adoption of dual-layer hierarchy, adaptive decay, and LLM-mediated consolidation reflects emerging trends toward bio-inspired, information-centric memory systems for foundation model agents. Prospective work includes dynamic meta-learning of decay schedules and more granular memory clustering for enhanced preservation of complex, temporally distributed knowledge.
7. Comparison: "FadeMem" in Edge Training Contexts
It is notable that the term "FadeMem" is also referenced in the context of federated adversarial decoupled learning (FADE) for edge-device training (Tang et al., 2022). In that paradigm, "FadeMem" characterizes aggressive memory savings achieved by modularizing model training: clients load only one small model module at a time rather than the entire network, yielding RAM reductions of $40\%$ and more and enabling adversarial training where it was previously infeasible due to memory constraints. This separate usage highlights a distinct application domain (model training on heterogeneous hardware), but converges with the agent-memory FadeMem on the theme of resource-efficient retention and selective processing.
FadeMem, across both domains, is defined by differentiated, importance-aware management of memory—whether in model weights for federated training or in symbolic/content memory for LLM agents—using principled architectural and algorithmic mechanisms to optimize capability under strict resource bounds.