
Long-Term Episodic Memory Networks

Updated 11 December 2025
  • Long-Term Episodic Memory Networks (LEMN) are neural architectures that integrate a learnable RNN-based retention agent to manage sparse, unbounded data streams.
  • They leverage spatial and temporal contextualizations through variants like IM-LEMN, S-LEMN, and ST-LEMN to dynamically evaluate and retain critical memory entries.
  • Empirical evaluations on maze navigation, synthetic QA (bAbI), and TriviaQA show that LEMN consistently outperforms rule-based FIFO and LRU retention baselines.

Long-term Episodic Memory Networks (LEMN) are memory-augmented neural architectures designed to address the scalability limitations of contemporary external-memory–based neural networks, particularly for lifelong learning with unbounded data streams in which informative content is sparse relative to memory capacity. LEMN introduces a learnable, RNN-based memory retention agent that dynamically identifies and retains memory entries of task-generic importance by leveraging both relative and historical information, supporting robust performance in navigation and question-answering domains (Jung et al., 2018).

1. Architectural Overview

LEMN is structured as an augmentation to any external-memory–based network (e.g., MemN2N, BiDAF, Memory Q-Networks) by integrating a memory retention (eviction) agent.

  • External Memory: The memory comprises N fixed-capacity slots M_t = [m_{t,1}, ..., m_{t,N}] at timestep t, with each slot m_{t,i} ∈ ℝ^d.
  • Input Encoding: Incoming datum x_t (which may be a sentence, image, or other modality) is encoded via a learnable function ψ into c_t = ψ(x_t) ∈ ℝ^d, forming the candidate for possible memory inclusion.
  • Memory Retention Agent: Upon memory saturation, a policy π selects a slot for eviction (or enacts a "no-op" to skip writing). LEMN provides three categories of retention agents, leveraging varying depths of spatial and temporal context.
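
To make these components concrete, here is a minimal sketch (class and policy names such as `EpisodicMemory` and `retention_policy` are illustrative assumptions, not from the paper) of a fixed-capacity memory whose write path defers to a pluggable retention policy once all slots are full:

```python
import numpy as np

class EpisodicMemory:
    """Fixed-capacity external memory M_t with up to n_slots entries in R^d.
    retention_policy(M, c_t) returns the index of the slot to evict, or
    None for a "no-op" that discards the candidate instead of writing it."""

    def __init__(self, n_slots, dim, retention_policy):
        self.n_slots, self.dim = n_slots, dim
        self.slots = []                      # grows until n_slots is reached
        self.retention_policy = retention_policy

    def write(self, c_t):
        """Insert candidate encoding c_t = psi(x_t); evict on saturation."""
        if len(self.slots) < self.n_slots:
            self.slots.append(c_t)           # free slot: always write
            return
        evict = self.retention_policy(np.stack(self.slots), c_t)
        if evict is not None:                # None encodes the no-op action
            self.slots[evict] = c_t

# Example: a FIFO-like baseline policy that always evicts slot 0.
mem = EpisodicMemory(n_slots=3, dim=4, retention_policy=lambda M, c: 0)
for t in range(5):
    mem.write(np.full(4, float(t)))          # encode x_t as a toy vector
```

Swapping the FIFO lambda for a learned scorer recovers the LEMN setting; the memory interface stays unchanged.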

2. Mathematical Formulation of Retention

LEMNs’ policy is formulated as a categorical distribution over memory slots, parameterized via task-specific retention scores.

Memory Retention Pipeline

  • Slot Embeddings: For each slot, a per-slot embedding e_{t,i} = φ(m_{t,i}) is computed.
  • Retention Scores: Scores g_{t,i} are computed per slot using one of the following agent types:
    • Input-Matching LEMN (IM-LEMN): Computes dot-product similarity c_t · e_{t,i} between the current input encoding and each slot, blended with an exponential moving average v_{t,i} of past usage for recency (LRU tracking). A learned forgetting coefficient γ_t modulates sensitivity to recency.
    • Spatial LEMN (S-LEMN): Utilizes a bidirectional GRU over the spatial index i = 1, ..., N, generating context-aware features e'_{t,i} = BiGRU(e_{t,1}, ..., e_{t,N})_i, which are projected to a scalar score.
    • Spatio-Temporal LEMN (ST-LEMN): Extends S-LEMN by incorporating a temporal GRU for each slot, enabling accumulation of historic slot importance.
  • Retention Policy: The categorical policy is π(a_t = i | M_t, c_t) = softmax(g_t)_i. For interpretation as eviction, the retention probability of slot i is 1 − π(a_t = i | M_t, c_t).
| Agent Variant | Contextual Features Used | Mechanism |
|---|---|---|
| IM-LEMN | current input, slot similarity | Dot-product + LRU-based EMA |
| S-LEMN | slot-to-slot (spatial) | Bi-GRU over memory at timestep t |
| ST-LEMN | spatial + history (temporal) | Bi-GRU (spatial) + GRU (temporal, per-slot) |
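
The score-to-policy step can be sketched in plain NumPy; the neural scoring network is abstracted into a precomputed score vector, and the trailing no-op entry is an assumption about how the skip action is encoded:

```python
import numpy as np

def eviction_policy(scores):
    """Categorical distribution over N slots plus a final no-op action.
    scores: length-(N+1) array of retention-agent outputs g_t."""
    z = scores - scores.max()                # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    return p

def retention_probability(p):
    """Probability that each of the N slots is *kept* this step."""
    return 1.0 - p[:-1]                      # complement of eviction prob

g = np.array([2.0, 0.5, -1.0, 0.0])          # 3 slot logits + no-op logit
p = eviction_policy(g)
i = int(np.argmax(p[:-1]))                   # greedy eviction at test time
```

During training the agent samples from `p` rather than taking the argmax, which is what makes policy-gradient optimization possible.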

3. Training Methodology

Memory retention is cast as a policy-gradient RL task:

  • Action: At each timestep t, the action a_t ∈ {1, ..., N, no-op} corresponds to the memory slot chosen for eviction (or skips the write).
  • State: The current memory M_t augmented with the candidate embedding c_t.
  • Reward: At a future step T, a downstream task (e.g., question answering, navigation) provides a sparse reward signal R_T, or a dense RL reward where available.
  • Optimization: The system (base network plus retention agent) is trained end-to-end using Asynchronous Advantage Actor-Critic (A3C) with Generalized Advantage Estimation (GAE). The policy-gradient loss is:

L(θ) = −∑_t log π(a_t | M_t, c_t; θ) Â_t

where Â_t is the GAE advantage estimate (with the usual A3C value-regression and entropy-regularization terms). Temporal hidden states in ST-LEMN, h_{t,i}, encode historical slot usage, inherently capturing historical importance.
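
The advantage estimate used in the A3C+GAE objective is standard Generalized Advantage Estimation; a minimal reference implementation (not the authors' code) under the sparse, end-of-episode reward typical of the QA setting:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation.
    rewards: length-T array; values: length-(T+1) array of V(s_t)
    with a bootstrap value appended at the end."""
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

# Sparse episodic reward, as in the QA tasks: reward only at the final step.
r = np.array([0.0, 0.0, 1.0])
v = np.array([0.2, 0.4, 0.6, 0.0])           # V(s_t) plus terminal bootstrap
adv = gae(r, v)
```

With gamma = lam = 1 this reduces to the Monte-Carlo advantage (total return minus baseline), which is a useful sanity check.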

4. Memory Update Procedure

Memory updating is based on stochastic (training) or deterministic (test-time) selection of eviction candidates according to the learned policy. While free slots remain, the candidate c_t is written directly; once memory is saturated, an action a_t ~ π(· | M_t, c_t) is drawn and the memory is updated as:

M_{t+1} = [m_{t,1}, ..., m_{t,i−1}, c_t, m_{t,i+1}, ..., m_{t,N}] if a_t = i ≤ N, and M_{t+1} = M_t if a_t = no-op.

The agent selectively evicts less relevant entries, as learned through experience and reward.
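
A one-step sketch of this update; the learned scoring network is replaced by a toy similarity heuristic (`stub_scores`, an illustrative assumption, not the paper's scorer):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def memory_update(memory, c_t, score_fn, greedy=True):
    """One LEMN-style write: score N slots + no-op, pick an eviction target.
    memory: (N, d) array of slot embeddings; c_t: (d,) candidate encoding."""
    g = score_fn(memory, c_t)                # length N+1 (last = no-op)
    p = softmax(g)
    a = int(np.argmax(p)) if greedy else int(rng.choice(len(p), p=p))
    if a == len(p) - 1:                      # no-op: discard the candidate
        return memory
    new = memory.copy()
    new[a] = c_t                             # evict slot a, insert c_t
    return new

# Toy heuristic: evict the slot least similar to the candidate.
def stub_scores(memory, c_t):
    sim = memory @ c_t
    return np.append(-sim, -np.inf)          # never choose no-op here

M = np.eye(3)                                # 3 slots, d = 3
M2 = memory_update(M, np.array([0.0, 1.0, 0.0]), stub_scores)
```

Sampling (`greedy=False`) is used during training so the policy gradient sees exploratory evictions; greedy selection is the natural test-time choice.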

5. Empirical Evaluation

LEMN has demonstrated effectiveness in three task domains.

5.1 Maze Path-Finding (Memory Q-Networks)

  • Base Agents: MQN (memory Q-network, no context RNN), FRMQN (with context RNN).
  • Tasks:
    • I-Maze: Long corridor where initial indicator color determines goal; corridor up to length 200.
    • Random Maze Single-Goal: Varied maze topologies.
  • Results: With a small, fixed number of memory slots and ST-LEMN retention, MQN+ST-LEMN achieves ~100% success across all corridor lengths, outperforming MQN+FIFO (≈0% for length > 40). FRMQN+ST-LEMN maintains ~100% while FRMQN+FIFO degrades with length. Visualization reveals that ST-LEMN learns to retain decision-relevant cues (the indicator color) and discard repetitive or irrelevant frames.

5.2 Synthetic QA (bAbI Two-Supporting-Facts Task)

  • Base: MemN2N with 3 hops, position encoding.
  • Datasets:
    • Original: 45 facts + 5 questions, fixed order.
    • Noisy: Inserted noise facts.
    • Large: Episode length extended (20–80).
  • Error rates (memory size = 10):

| Baseline/Agent | Original | Noisy | Large |
|---|---|---|---|
| FIFO | 16.5% | 44.1% | 32.4% |
| IM-LEMN | 16.1% | 18.9% | 9.0% |
| S-LEMN | 5.0% | 4.8% | 5.1% |
| ST-LEMN | 4.6% | 3.9% | 5.6% |

Shuffling spatial memory order causes performance drops, indicating that both absolute and relative ordering of memory contributes to effectiveness. Qualitative analysis confirms that ST-LEMN reliably preserves the minimal supporting facts needed for future queries, filtering out noise.

5.3 Real-World QA (TriviaQA)

  • Base: BiDAF, modified to operate at sentence granularity.
  • Dataset: TriviaQA, Wikipedia split (~2,900 words per document, truncated to the first 800 words, roughly 40–50 sentences).
  • Metrics: ExactMatch (EM), F1 (on “Distant Supervision” set).
| Baseline/Agent | EM | F1 |
|---|---|---|
| FIFO | 18.5% | 20.3% |
| IM-LEMN | 34.9% | 38.7% |
| S-LEMN | 43.0% | 46.6% |
| ST-LEMN | 45.2% | 49.0% |

ST-LEMN achieves the highest performance, with qualitative findings indicating selective memory retention of question-relevant sentences.

6. Significance and Comparative Analysis

LEMN’s primary contribution is an adaptive, lightweight retention agent that outperforms both rule-based (FIFO, LRU) and simplistic learned scheduling baselines by combining:

  • Short-term input-matching,
  • Spatial context (relative importance among stored entries),
  • Temporal aggregation (historical usage relevance).

A key insight is that retention policies leveraging both spatial and temporal contextualization deliver substantial gains in generalization (navigation, long contexts) and in robustness to noisy or distractor-heavy environments. Framing memory eviction as a sequential decision problem, optimized via A3C with GAE, yields end-to-end trainable, task-sensitive memory schedulers without manual heuristics (Jung et al., 2018).

References (1)
