Long-Term Episodic Memory Networks
- Long-Term Episodic Memory Networks (LEMN) are neural architectures that integrate a learnable, RNN-based retention agent to manage fixed-capacity memory over unbounded data streams in which informative content is sparse.
- The variants IM-LEMN, S-LEMN, and ST-LEMN leverage input-matching, spatial, and spatio-temporal contextualization, respectively, to dynamically evaluate and retain critical memory entries.
- Empirical evaluations on maze navigation, synthetic QA, and TriviaQA show substantial improvements over rule-based FIFO and LRU retention.
Long-term Episodic Memory Networks (LEMN) are memory-augmented neural architectures designed to address the scalability limitations of contemporary external-memory–based neural networks, particularly for lifelong learning with unbounded data streams in which informative content is sparse relative to memory capacity. LEMN introduces a learnable, RNN-based memory retention agent that dynamically identifies and retains memory entries of task-generic importance by leveraging both relative and historical information, supporting robust performance in navigation and question-answering domains (Jung et al., 2018).
1. Architectural Overview
LEMN is structured as an augmentation to any external-memory–based network (e.g., MemN2N, BiDAF, Memory Q-Networks) by integrating a memory retention (eviction) agent.
- External Memory: The memory comprises $N$ fixed-capacity slots $M^t = [m_1^t, \dots, m_N^t]$ at timestep $t$, with each slot $m_i^t \in \mathbb{R}^d$.
- Input Encoding: Incoming datum $x_t$ (which may be a sentence, image, or other modality) is encoded via a learnable function $\psi$ into $c_t = \psi(x_t)$, forming the candidate for possible memory inclusion.
- Memory Retention Agent: Upon memory saturation, a policy $\pi$ selects a slot for eviction (or enacts a "no-op" to skip writing). LEMN provides three categories of retention agents, leveraging varying depths of spatial and temporal context; a component-level sketch follows below.
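To make the component wiring concrete, the following minimal Python sketch connects an encoder and a retention agent to a fixed-capacity memory. All names here (`Encoder`, `RetentionAgent`, `EpisodicMemory`) are hypothetical scaffolding for illustration, not the paper's implementation.

```python
# Minimal sketch of the LEMN component interfaces; class names are assumptions,
# not from Jung et al. (2018).
from typing import List, Protocol
import numpy as np

class Encoder(Protocol):
    def __call__(self, x) -> np.ndarray: ...        # psi: raw input -> embedding c_t

class RetentionAgent(Protocol):
    def scores(self, slots: List[np.ndarray], cand: np.ndarray) -> np.ndarray: ...

class EpisodicMemory:
    """Fixed-capacity external memory with a learned eviction policy."""

    def __init__(self, capacity: int, encoder: Encoder, agent: RetentionAgent):
        self.capacity, self.encoder, self.agent = capacity, encoder, agent
        self.slots: List[np.ndarray] = []

    def write(self, x) -> None:
        c = self.encoder(x)                          # c_t = psi(x_t)
        if len(self.slots) < self.capacity:
            self.slots.append(c)                     # fill until full
            return
        g = self.agent.scores(self.slots, c)         # per-slot retention scores
        p = np.exp(g - g.max()); p /= p.sum()        # softmax eviction policy
        evict = int(np.argmax(p))                    # greedy choice at inference
        self.slots[evict] = c                        # overwrite evicted slot
```

During training the eviction index would be sampled from `p` rather than taken greedily, so that policy gradients have exploration to learn from (see Section 3).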
2. Mathematical Formulation of Retention
The LEMN policy is formulated as a categorical distribution over memory slots, parameterized via learned, task-specific retention scores $g_i^t$.
Memory Retention Pipeline
- Slot Embeddings: For each slot, a per-slot embedding $e_i^t = \phi(m_i^t)$ is computed.
- Retention Scores: Scores $g_i^t$ are computed per slot using one of the following agent types:
  - Input-Matching LEMN (IM-LEMN): Computes the dot-product similarity between the current input encoding $c_t$ and each slot embedding $e_i^t$, blended with an exponential moving average of past access for recency (LRU tracking). A learned forgetting coefficient $\gamma^t$ modulates sensitivity to recent usage.
  - Spatial LEMN (S-LEMN): Utilizes a bidirectional GRU over the spatial index $i$, generating context-aware features $\tilde{e}_i^t$ from the full slot sequence $(e_1^t, \dots, e_N^t)$, which are projected to a scalar score $g_i^t$.
  - Spatio-Temporal LEMN (ST-LEMN): Extends S-LEMN by adding a per-slot temporal GRU, $h_i^t = \mathrm{GRU}(\tilde{e}_i^t, h_i^{t-1})$, enabling accumulation of historic slot importance.
- Retention Policy: The categorical eviction policy is $\pi(i \mid M^t, x_t) = \mathrm{softmax}(g_1^t, \dots, g_N^t)_i$; the corresponding retention probability for slot $i$ is $1 - \pi(i \mid M^t, x_t)$.
| Agent Variant | Contextual Features Used | Mechanism |
|---|---|---|
| IM-LEMN | current input, slot similarity | Dot-product + LRU-based EMA |
| S-LEMN | slot-to-slot (spatial) | Bi-GRU over memory slots at timestep $t$ |
| ST-LEMN | spatial + history (temporal) | Bi-GRU (spatial) + GRU (temporal, per-slot) |
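The sketch below implements an ST-LEMN-style scorer in PyTorch to make the score computation concrete. The hidden sizes, the linear projection, and the exact wiring of the temporal GRU cell are illustrative assumptions, not the paper's reported configuration; dropping the temporal `GRUCell` recovers an S-LEMN-style scorer.

```python
import torch
import torch.nn as nn

class STLEMNScorer(nn.Module):
    """Illustrative ST-LEMN-style retention scorer (dimensions are assumptions)."""

    def __init__(self, d: int, hidden: int = 64):
        super().__init__()
        self.spatial = nn.GRU(d, hidden, bidirectional=True, batch_first=True)
        self.temporal = nn.GRUCell(2 * hidden, hidden)  # per-slot history (ST-LEMN)
        self.proj = nn.Linear(hidden, 1)                # scalar retention score g_i

    def forward(self, e: torch.Tensor, h: torch.Tensor):
        # e: (N, d) slot embeddings at timestep t; h: (N, hidden) temporal states
        ctx, _ = self.spatial(e.unsqueeze(0))           # spatial context across slots
        h_new = self.temporal(ctx.squeeze(0), h)        # accumulate per-slot history
        g = self.proj(h_new).squeeze(-1)                # (N,) retention scores
        return g, h_new

scorer = STLEMNScorer(d=128)
e, h = torch.randn(10, 128), torch.zeros(10, 64)
g, h = scorer(e, h)
pi = torch.softmax(g, dim=0)   # categorical eviction policy over the 10 slots
```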
3. Training Methodology
Memory retention is cast as a policy-gradient RL task:
- Action: At each timestep $t$, the action $a_t$ selects the memory slot to evict (or a no-op).
- State: The current memory $M^t$ augmented with the candidate embedding $c_t$.
- Reward: At a future step, the downstream task (e.g., question answering, navigation) provides a reward signal, which may be a delayed task reward or a dense per-step RL reward.
- Optimization: The system (base network plus retention agent) is trained end-to-end using Asynchronous Advantage Actor-Critic (A3C) with Generalized Advantage Estimation (GAE). The policy-gradient loss is
$$\mathcal{L}(\theta) = -\mathbb{E}_t\!\left[\log \pi_\theta(a_t \mid s_t)\,\hat{A}_t\right],$$
where $\hat{A}_t$ is the advantage estimate. Temporal hidden states $h_i^t$ in ST-LEMN encode historical slot usage, inherently capturing historical importance.
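A minimal sketch of this objective, assuming one finished episode of stored log-probabilities, value estimates, and rewards; the hyperparameters ($\gamma$, $\lambda$, value coefficient) are illustrative defaults, not values from the paper.

```python
import torch

def gae_advantages(rewards: torch.Tensor, values: torch.Tensor,
                   gamma: float = 0.99, lam: float = 0.95) -> torch.Tensor:
    """Generalized Advantage Estimation over one finished episode."""
    adv = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0  # bootstrap 0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv

def actor_critic_loss(log_probs, values, rewards, value_coef: float = 0.5):
    """A3C-style loss: -log pi(a_t|s_t) * A_hat_t plus a critic regression term."""
    adv = gae_advantages(rewards, values.detach())
    returns = adv + values.detach()             # value-function regression targets
    policy_loss = -(log_probs * adv).sum()
    value_loss = ((values - returns) ** 2).sum()
    return policy_loss + value_coef * value_loss
```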
4. Memory Update Procedure
Memory updating is based on the stochastic or deterministic selection of eviction candidates according to the learned policy. The operational pattern for memory management is as follows:
```
Initialize memory M ← empty list (max size N)
Optional: initialize per-slot hidden states {h_i} for ST-LEMN
for t in 1…T:
    c ← ψ(x_t)                            # encode new input
    if |M| < N:
        append c to M                     # fill until full
    else:
        e_i ← φ(M[i]) for i in 1…N        # per-slot embeddings
        g ← retention_scores(e_1…e_N, c)  # IM-LEMN, S-LEMN, or ST-LEMN
        π ← softmax(g_1…g_N)
        i* ← sample_or_argmax(π)
        if i* ≠ NOP_index:
            M[i*] ← c                     # evict slot i* and write new entry
            if ST-LEMN: h_{i*} ← 0        # reset temporal state for that slot
```
The agent selectively evicts less relevant entries, as learned through experience and reward.
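The `sample_or_argmax` step above is typically stochastic during training (so the policy gradient has exploration signal) and greedy at inference. A small sketch, assuming the no-op is modeled as an extra category appended to the slot scores:

```python
import torch
from torch.distributions import Categorical

def select_eviction(scores: torch.Tensor, training: bool):
    """Pick an eviction slot, or the no-op (assumed here to be the last index)."""
    dist = Categorical(logits=scores)                  # scores: (N + 1,) with no-op
    action = dist.sample() if training else scores.argmax()
    return action, dist.log_prob(action)               # log-prob feeds the A3C loss
```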
5. Empirical Evaluation
LEMN has demonstrated effectiveness in three task domains.
5.1 Maze Path-Finding (Memory Q-Networks)
- Base Agents: MQN (memory Q-network, no context RNN), FRMQN (with context RNN).
- Tasks:
- I-Maze: Long corridor where initial indicator color determines goal; corridor up to length 200.
- Random Maze Single-Goal: Varied maze topologies.
- Results: With fixed-capacity memory and ST-LEMN retention, MQN+ST-LEMN achieves ~100% success across all corridor lengths, outperforming MQN+FIFO (≈0% for lengths > 40). FRMQN+ST-LEMN likewise maintains ~100% while FRMQN+FIFO degrades with length. Visualization reveals that ST-LEMN learns to retain decision-relevant cues (the indicator color) and discard repetitive or irrelevant frames.
5.2 Synthetic QA (bAbI Two-Supporting-Facts Task)
- Base: MemN2N with 3 hops, position encoding.
- Datasets:
- Original: 45 facts + 5 questions, fixed order.
- Noisy: Inserted noise facts.
- Large: Episode length extended (20–80).
- Error Rates (memory size = 10 slots):
| Baseline/Agent | Original | Noisy | Large |
|---|---|---|---|
| FIFO | 16.5% | 44.1% | 32.4% |
| IM-LEMN | 16.1% | 18.9% | 9.0% |
| S-LEMN | 5.0% | 4.8% | 5.1% |
| ST-LEMN | 4.6% | 3.9% | 5.6% |
Shuffling spatial memory order causes performance drops, indicating that both absolute and relative ordering of memory contributes to effectiveness. Qualitative analysis confirms that ST-LEMN reliably preserves the minimal supporting facts needed for future queries, filtering out noise.
5.3 Real-World QA (TriviaQA)
- Base: BiDAF, modified to operate at sentence granularity.
- Dataset: TriviaQA, Wikipedia split (≈2,900 words per document, truncated to the first 800 words, i.e., roughly 40–50 sentences).
- Metrics: ExactMatch (EM) and F1, reported on the “Distant Supervision” set.
| Baseline/Agent | EM | F1 |
|---|---|---|
| FIFO | 18.5% | 20.3% |
| IM-LEMN | 34.9% | 38.7% |
| S-LEMN | 43.0% | 46.6% |
| ST-LEMN | 45.2% | 49.0% |
ST-LEMN achieves the highest performance, with qualitative findings indicating selective memory retention of question-relevant sentences.
6. Significance and Comparative Analysis
LEMN’s primary contribution is an adaptive, lightweight retention agent that outperforms both rule-based (FIFO, LRU) and simplistic learned scheduling baselines by combining:
- Short-term input-matching,
- Spatial context (relative importance among stored entries),
- Temporal aggregation (historical usage relevance).
A key insight is that memory retention policies leveraging both spatial and temporal contextualization provide substantial advantages in generalization (navigation, long contexts) and in robustness to noisy or distractor-heavy environments. Framing memory eviction as a sequential decision problem, optimized with A3C+GAE, yields end-to-end trainable, task-sensitive memory schedulers without manual heuristics (Jung et al., 2018).