Long-Term Episodic Memory Networks
- Long-Term Episodic Memory Networks (LEMN) are neural architectures that integrate a learnable, RNN-based retention agent to manage fixed-capacity memory over unbounded data streams in which informative content is sparse.
- The variants IM-LEMN, S-LEMN, and ST-LEMN leverage input-matching, spatial, and spatio-temporal contextualization, respectively, to dynamically evaluate and retain critical memory entries.
- Empirical evaluations on maze navigation, synthetic QA, and TriviaQA show substantial improvements over rule-based FIFO and LRU retention.
Long-term Episodic Memory Networks (LEMN) are memory-augmented neural architectures designed to address the scalability limitations of contemporary external-memory–based neural networks, particularly for lifelong learning with unbounded data streams in which informative content is sparse relative to memory capacity. LEMN introduces a learnable, RNN-based memory retention agent that dynamically identifies and retains memory entries of task-generic importance by leveraging both relative and historical information, supporting robust performance in navigation and question-answering domains (Jung et al., 2018).
1. Architectural Overview
LEMN is structured as an augmentation to any external-memory–based network (e.g., MemN2N, BiDAF, Memory Q-Networks) by integrating a memory retention (eviction) agent.
- External Memory: The memory comprises $N$ fixed-capacity slots $M^t = [m_1^t, \dots, m_N^t]$ at timestep $t$, with each slot $m_i^t \in \mathbb{R}^d$.
- Input Encoding: Incoming datum $x_t$ (which may be a sentence, image, or other modality) is encoded via a learnable function $\psi$ into $c_t = \psi(x_t)$, forming the candidate for possible memory inclusion.
- Memory Retention Agent: Upon memory saturation, a policy $\pi$ selects a slot for eviction (or enacts a "no-op" to skip writing). LEMN provides three categories of retention agents, leveraging varying depths of spatial and temporal context; a component-level sketch follows below.
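To make the component wiring concrete, the following minimal Python sketch connects an encoder and a retention agent to a fixed-capacity memory. All names here (`Encoder`, `RetentionAgent`, `EpisodicMemory`) are hypothetical scaffolding for illustration, not the paper's implementation.

```python
# Minimal sketch of the LEMN component interfaces; class names are assumptions,
# not from Jung et al. (2018).
from typing import List, Protocol
import numpy as np

class Encoder(Protocol):
    def __call__(self, x) -> np.ndarray: ...        # psi: raw input -> embedding c_t

class RetentionAgent(Protocol):
    def scores(self, slots: List[np.ndarray], cand: np.ndarray) -> np.ndarray: ...

class EpisodicMemory:
    """Fixed-capacity external memory with a learned eviction policy."""

    def __init__(self, capacity: int, encoder: Encoder, agent: RetentionAgent):
        self.capacity, self.encoder, self.agent = capacity, encoder, agent
        self.slots: List[np.ndarray] = []

    def write(self, x) -> None:
        c = self.encoder(x)                          # c_t = psi(x_t)
        if len(self.slots) < self.capacity:
            self.slots.append(c)                     # fill until full
            return
        g = self.agent.scores(self.slots, c)         # per-slot retention scores
        p = np.exp(g - g.max()); p /= p.sum()        # softmax eviction policy
        evict = int(np.argmax(p))                    # greedy choice at inference
        self.slots[evict] = c                        # overwrite evicted slot
```

During training the eviction index would be sampled from `p` rather than taken greedily, so that policy gradients have exploration to learn from (see Section 3).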
2. Mathematical Formulation of Retention
The LEMN policy is formulated as a categorical distribution over memory slots, parameterized via learned, task-specific retention scores $g_i^t$.
Memory Retention Pipeline
- Slot Embeddings: For each slot, a per-slot embedding $e_i^t = \phi(m_i^t)$ is computed.
- Retention Scores: Scores $g_i^t$ are computed per slot using one of the following agent types:
  - Input-Matching LEMN (IM-LEMN): Computes the dot-product similarity between the current input encoding $c_t$ and each slot embedding $e_i^t$, blended with an exponential moving average of past access for recency (LRU tracking). A learned forgetting coefficient $\gamma^t$ modulates sensitivity to recent usage.
  - Spatial LEMN (S-LEMN): Utilizes a bidirectional GRU over the spatial index $i$, generating context-aware features $\tilde{e}_i^t$ from the full slot sequence $(e_1^t, \dots, e_N^t)$, which are projected to a scalar score $g_i^t$.
  - Spatio-Temporal LEMN (ST-LEMN): Extends S-LEMN by adding a per-slot temporal GRU, $h_i^t = \mathrm{GRU}(\tilde{e}_i^t, h_i^{t-1})$, enabling accumulation of historic slot importance.
- Retention Policy: The categorical eviction policy is $\pi(i \mid M^t, x_t) = \mathrm{softmax}(g_1^t, \dots, g_N^t)_i$; the corresponding retention probability for slot $i$ is $1 - \pi(i \mid M^t, x_t)$.
| Agent Variant | Contextual Features Used | Mechanism |
|---|---|---|
| IM-LEMN | current input, slot similarity | Dot-product + LRU-based EMA |
| S-LEMN | slot-to-slot (spatial) | Bi-GRU over memory slots at timestep $t$ |
| ST-LEMN | spatial + history (temporal) | Bi-GRU (spatial) + GRU (temporal, per-slot) |
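The sketch below implements an ST-LEMN-style scorer in PyTorch to make the score computation concrete. The hidden sizes, the linear projection, and the exact wiring of the temporal GRU cell are illustrative assumptions, not the paper's reported configuration; dropping the temporal `GRUCell` recovers an S-LEMN-style scorer.

```python
import torch
import torch.nn as nn

class STLEMNScorer(nn.Module):
    """Illustrative ST-LEMN-style retention scorer (dimensions are assumptions)."""

    def __init__(self, d: int, hidden: int = 64):
        super().__init__()
        self.spatial = nn.GRU(d, hidden, bidirectional=True, batch_first=True)
        self.temporal = nn.GRUCell(2 * hidden, hidden)  # per-slot history (ST-LEMN)
        self.proj = nn.Linear(hidden, 1)                # scalar retention score g_i

    def forward(self, e: torch.Tensor, h: torch.Tensor):
        # e: (N, d) slot embeddings at timestep t; h: (N, hidden) temporal states
        ctx, _ = self.spatial(e.unsqueeze(0))           # spatial context across slots
        h_new = self.temporal(ctx.squeeze(0), h)        # accumulate per-slot history
        g = self.proj(h_new).squeeze(-1)                # (N,) retention scores
        return g, h_new

scorer = STLEMNScorer(d=128)
e, h = torch.randn(10, 128), torch.zeros(10, 64)
g, h = scorer(e, h)
pi = torch.softmax(g, dim=0)   # categorical eviction policy over the 10 slots
```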
3. Training Methodology
Memory retention is cast as a policy-gradient RL task:
- Action: At each timestep $t$, the action $a_t$ selects the memory slot to evict (or a no-op).
- State: The current memory $M^t$ augmented with the candidate embedding $c_t$.
- Reward: At a future step, the downstream task (e.g., question answering, navigation) provides a reward signal, which may be a delayed task reward or a dense per-step RL reward.
- Optimization: The system (base network plus retention agent) is trained end-to-end using Asynchronous Advantage Actor-Critic (A3C) with Generalized Advantage Estimation (GAE). The policy-gradient loss is
$$\mathcal{L}(\theta) = -\mathbb{E}_t\!\left[\log \pi_\theta(a_t \mid s_t)\,\hat{A}_t\right],$$
where $\hat{A}_t$ is the advantage estimate. Temporal hidden states $h_i^t$ in ST-LEMN encode historical slot usage, inherently capturing historical importance.
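A minimal sketch of this objective, assuming one finished episode of stored log-probabilities, value estimates, and rewards; the hyperparameters ($\gamma$, $\lambda$, value coefficient) are illustrative defaults, not values from the paper.

```python
import torch

def gae_advantages(rewards: torch.Tensor, values: torch.Tensor,
                   gamma: float = 0.99, lam: float = 0.95) -> torch.Tensor:
    """Generalized Advantage Estimation over one finished episode."""
    adv = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0  # bootstrap 0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv

def actor_critic_loss(log_probs, values, rewards, value_coef: float = 0.5):
    """A3C-style loss: -log pi(a_t|s_t) * A_hat_t plus a critic regression term."""
    adv = gae_advantages(rewards, values.detach())
    returns = adv + values.detach()             # value-function regression targets
    policy_loss = -(log_probs * adv).sum()
    value_loss = ((values - returns) ** 2).sum()
    return policy_loss + value_coef * value_loss
```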
4. Memory Update Procedure
Memory updating is based on the stochastic or deterministic selection of eviction candidates according to the learned policy. The operational pattern for memory management is as follows:
```
Initialize memory M ← empty list (max size N)
Optional: initialize per-slot hidden states {h_i} for ST-LEMN
for t in 1…T:
    c ← ψ(x_t)                            # encode new input
    if |M| < N:
        append c to M                     # fill until full
    else:
        e_i ← φ(M[i]) for i in 1…N        # per-slot embeddings
        g ← retention_scores(e_1…e_N, c)  # IM-LEMN, S-LEMN, or ST-LEMN
        π ← softmax(g_1…g_N)
        i* ← sample_or_argmax(π)
        if i* ≠ NOP_index:
            M[i*] ← c                     # evict slot i* and write new entry
            if ST-LEMN: h_{i*} ← 0        # reset temporal state for that slot
```
The agent selectively evicts less relevant entries, as learned through experience and reward.
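The `sample_or_argmax` step above is typically stochastic during training (so the policy gradient has exploration signal) and greedy at inference. A small sketch, assuming the no-op is modeled as an extra category appended to the slot scores:

```python
import torch
from torch.distributions import Categorical

def select_eviction(scores: torch.Tensor, training: bool):
    """Pick an eviction slot, or the no-op (assumed here to be the last index)."""
    dist = Categorical(logits=scores)                  # scores: (N + 1,) with no-op
    action = dist.sample() if training else scores.argmax()
    return action, dist.log_prob(action)               # log-prob feeds the A3C loss
```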
5. Empirical Evaluation
LEMN has demonstrated effectiveness in three task domains.
5.1 Maze Path-Finding (Memory Q-Networks)
- Base Agents: MQN (memory Q-network, no context RNN), FRMQN (with context RNN).
- Tasks:
- I-Maze: Long corridor where initial indicator color determines goal; corridor up to length 200.
- Random Maze Single-Goal: Varied maze topologies.
- Results: With fixed-capacity memory and ST-LEMN retention, MQN+ST-LEMN achieves ~100% success across all corridor lengths, outperforming MQN+FIFO (≈0% for lengths > 40). FRMQN+ST-LEMN likewise maintains ~100% while FRMQN+FIFO degrades with length. Visualization reveals that ST-LEMN learns to retain decision-relevant cues (the indicator color) and discard repetitive or irrelevant frames.
5.2 Synthetic QA (bAbI Two-Supporting-Facts Task)
- Base: MemN2N with 3 hops, position encoding.
- Datasets:
- Original: 45 facts + 5 questions, fixed order.
- Noisy: Inserted noise facts.
- Large: Episode length extended (20–80).
- Error Rates (memory size = 10 slots):
| Baseline/Agent | Original | Noisy | Large |
|---|---|---|---|
| FIFO | 16.5% | 44.1% | 32.4% |
| IM-LEMN | 16.1% | 18.9% | 9.0% |
| S-LEMN | 5.0% | 4.8% | 5.1% |
| ST-LEMN | 4.6% | 3.9% | 5.6% |
Shuffling spatial memory order causes performance drops, indicating that both absolute and relative ordering of memory contributes to effectiveness. Qualitative analysis confirms that ST-LEMN reliably preserves the minimal supporting facts needed for future queries, filtering out noise.
5.3 Real-World QA (TriviaQA)
- Base: BiDAF, modified to operate at sentence granularity.
- Dataset: TriviaQA, Wikipedia split (≈2,900 words per document, truncated to the first 800 words, i.e., roughly 40–50 sentences).
- Metrics: ExactMatch (EM) and F1, reported on the “Distant Supervision” set.
| Baseline/Agent | EM | F1 |
|---|---|---|
| FIFO | 18.5% | 20.3% |
| IM-LEMN | 34.9% | 38.7% |
| S-LEMN | 43.0% | 46.6% |
| ST-LEMN | 45.2% | 49.0% |
ST-LEMN achieves the highest performance, with qualitative findings indicating selective memory retention of question-relevant sentences.
6. Significance and Comparative Analysis
LEMN’s primary contribution is an adaptive, lightweight retention agent that outperforms both rule-based (FIFO, LRU) and simplistic learned scheduling baselines by combining:
- Short-term input-matching,
- Spatial context (relative importance among stored entries),
- Temporal aggregation (historical usage relevance).
A key insight is that memory retention policies leveraging both spatial and temporal contextualization provide substantial advantages in generalization (navigation, long contexts) and in robustness to noisy or distractor-heavy environments. Framing memory eviction as a sequential decision problem, optimized with A3C+GAE, yields end-to-end trainable, task-sensitive memory schedulers without manual heuristics (Jung et al., 2018).