
Long-Term Episodic Memory Networks

Updated 11 December 2025
  • Long-Term Episodic Memory Networks (LEMN) are neural architectures that integrate a learnable RNN-based retention agent to manage sparse, unbounded data streams.
  • They leverage spatial and temporal contextualizations through variants like IM-LEMN, S-LEMN, and ST-LEMN to dynamically evaluate and retain critical memory entries.
  • Empirical evaluations in maze navigation, synthetic QA (bAbI), and TriviaQA show consistent gains over rule-based FIFO and LRU retention baselines.

Long-term Episodic Memory Networks (LEMN) are memory-augmented neural architectures designed to address the scalability limitations of contemporary external-memory–based neural networks, particularly for lifelong learning with unbounded data streams in which informative content is sparse relative to memory capacity. LEMN introduces a learnable, RNN-based memory retention agent that dynamically identifies and retains memory entries of task-generic importance by leveraging both relative and historical information, supporting robust performance in navigation and question-answering domains (Jung et al., 2018).

1. Architectural Overview

LEMN is structured as an augmentation to any external-memory–based network (e.g., MemN2N, BiDAF, Memory Q-Networks) by integrating a memory retention (eviction) agent.

  • External Memory: The memory comprises $N$ fixed-capacity slots $M_t = [m_{t,1}, \dots, m_{t,N}]$ at timestep $t$, with each slot $m_{t,i} \in \mathbb{R}^d$.
  • Input Encoding: Incoming datum $x_t$ (which may be a sentence, image, or other modality) is encoded via a learnable function $\psi$ into $c_t = \psi(x_t) \in \mathbb{R}^d$, forming the candidate for possible memory inclusion.
  • Memory Retention Agent: Upon memory saturation, a policy $\pi$ selects a slot for eviction (or enacts a "no-op" to skip writing). LEMN provides three categories of retention agents, leveraging varying depths of spatial and temporal context.
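
As a concrete illustration, here is a minimal PyTorch-style sketch of the input encoder $\psi$ and the slot memory it feeds; the GRU-based encoder, class name, and dimension choices are illustrative assumptions rather than the paper's exact design.

import torch
import torch.nn as nn

class InputEncoder(nn.Module):
    """Illustrative encoder psi: maps a token-id sequence x_t to a candidate c_t in R^d."""
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embed(token_ids.unsqueeze(0))   # (1, L, d)
        _, h = self.gru(emb)                       # final hidden state, shape (1, 1, d)
        return h[-1, 0]                            # c_t in R^d

# The external memory itself can be kept as a list of at most N slot vectors:
encoder = InputEncoder(vocab_size=10_000, dim=128)
memory: list[torch.Tensor] = []                    # M_t, capacity N enforced by the update loop
c_t = encoder(torch.randint(0, 10_000, (12,)))     # encode an incoming 12-token sentence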

2. Mathematical Formulation of Retention

The LEMN policy is formulated as a categorical distribution over memory slots, parameterized via task-specific retention scores.

Memory Retention Pipeline

  • Slot Embeddings: For each slot, a per-slot embedding $e_{t,i} = \phi(m_{t,i})$ is computed.
  • Retention Scores: Scores $g_{t,i}$ are computed per slot using one of the following agent types:
    • Input-Matching LEMN (IM-LEMN): Computes the dot-product similarity $z_{t,i} = e_{t,i}^\top c_t$, blended with an exponential moving average $v_{t,i}$ for recency (LRU tracking). A learned forgetting coefficient $\gamma_t$ modulates sensitivity.
    • Spatial LEMN (S-LEMN): Applies a bidirectional GRU over the spatial index $i$, generating context-aware features $f_{t,i}$ that are projected to a scalar score.
    • Spatio-Temporal LEMN (ST-LEMN): Extends S-LEMN by incorporating a temporal GRU for each slot, enabling accumulation of historical slot importance.
  • Retention Policy: The categorical eviction policy is $\pi(m_i \mid M_t, c_t) = \mathrm{softmax}_i(\{g_{t,i}\})$; the corresponding retention probability of slot $i$ is $p_{\mathrm{retain}}(m_i) = 1 - \pi(m_i \mid M_t, c_t)$. A scoring sketch follows the summary table below.
Agent Variant | Contextual Features Used | Mechanism
IM-LEMN | current input, slot similarity | Dot product + LRU-based EMA
S-LEMN | slot-to-slot (spatial) | Bi-GRU over memory at timestep $t$
ST-LEMN | spatial + history (temporal) | Bi-GRU (spatial) + GRU (temporal, per slot)
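
Below is a minimal, hypothetical sketch of an S-LEMN-style scorer in PyTorch. Treating the candidate $c_t$ as an extra slot whose selection implements the no-op, as well as all layer sizes, are assumptions for illustration rather than the paper's exact parameterization.

import torch
import torch.nn as nn

class SpatialRetentionAgent(nn.Module):
    """Illustrative S-LEMN-style scorer: a bidirectional GRU over the slot index i
    produces context-aware features f_{t,i}, projected to scalar scores g_{t,i}."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.phi = nn.Linear(dim, hidden)                 # per-slot embedding phi
        self.bigru = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hidden, 1)             # f_{t,i} -> g_{t,i}

    def forward(self, memory: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # memory: (N, d) stored slots; candidate: (d,) encoded new input c_t.
        # Appending c_t as an (N+1)-th option lets "evicting the candidate" act as the no-op.
        slots = torch.cat([memory, candidate.unsqueeze(0)], dim=0)
        e = torch.tanh(self.phi(slots)).unsqueeze(0)      # (1, N+1, hidden)
        f, _ = self.bigru(e)                              # (1, N+1, 2*hidden) spatial context
        g = self.score(f).squeeze(-1).squeeze(0)          # (N+1,) retention scores
        return torch.softmax(g, dim=0)                    # eviction distribution pi

ST-LEMN would additionally carry a per-slot GRU hidden state across timesteps so that scores also reflect each slot's history, while IM-LEMN would replace the bi-GRU with a dot-product match against $c_t$ blended with an LRU-style moving average.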

3. Training Methodology

Memory retention is cast as a policy-gradient RL task:

  • Action: At each timestep $t$, action $a_t = i$ corresponds to the memory slot chosen for eviction.
  • State: The current memory augmented with the candidate embedding, $s_t = [M_t; c_t]$.
  • Reward: At a future step $t_f$, a downstream task (e.g., question answering, navigation) provides a reward signal $R_\mathcal{T} \in \{+1, -1\}$ or a dense RL reward.
  • Optimization: The system (base network plus retention agent) is trained end-to-end using Asynchronous Advantage Actor-Critic (A3C) with Generalized Advantage Estimation (GAE). The loss is:

$$L = -\mathbb{E}\left[A_t \log \pi(a_t \mid s_t)\right] + \text{value-function and entropy regularization terms}$$

where $A_t$ is the advantage estimate. In ST-LEMN, the per-slot temporal hidden states $h_{t,i}$ accumulate each slot's usage history, capturing its long-term importance.
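
As a sketch of the objective, the following computes a policy-gradient loss with GAE over one episode of eviction decisions. The discount, lambda, and loss coefficients are illustrative assumptions, and the asynchronous-worker machinery of A3C is omitted.

import torch

def actor_critic_loss(log_probs, values, entropies, rewards,
                      gamma=0.99, lam=0.95, value_coef=0.5, entropy_coef=0.01):
    """Illustrative A3C-style loss with Generalized Advantage Estimation.
    All arguments are 1-D tensors over the T eviction decisions of one episode."""
    values_d = values.detach()
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae, next_value = 0.0, 0.0                                   # terminal value assumed 0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * next_value - values_d[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values_d[t]
    returns = advantages + values_d
    policy_loss = -(advantages * log_probs).mean()               # -E[A_t log pi(a_t|s_t)]
    value_loss = (returns - values).pow(2).mean()                # critic regression term
    entropy_bonus = entropies.mean()                             # encourages exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus

In the full system this loss would be backpropagated through both the base network and the retention agent, matching the end-to-end training described above.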

4. Memory Update Procedure

Memory updating is based on the stochastic or deterministic selection of eviction candidates according to the learned policy. The operational pattern for memory management is as follows:

Initialize memory M ← empty list (max size N)
Optional: initialize hidden states {h_i} for ST-LEMN
for t in 1…T:
    c ← ψ(x_t)                                 # encode new input
    if |M| < N:
        append c to M                          # fill until full
    else:
        for i in 1…N:
            e_i ← φ(M[i])                      # per-slot embeddings
        g_i ← retention_scores(e_1…e_N, c)     # IM-LEMN, S-LEMN, or ST-LEMN
        π   ← softmax(g_1…g_N)
        i*  ← sample_or_argmax(π)
        if i* ≠ NOP_index:
            M[i*] ← c                          # evict slot i* and replace with c
            if ST-LEMN: h_{i*} ← 0             # reset temporal state

The agent selectively evicts less relevant entries, as learned through experience and reward.
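
A runnable rendering of the loop above, assuming the illustrative encoder and retention agent sketched earlier (with the last index of the eviction distribution playing the role of the no-op), might look like:

import torch

def run_stream(inputs, encoder, agent, num_slots, greedy=False):
    """Drive the streaming memory update: encode each input, fill memory until full,
    then query the retention agent for an eviction decision."""
    memory = []                                          # list of d-dimensional slot tensors
    for x_t in inputs:
        c = encoder(x_t)                                 # c_t = psi(x_t)
        if len(memory) < num_slots:
            memory.append(c)                             # fill until full
            continue
        pi = agent(torch.stack(memory), c)               # eviction distribution over N slots + no-op
        i_star = int(pi.argmax()) if greedy else int(torch.multinomial(pi, 1))
        if i_star < num_slots:                           # the final index is the no-op
            memory[i_star] = c                           # evict slot i* and write the new entry
    return memory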

5. Empirical Evaluation

LEMN has demonstrated effectiveness in three task domains.

5.1 Maze Path-Finding (Memory Q-Networks)

  • Base Agents: MQN (memory Q-network, no context RNN), FRMQN (with context RNN).
  • Tasks:
    • I-Maze: Long corridor where initial indicator color determines goal; corridor up to length 200.
    • Random Maze Single-Goal: Varied maze topologies.
  • Results: With $N=5$ slots and ST-LEMN retention, MQN+ST-LEMN achieves ~100% success rate across all lengths, outperforming MQN+FIFO (≈0% for length > 40). FRMQN+ST-LEMN maintains ~100% while FRMQN+FIFO degrades with length. Visualization reveals that ST-LEMN learns to retain decision-relevant cues (indicator color) and discard repetitive or irrelevant frames.

5.2 Synthetic QA (bAbI Two-Supporting-Facts Task)

  • Base: MemN2N with 3 hops, position encoding.
  • Datasets:
    • Original: 45 facts + 5 questions, fixed order.
    • Noisy: Inserted noise facts.
    • Large: Episode length extended (20–80).
  • Error Rates (memory size = 10 slots):
Baseline/Agent | Original | Noisy | Large
FIFO | 16.5% | 44.1% | 32.4%
IM-LEMN | 16.1% | 18.9% | 9.0%
S-LEMN | 5.0% | 4.8% | 5.1%
ST-LEMN | 4.6% | 3.9% | 5.6%

Shuffling the spatial order of memory entries causes performance drops, indicating that both absolute and relative ordering of memory contribute to effectiveness. Qualitative analysis confirms that ST-LEMN reliably preserves the minimal supporting facts needed for future queries, filtering out noise.

5.3 Real-World QA (TriviaQA)

  • Base: BiDAF, modified to operate at sentence granularity.
  • Dataset: TriviaQA, Wikipedia split (~2,900 words per document, truncated to the first 800 words, roughly 40–50 sentences).
  • Metrics: Exact Match (EM) and F1 on the "Distant Supervision" set.
Baseline/Agent | EM | F1
FIFO | 18.5% | 20.3%
IM-LEMN | 34.9% | 38.7%
S-LEMN | 43.0% | 46.6%
ST-LEMN | 45.2% | 49.0%

ST-LEMN achieves the highest performance, with qualitative findings indicating selective memory retention of question-relevant sentences.

6. Significance and Comparative Analysis

LEMN’s primary contribution is an adaptive, lightweight retention agent that outperforms both rule-based (FIFO, LRU) and simplistic learned scheduling baselines by combining:

  • Short-term input-matching,
  • Spatial context (relative importance among stored entries),
  • Temporal aggregation (historical usage relevance).

A key insight is that memory retention policies benefiting from both spatial and temporal contextualization provide substantial advantages in generalization (navigation, long contexts) and in robustness to noisy or distractor-heavy environments. Framing memory eviction as a sequential decision problem, optimized end-to-end with A3C and GAE, yields task-sensitive memory schedulers without manual heuristics (Jung et al., 2018).
