Neural Experience Replay Samplers
- Neural Experience Replay Samplers (NERS) are neural network-based methods that dynamically weight past experiences to enhance sample efficiency and learning stability.
- NERS use permutation-equivariant architectures and attention mechanisms to assign adaptive priorities, resulting in faster convergence and improved value estimation.
- NERS integrate bias correction and importance sampling techniques to mitigate non-uniform sampling, achieving robust performance across RL, continual learning, and graph-based tasks.
Neural Experience Replay Samplers (NERS) refer to a class of data-driven, neural network–based sampling mechanisms that select and weight past experience samples in experience replay buffers for reinforcement and continual learning. Distinct from heuristic rules or static priorities, these samplers utilize trainable neural architectures—often leveraging attention or permutation-equivariant mechanisms—to compute sampling probabilities or buffer updates by integrating local sample statistics with global batch context. NERS frameworks have demonstrated empirical gains in sample efficiency, stability, robustness to noise, and continual learning performance, with implementations validated in deep Q-learning, actor-critic RL, multi-agent settings, and graph neural network continual learning (Chen et al., 2023, Sarfraz et al., 2023, Zhou et al., 2020, Oh et al., 2020).
1. Neural Architectures and Permutation-Equivariant Scoring
NERS typically employ neural network modules as samplers to compute sampling probabilities or buffer admission weights dynamically.
- In off-policy RL, NERS uses permutation-equivariant architectures to process batches of transitions, ensuring that cross-sample relationships inform scoring (Oh et al., 2020). For each batch index $i$, local features $x_i$ (e.g., TD error, Q-value, reward, timestep) are embedded by a shared MLP $f_{\text{local}}$. Batch-level "global" features are aggregated as $g = \frac{1}{|B|}\sum_{j \in B} f_{\text{global}}(x_j)$. The final priority score for transition $i$ is computed as $\sigma_i = f_{\text{score}}([f_{\text{local}}(x_i); g])$, where $[\cdot\,;\cdot]$ denotes concatenation.
- The Attention Loss Adjusted Prioritized (ALAP) framework augments standard DQN/DDPG-like architectures with a "neural sampler" side branch. It processes mini-batches by self-attention: projecting transition features to queries $Q$, permuting them to form keys $K$, and computing a normalized "sum-of-projections" similarity, which an FC head maps to an adaptive importance-sampling exponent $\beta$ (Chen et al., 2023).
This architectural principle enables the sampler to respond adaptively to buffer diversity, learning phase, and sample redundancy.
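The local/global scoring scheme above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature dimensions, network sizes, and initialization are assumptions, and the tiny `mlp` helper stands in for the shared trainable networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """Tiny two-layer MLP applied row-wise (ReLU hidden layer)."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

# Illustrative dimensions: 4 local features per transition
# (e.g. TD error, Q-value, reward, timestep), embedded to 8 dims.
d_in, d_hid, d_emb = 4, 16, 8

def init(d_first, d_out):
    return (rng.normal(scale=0.1, size=(d_first, d_hid)), np.zeros(d_hid),
            rng.normal(scale=0.1, size=(d_hid, d_out)), np.zeros(d_out))

local_net, global_net = init(d_in, d_emb), init(d_in, d_emb)
score_net = init(2 * d_emb, 1)

def priority_scores(batch):
    """Permutation-equivariant scoring: per-sample local embedding,
    mean-pooled global context, concatenation, shared score head."""
    local = mlp(batch, *local_net)                        # (B, d_emb)
    glob = mlp(batch, *global_net).mean(axis=0)           # (d_emb,), order-invariant
    joint = np.concatenate(
        [local, np.tile(glob, (len(batch), 1))], axis=1)  # (B, 2*d_emb)
    return mlp(joint, *score_net).squeeze(-1)             # one score per transition

batch = rng.normal(size=(32, d_in))
scores = priority_scores(batch)

# Equivariance check: permuting the batch permutes the scores identically.
perm = rng.permutation(32)
assert np.allclose(priority_scores(batch[perm]), scores[perm])
```

Because the global context is a mean over the batch, reordering the transitions leaves it unchanged, so scores follow the permutation of their inputs, the defining property of a permutation-equivariant sampler.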
2. Mechanisms for Sample Selection and Buffer Update
NERS mechanisms impact both the probability with which existing buffer entries are sampled for replay and the rules by which new samples are added or prioritized in the buffer.
- In off-policy RL, the sampling distribution over buffer indices is non-uniform, determined by neural priorities $\sigma_i$ as $P(i) = \sigma_i^{\alpha} / \sum_k \sigma_k^{\alpha}$, with $\alpha$ controlling sharpness. The corresponding importance-sampling weights $w_i = (N \cdot P(i))^{-\beta}$ (where $\beta$ may also be learned) are applied to each sample (Oh et al., 2020, Chen et al., 2023).
- In continual learning, Error-Sensitive Reservoir Sampling (ESRS) integrates model-based loss statistics for candidate filtering: a sample $x$ is eligible for buffer insertion iff its stable-model loss satisfies $\ell_s(x) \le \mu_\ell$, where $\mu_\ell$ is the running mean loss under the slow (semantic) copy of the model. Standard reservoir sampling is then applied to the filtered candidate stream, maintaining uniformity post-filtering (Sarfraz et al., 2023).
- For graph continual learning, candidate selection for buffer updates leverages statistics such as proximity to class means (Mean-of-Feature), inter-class neighborhood sparseness (Coverage-Maximization), or influence on model loss as estimated by Hessian-vector products (Influence-Maximization) (Zhou et al., 2020).
A unified property across these methods is that the sample/batch relationships and model state inform either the selection probability or the admissibility of a sample to the buffer.
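The loss-filtered reservoir update can be sketched as follows. This is an illustrative sketch, not the published ESRS code: the exponential-moving-average threshold, its decay rate, and the class interface are assumptions chosen to show the admit-then-reservoir pattern.

```python
import random

class ErrorSensitiveReservoir:
    """Reservoir buffer that only admits candidates whose stable-model loss
    is at most a running mean loss (sketch; decay rate is an assumption)."""

    def __init__(self, capacity, decay=0.99):
        self.capacity = capacity
        self.buffer = []
        self.admitted = 0      # count of admitted candidates: keeps reservoir
        self.mean_loss = None  # sampling uniform over the *filtered* stream
        self.decay = decay

    def observe(self, sample, stable_loss):
        # Track an exponential running mean of the stable model's loss.
        if self.mean_loss is None:
            self.mean_loss = stable_loss
        else:
            self.mean_loss = self.decay * self.mean_loss \
                             + (1 - self.decay) * stable_loss
        # Filter: high-loss (potentially noisy/outlier) samples are ineligible.
        if stable_loss > self.mean_loss:
            return
        # Standard reservoir sampling over the filtered candidate stream.
        self.admitted += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        elif random.randrange(self.admitted) < self.capacity:
            self.buffer[random.randrange(self.capacity)] = sample
```

The key design point is that the reservoir counter advances only for admitted candidates, so the buffer remains a uniform sample of the post-filter stream.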
3. Bias Correction, Importance Sampling, and Theoretical Properties
Non-uniform sampling introduces bias in the estimation of value gradients or loss surfaces. NERS frameworks integrate explicit debiasing mechanisms:
- The ALAP method adjusts the importance-sampling exponent $\beta$ via the neural sampler, ensuring that $\beta \to 1$ as training converges. At $\beta = 1$, the weights $w_i = (N \cdot P(i))^{-1}$ fully compensate the non-uniform sampling distribution, eliminating sampling-induced bias: the weighted gradient estimate matches its expectation under uniform sampling. The neural sampler adaptively increases $\beta$ as the Q-networks converge, dynamically correcting bias throughout training (Chen et al., 2023).
- In neural samplers for RL, per-step importance weights are computed and normalized before being applied to the prioritized samples' loss terms, explicitly handling the distribution shift induced by the sampling policy (Oh et al., 2020).
- ESRS, while primarily a buffer-update mechanism, filters out high-loss (potentially noisy or outlier) transitions, implicitly protecting against catastrophic forgetting without direct bias correction but with measurable improvements in empirical distributional quality (Sarfraz et al., 2023).
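The debiasing role of the exponent can be checked numerically: drawing under priorities $P(i)$ and weighting by $w_i = (N \cdot P(i))^{-\beta}$ with $\beta = 1$ recovers the uniform-sampling mean, while $\beta = 0$ leaves the estimate skewed toward high-priority samples. A minimal sketch with synthetic values and priorities (all quantities illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
values = rng.normal(size=N)        # stand-in per-sample loss values
priorities = np.exp(values)        # priorities deliberately correlated with values

alpha = 0.6
P = priorities**alpha / np.sum(priorities**alpha)  # non-uniform sampling dist.

def weighted_mean(beta, n_draws=200_000):
    idx = rng.choice(N, size=n_draws, p=P)
    w = (N * P[idx]) ** (-beta)    # importance-sampling weights
    return float(np.mean(w * values[idx]))

uniform_mean = float(values.mean())
biased = weighted_mean(beta=0.0)     # no correction: skewed toward high priorities
corrected = weighted_mean(beta=1.0)  # full correction: matches uniform mean
```

Here `corrected` lands within Monte Carlo noise of `uniform_mean`, while `biased` is pulled upward by the priority/value correlation, which is exactly the bias the annealed $\beta$ removes.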
4. Empirical Impact: Sample Efficiency, Stability, and Robustness
NERS implementations demonstrate improvements across key metrics:
| Metric | ALAP (NERS) (Chen et al., 2023) | Perm-Equiv NERS (Oh et al., 2020) | ESRS (Sarfraz et al., 2023) | ER-GNN Samplers (Zhou et al., 2020) |
|---|---|---|---|---|
| Convergence speed | 2× faster (DQN/CartPole); >30% speedup | 10–50% faster (TD3, SAC, Rainbow) | +5–7pp in continual Class-IL tasks | IM sampler reduces Catastrophic Forgetting (FM ↓) |
| Final return/accuracy | 10–20% higher average return | Higher asymptotic return all tasks | Doubled accuracy under label noise | PM up to 95.66% (Cora) |
| Stability/variance | 50–80% reduction in variance | Higher diversity/Std in sampled batches | Lower buffer corruption, less drift | Consistency across GNNs |
| Noise/label robustness | — | — | >2× accuracy under 50% label noise | — |
| Generality | Same code for DQN, DDPG, MADDPG | Continuous/discrete/both | Consistent under task/buffer sizes | Specializes to GNN continual learning |
Significant findings include sample selection that maintains higher diversity (standard deviations of TD errors and Q-values within NERS batches exceed those of random or greedy selection, supporting the mechanism's efficacy), improved resistance to label noise (ESRS), and reduced catastrophic forgetting in continual learning.
5. Algorithmic and Training Considerations
NERS modules are typically trained on-line, as meta-learners or via supervised/policy-gradient objectives.
- The permutation-equivariant NERS (Oh et al., 2020) employs REINFORCE-based updates to maximize a replay-improvement reward: the change in cumulative evaluation return after the agent is updated on batches drawn under the current sampler parameters. The sampler policy is thus trained directly to maximize agent performance.
- ALAP maintains a mirror buffer with i.i.d. sampling (Double Sampling) to de-correlate the self-attention–driven adjustment from the prioritized batch used for network updates. Updates to the “neural sampler” branch use alternate uniform samples rather than replay-biased ones, preventing positive feedback loops (Chen et al., 2023).
- ESRS requires an additional forward pass through the stable model for each incoming sample to assess candidate admissibility; the added update cost is thus constant per sample. Reservoir sampling remains uniform within the pre-filtered stream (Sarfraz et al., 2023).
NERS introduces moderate computational overhead (additional forward/backward passes for the sampler networks), but empirical studies indicate this is amortized by faster convergence and improved final performance.
6. Extensions, Limitations, and Future Directions
NERS design is generalizable across RL and continual learning domains, with instantiations for off-policy actor-critic, multi-agent, DQN/DDPG, and GNN-based learning. Reported limitations include:
- Increased computational cost for sampler forward/backward passes, particularly in influence-based samplers for GNNs, where Hessian-vector solves are required (Zhou et al., 2020).
- Replay-reward estimation in RL NERS requires multiple full evaluation rollouts, which can be expensive or impractical in real-world settings (Oh et al., 2020).
- For ALAP, the adaptive mechanism relies on measuring “batch concentration” via simple self-attention; it may underperform if the buffer’s state is not well reflected in such a statistic (Chen et al., 2023).
Future avenues proposed include incorporating uncertainty/model-based signals into neural sampler features, multi-agent or hierarchical replay, and meta-learned annealing of sample-selection exponents (Oh et al., 2020). A plausible implication is that NERS will become standard in regimes where data efficiency and robustness to nonstationarity are central, especially as online RL and continual learning advance.