Multiscale Experience Replay (MER)

Updated 11 January 2026
  • The paper introduces a multi-scale replay schedule that achieves O(1/T) convergence without requiring knowledge of the Markov chain's mixing time.
  • MER is a structured algorithm that employs epoch-based, coarse-to-fine buffer sampling to emulate nearly independent, i.i.d. data performance under Markovian noise.
  • MER outperforms serial stochastic approximation and skip-sampling methods by automatically adapting to the chain’s correlation structure for robust, efficient convergence.

Multiscale Experience Replay (MER) is a provably correct algorithmic framework for solving stochastic variational inequalities (VIs) when sample observations are generated from a Markov chain and stored in a finite replay buffer. MER circumvents the bias and slow convergence rates inherent in standard serial stochastic approximation (SA) under Markovian noise by deploying a multi-scale sampling schedule over the buffer, emulating nearly independent sampling and achieving iteration complexity rates characteristic of i.i.d. scenarios—without requiring any knowledge of the Markov chain’s mixing time (Nakul et al., 4 Jan 2026).

1. Problem Setting: Stochastic VIs with Markovian Data and Buffer Bias

MER addresses the problem of finding $x^* \in X \subset \mathbb{R}^n$ such that the monotone variational inequality

$$\langle F(x^*),\, x - x^* \rangle \ge 0, \quad \forall x \in X$$

is satisfied, where $F$ is assumed $L$-Lipschitz and $\mu$-strongly monotone:

$$\langle F(x) - F(y),\, x - y \rangle \ge \mu \|x - y\|^2, \qquad \forall x, y \in X.$$

Instead of direct access to $F(x)$, only stochastic oracle evaluations $\widetilde{F}(x,\xi)$ with mean $F(x)$ are available, and the samples $\xi_t$ are correlated through a Markov chain with mixing time $t_{\rm mix}$. Naive serial SA iterations,

$$x_{t+1} = \Pi_X\big[x_t - \eta\,\widetilde{F}(x_t, \xi_t)\big],$$

accumulate bias proportional to $\bar{\tau}/t$ (with $\bar{\tau} \approx t_{\rm mix}$), resulting in suboptimal $O(\bar{\tau}/T)$ convergence. Classical skip-sampling (e.g., CTD) can restore the $O(1/T)$ rate given prior knowledge of the mixing time, but is brittle to poor tuning.
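
To make the baseline concrete, here is a minimal sketch of this serial SA iteration in Python; the oracle `F_tilde`, the Markov-sample iterator `chain`, and the ball constraint behind `proj_ball` are illustrative assumptions, not artifacts of the paper:

```python
import numpy as np

def proj_ball(x, radius=1.0):
    """Euclidean projection onto X = {x : ||x|| <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def serial_sa(F_tilde, chain, x0, eta, T):
    """Naive serial SA: consume Markov samples xi_t in order.
    Correlation between consecutive samples induces the bias behind
    the O(tau_bar / T) rate discussed above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        xi = next(chain)                 # next (correlated) Markov sample
        x = proj_ball(x - eta * F_tilde(x, xi))
    return x
```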

MER relies solely on standard buffer access (arbitrary selection from a buffer of size $B$ containing recent samples) and deploys a principled multi-epoch, multi-scale usage pattern that automatically adapts to the chain’s correlation structure.

2. MER Algorithm: Multi-Scale Epoch-Based Replay

MER operates in $K \approx \log_2 B$ epochs, indexed by $k$, where epoch $k$ uses a buffer sampling gap $\tau_k = B/2^k$ for exactly $T_k = 2^k$ updates. This geometric progression traverses coarse-to-fine time scales, with early epochs exploiting widely separated samples and later epochs focusing on finer spacings.
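
The schedule is easy to tabulate; a quick sketch, assuming $B$ is a power of two so every gap $\tau_k$ is an integer:

```python
B = 1024                        # buffer size (power of two for clean gaps)
K = B.bit_length() - 1          # K = log2(B) epochs
schedule = [(B // 2**k, 2**k) for k in range(1, K + 1)]
# (tau_k, T_k) pairs: (512, 2), (256, 4), ..., (2, 512), (1, 1024);
# total updates sum to 2^(K+1) - 2 = 2B - 2, i.e. about 2B.
```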

Algorithmic steps per epoch kk:

  • Initialize $x^{(k)}_1 \in X$.
  • For $t = 1, \ldots, T_k$:

    1. Select buffer index $i = t\,\tau_k$ and extract $\xi_i$.
    2. Update:

       $$x_{t+1}^{(k)} = \arg\min_{x \in X} \left\{ \eta_k \langle \widetilde{F}(x_t^{(k)}, \xi_i), x \rangle + \frac{1}{2} \|x_t^{(k)} - x\|^2 \right\}.$$

    3. (Online setting) Replace $\xi_i$ with the next incoming chain sample.
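
Since the objective in step 2 is a linear term plus a quadratic proximal term, the update is exactly a projected gradient step, matching the serial SA iteration in Section 1:

$$x_{t+1}^{(k)} = \Pi_X\!\left[x_t^{(k)} - \eta_k\, \widetilde{F}(x_t^{(k)}, \xi_i)\right].$$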

Pseudo-code:

```
Algorithm MER
Input: buffer {ξ₁, …, ξ_B}, epochs K, step sizes {η_k}
for k = 1 to K:
    set τ_k = B/2^k, T_k = 2^k; re-initialize x₁^(k)
    for t = 1 to T_k:
        pick ξ ← ξ_{t·τ_k}
        x_{t+1}^(k) = argmin_{x∈X} { η_k ⟨F̃(x_t^(k), ξ), x⟩ + ½‖x_t^(k) − x‖² }
        delete ξ_{t·τ_k}, append new incoming sample
    end
end
Output: x_{T_K+1}^(K) or its average
```

If the mixing time were known, a constant gap $\tau_k = t_{\rm mix}$ would suffice (skip-sampling/CTD); MER’s geometric schedule obviates this parameter tuning.
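
A compact runnable rendering of the pseudo-code above, as a sketch under stated assumptions: a Euclidean ball constraint, per-epoch step sizes supplied by the caller, and in-place replacement of used samples standing in for delete-and-append. None of these choices come from the paper.

```python
import numpy as np

def proj_ball(x, radius=1.0):                  # as in the serial SA sketch
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def mer(F_tilde, chain, x_init, B, etas, radius=1.0):
    """Multiscale Experience Replay, following the pseudo-code above.

    F_tilde(x, xi) -- stochastic oracle with mean F(x)
    chain          -- iterator yielding fresh Markov samples
    x_init         -- starting point in X
    B              -- buffer size (a power of two)
    etas           -- mapping k -> step size eta_k, for k = 1..K
    """
    buffer = [next(chain) for _ in range(B)]   # fill buffer with B samples
    K = B.bit_length() - 1                     # K = log2(B) epochs
    for k in range(1, K + 1):
        tau_k, T_k = B // 2**k, 2**k           # coarse-to-fine gap / length
        x = np.asarray(x_init, dtype=float)    # re-initialize each epoch
        for t in range(1, T_k + 1):
            xi = buffer[t * tau_k - 1]         # 0-based slot of xi_{t·tau_k}
            # prox step == projected gradient step (see identity above)
            x = proj_ball(x - etas[k] * F_tilde(x, xi), radius)
            # online variant: refresh the used slot with an incoming sample
            buffer[t * tau_k - 1] = next(chain)
    return x                                   # x_{T_K+1}^{(K)}
```

For the offline variant one would simply skip the buffer refresh; returning a running average of the final epoch's iterates, as the pseudo-code's "or average" suggests, is an equally valid output choice.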

3. Convergence Guarantees and Complexity Bounds

MER’s central theoretical results quantify error and iteration complexity as follows. Let $\tau_M = \frac{\ln(18 C/\mu)}{\ln(1/\rho)}$ (with $\rho < 1$ the chain’s geometric mixing rate), $\alpha_k = \tau_M/\tau_k$, and $\bar L = L + \widetilde L_1$.

Theorem 1 (General Convergence):

If $B = \Omega(\tau_M \log \tau_M)$ and

$$\eta_k \asymp \min \left\{ \frac{\mu}{\bar{L}^2(\alpha_k + 1)},\ \frac{\log T_k}{\mu T_k} \right\},$$

then

$$\mathbb{E}\left[\|x_{T_k + 1} - x^*\|^2\right] \le M \left( 1 + \frac{3\mu^2}{8 (\alpha_k + 1) (\zeta^2 + 16\bar L^2)} \right)^{-T_k}(D^2 + 1) + \frac{20 C_M \rho^{\tau_M+\tau_k-1}}{\mu} + O\left( \frac{\alpha_k+1}{\mu^2 T_k} \right),$$

where the $O((\alpha_k+1)/T_k)$ term matches i.i.d. SA up to logarithmic factors when $\alpha_k = O(1)$.
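
In the notation of the `mer` sketch above, Theorem 1's step-size rule could populate the `etas` mapping as follows; the constants $\mu$, $\bar L$, and $\tau_M$ are assumed known here purely for illustration, and absolute constants are dropped:

```python
import math

def theorem1_step(mu, L_bar, tau_M, B, k):
    """Illustrative eta_k from Theorem 1 (up to absolute constants)."""
    tau_k, T_k = B // 2**k, 2**k          # epoch-k gap and length
    alpha_k = tau_M / tau_k               # mixing-to-gap ratio
    return min(mu / (L_bar**2 * (alpha_k + 1)),
               math.log(T_k) / (mu * T_k))
```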

Theorem 2 (i.i.d. Emulation):

For $\tau_k = \beta \tau_M$ (with $\beta > 1$),

$$\left|\, \|\Delta_{T+1}\| - \|\widetilde{\Delta}_{T+1}\| \,\right| \le 3c_0\, \frac{\widetilde{L}_2}{\widetilde{L}_1}\, \sqrt{T}\, e^{-\beta},$$

so with $\beta = (3/2) \ln T$ we get $\sqrt{T}\, e^{-\beta} = T^{1/2}\, T^{-3/2} = 1/T$, and the i.i.d. path is tracked up to $O(1/T)$ accuracy.

These statements show that MER recovers $O(1/T)$ stochastic error rates in epochs whose replay separation exceeds the mixing time, and that it automatically transitions across scales, matching i.i.d. performance without explicit knowledge of $t_{\rm mix}$.

4. Robustness and Comparative Analysis

MER’s robustness is manifest relative to alternatives:

  • Serial SA (no replay): yields the $O(\bar{\tau}/T)$ rate, with bias induced by the Markovian data.
  • Uniform buffer replay: reduces bias relative to serial SA, but interleaves scales arbitrarily and lacks structured epoch-wise guarantees.
  • Skip-sampling (CTD): requires an accurate skip parameter; under- or overshooting $t_{\rm mix}$ either retains bias or wastes samples.
  • MER: covers all time scales geometrically, harvesting coarse-scale acceleration early ($\tau_k \gg t_{\rm mix}$) and then gracefully exhausting buffer resolution at fine scales.

Empirical comparisons (see Figures 1–4 in (Nakul et al., 4 Jan 2026)) report that MER nearly matches i.i.d. SA in early epochs, surpasses skip-sampling unless the latter's skip parameter is perfectly tuned, and asymptotically outperforms naive serial SA.

| Approach | Parameter dependence | Rate |
|---|---|---|
| Serial SA | Markov chain ($t_{\rm mix}$) | $O(\bar{\tau}/T)$ |
| Skip-sampling (CTD) | requires $t_{\rm mix}$; fragile | $O(1/T)$ if optimally tuned |
| Uniform replay | buffer size; no epoch structure | varies, lacks guarantee |
| MER | buffer size $B$ only; no mixing time | $O(1/T)$ whenever possible |

5. Applications: RL Policy Evaluation and Generalized Linear Models

MER applies to core estimation problems affected by temporal dependence:

(a) Policy Evaluation (TD(0) with MER):

For a Markov reward process $(\mathcal{S}, P, R, \gamma)$, value-function approximation via the projected Bellman VI reduces to finding $\theta$ such that

$$F(\theta) = 0, \quad \text{with} \quad \widetilde F(\theta, (s,s',R)) = \big(\langle \psi(s), \theta \rangle - R - \gamma \langle \psi(s'), \theta \rangle\big)\, \psi(s).$$

Under bounded features and rewards, $F$ is Lipschitz and strongly monotone. Corollary 6.1 (Nakul et al., 4 Jan 2026) gives MER’s iteration complexity as

$$O\left( \max \left\{ \frac{\alpha_k + 1}{(1 - \gamma)^2} \ln \frac{1}{\epsilon},\ \frac{\alpha_k + 1}{(1 - \gamma)^2\, \epsilon} \right\} \right),$$

recapturing i.i.d. SA sample efficiency with no mixing-time dependence.
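
This oracle plugs directly into the `mer` sketch from Section 2; a minimal rendering, where the feature map `psi` and discount `gamma` are assumed inputs and `sample` is a transition tuple:

```python
import numpy as np

def td0_oracle(theta, sample, psi, gamma):
    """F_tilde for TD(0): sample = (s, s_next, R); returns
    (<psi(s), theta> - R - gamma <psi(s'), theta>) psi(s)."""
    s, s_next, R = sample
    phi, phi_next = psi(s), psi(s_next)
    return (phi @ theta - R - gamma * (phi_next @ theta)) * phi
```

Wrapping it as `lambda th, xi: td0_oracle(th, xi, psi, gamma)` and feeding transition tuples $(s_t, s_{t+1}, R_t)$ to the `mer` sketch yields a mixing-agnostic TD(0).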

(b) Generalized Linear Models:

For samples $(a_t, y_t)$ with $y_t = f(a_t^\top x^*) + v_t$, where $a_t$ is a Markov chain and the link $f$ is Lipschitz and strongly monotone,

$$\widetilde F(x, (a,y)) = a f(a^\top x) - a y.$$

Corollary 5.1 (Nakul et al., 4 Jan 2026) bounds MER’s sample complexity by

$$O\left( \max \left\{ \frac{\alpha_k + 1}{\mu_f^2 \kappa^2} \ln \frac{1}{\epsilon},\ \frac{\alpha_k + 1}{\mu_f^2 \kappa^2\, \epsilon} \right\} \right),$$

matching i.i.d. optimality up to logarithmic factors.
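
The GLM oracle is equally direct to instantiate; in this sketch the link `f_link` is a placeholder chosen to be strongly monotone and Lipschitz, not a choice made in the paper:

```python
import numpy as np

def f_link(u):
    """Placeholder link with f'(u) in [1, 1.5]: 1-strongly monotone
    and 1.5-Lipschitz (an illustrative choice, not from the paper)."""
    return u + 0.5 * np.tanh(u)

def glm_oracle(x, sample):
    """F_tilde for the GLM: sample = (a, y); returns a f(a^T x) - a y."""
    a, y = sample
    return a * (f_link(a @ x) - y)
```

With `F_tilde = glm_oracle` and a chain emitting `(a, y)` pairs, the `mer` sketch from Section 2 performs the estimation without any mixing-time input.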

6. Summary of Theoretical and Practical Attributes

MER’s epoch-wise replay schedule yields:

  • $O(1/T)$ stochastic error rates whenever replay separation exceeds the mixing time,
  • Automatic adaptation to mixing dynamics without tuning,
  • Two-sided guarantees bounding the deviation from the i.i.d. trajectory during early epochs,
  • Theoretical and empirical superiority or parity relative to uniform and skip replay,
  • Best-known mixing-agnostic guarantees for policy evaluation and statistical estimation under Markovian sampling.

By integrating experience replay with a structured multi-scale sequence, MER achieves practical robustness and theoretically optimal rates in a Markovian data setting (Nakul et al., 4 Jan 2026).

References

  • Nakul et al., 4 Jan 2026.
