Multiplier Bootstrap in Stochastic Approximation
- Multiplier bootstrap is a resampling method that assigns i.i.d. multiplicative weights to data samples, preserving the dependence structure for valid inference.
- It offers non-asymptotic guarantees with Berry–Esseen-type bounds, achieving an approximation error rate of order $n^{-1/4}$ for online and high-dimensional algorithms.
- This computationally efficient technique is applied in online learning, TD-learning, and high-dimensional settings, enabling precise finite-sample calibration of confidence intervals.
The multiplier bootstrap is a resampling method that constructs random, weighted analogues of empirical statistics by assigning independently generated multiplicative weights (multipliers) to samples. It is widely used for constructing valid inference under complex dependence, high-dimensionality, or non-asymptotic conditions, and serves as a computationally efficient alternative to classical resampling—especially in large-scale, online, or structured-geometry settings.
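To fix ideas, here is a minimal sketch of the multiplier bootstrap for a sample mean with mean-zero Gaussian multipliers; the function name and the multiplier law are illustrative choices, not taken from the source:

```python
import numpy as np

def multiplier_bootstrap_mean(x, n_boot=2000, rng=None):
    """Multiplier-bootstrap replicates of sqrt(n) * (sample mean - true mean).

    Each replicate reweights the centered observations x_i - x_bar by i.i.d.
    standard-normal multipliers; no observation is resampled.
    """
    rng = np.random.default_rng(rng)
    n = x.shape[0]
    w = rng.standard_normal((n_boot, n))        # multipliers, E[w] = 0, Var(w) = 1
    return (w @ (x - x.mean())) / np.sqrt(n)    # conditional-on-data replicates

rng = np.random.default_rng(0)
x = rng.exponential(size=500)
reps = multiplier_bootstrap_mean(x, rng=rng)
q_hi, q_lo = np.quantile(reps, [0.975, 0.025])
ci = (x.mean() - q_hi / np.sqrt(len(x)), x.mean() - q_lo / np.sqrt(len(x)))
print(ci)  # two-sided 95% bootstrap confidence interval for the mean
```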
1. Formulation and Core Principles
Given data observations $Z_1, \dots, Z_n$ and a target estimator (often the solution to an estimating equation or minimizer of a loss), the multiplier bootstrap creates pseudo-samples by generating i.i.d. multipliers $w_1, \dots, w_n$ (typically normalized so that $\mathbb{E}[w_i] = 1$ and $\operatorname{Var}(w_i) = 1$) and constructs a "bootstrap world" estimator via the same procedure but with observations reweighted by the $w_i$.
In Polyak-Ruppert-averaged Linear Stochastic Approximation (LSA), the procedure is:
- Main LSA recursion:
$$\theta_k = \theta_{k-1} - \alpha_k\,\{A(Z_k)\,\theta_{k-1} - b(Z_k)\}, \qquad k \ge 1,$$
for step size $\alpha_k$ and observed $Z_k$.
- Polyak-Ruppert-averaged estimator:
$$\bar\theta_n = \frac{1}{n} \sum_{k=0}^{n-1} \theta_k.$$
- Multiplier bootstrap (run independently for each bootstrap trajectory, with i.i.d. multipliers $w_k$, $\mathbb{E}[w_k] = 1$, $\operatorname{Var}(w_k) = 1$):
$$\theta_k^{\mathsf{b}} = \theta_{k-1}^{\mathsf{b}} - \alpha_k\, w_k\,\{A(Z_k)\,\theta_{k-1}^{\mathsf{b}} - b(Z_k)\}, \qquad \bar\theta_n^{\mathsf{b}} = \frac{1}{n} \sum_{k=0}^{n-1} \theta_k^{\mathsf{b}},$$
and the bootstrap law is that of $\sqrt{n}\,(\bar\theta_n^{\mathsf{b}} - \bar\theta_n)$ conditional on the data.
This approach does not resample observations, but randomly perturbs all observations through the i.i.d. multipliers, ensuring preservation of dependence structure and enabling fast computation (see also (Chernozhukov et al., 2012, Adamek et al., 2023)).
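A minimal online sketch of these recursions, assuming the data stream yields the pairs $(A(Z_k), b(Z_k))$ and using Gaussian mean-one multipliers; all names and defaults are illustrative:

```python
import numpy as np

def lsa_with_multiplier_bootstrap(stream, d, n_boot=50, c0=0.5, gamma=0.75, rng=None):
    """Averaged LSA plus n_boot multiplier-perturbed trajectories, run online.

    stream yields pairs (A_k, b_k); step sizes are alpha_k = c0 * k**(-gamma).
    Multipliers are i.i.d. N(1, 1), so E[w] = 1 and Var(w) = 1.
    Only the current iterates and running averages are kept in memory.
    """
    rng = np.random.default_rng(rng)
    theta = np.zeros(d)                      # main iterate
    theta_b = np.zeros((n_boot, d))          # one row per bootstrap trajectory
    avg = np.zeros(d)
    avg_b = np.zeros((n_boot, d))
    for k, (A_k, b_k) in enumerate(stream, start=1):
        alpha = c0 * k ** (-gamma)
        theta = theta - alpha * (A_k @ theta - b_k)
        w = rng.normal(1.0, 1.0, size=(n_boot, 1))             # fresh multipliers
        theta_b = theta_b - alpha * w * (theta_b @ A_k.T - b_k)
        avg += (theta - avg) / k             # running Polyak-Ruppert averages
        avg_b += (theta_b - avg_b) / k
    return avg, avg_b
```

The conditional bootstrap law is then read off from the returned replicates as `np.sqrt(n) * (avg_b - avg)`.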
2. Non-Asymptotic Theoretical Guarantees
A principal advance is the establishment of Berry–Esseen-type (BE) finite-sample error bounds characterizing the quality of both the Gaussian and multiplier bootstrap approximations in high and moderate dimensions, and for online, non-i.i.d., or stochastic algorithms.
Polyak-Ruppert LSA: Berry-Esseen Bound and Bootstrap Validity
- Under i.i.d. data and regularity, for the aggressive step-size schedule $\alpha_k = c_0 k^{-\gamma}$:
$$\sup_{B \in \mathcal{C}(\mathbb{R}^d)} \Big| \mathbb{P}\big(\sqrt{n}\,(\bar\theta_n - \theta^\star) \in B\big) - \mathbb{P}\big(\Sigma_\infty^{1/2} Z \in B\big) \Big| \le C\, n^{-1/4},$$
where $Z \sim \mathcal{N}(0, I_d)$, $\mathcal{C}(\mathbb{R}^d)$ denotes the class of convex sets, $C$ and $c_0$ are problem constants, and $\Sigma_\infty$ is the limiting covariance.
- For the bootstrap (with $M$ independent replicates per trajectory), with high probability (at least $1 - 6/n$), for $M$ and $n$ large enough:
$$\sup_{B \in \mathcal{C}(\mathbb{R}^d)} \Big| \mathbb{P}^{\mathsf{b}}\big(\sqrt{n}\,(\bar\theta_n^{\mathsf{b}} - \bar\theta_n) \in B\big) - \mathbb{P}\big(\sqrt{n}\,(\bar\theta_n - \theta^\star) \in B\big) \Big| \lesssim n^{-1/4},$$
where $\mathbb{P}^{\mathsf{b}}$ denotes probability conditional on the data (see Theorem 4.2; lower-order terms not shown).
- Confidence sets based on bootstrap quantiles achieve finite-sample coverage error of order $n^{-1/4}$:
$$\big| \mathbb{P}\big(\theta^\star \in \mathcal{C}_n(\alpha)\big) - (1 - \alpha) \big| \lesssim n^{-1/4},$$
where $\mathcal{C}_n(\alpha)$ is built from the conditional quantiles of $\sqrt{n}\,(\bar\theta_n^{\mathsf{b}} - \bar\theta_n)$, on an event of probability at least $1 - 6/n$ (a construction is sketched after this list).
- These rates are sharp; for Polyak-Ruppert-averaged LSA under the optimal schedule, a faster rate is generally unachievable due to the associated martingale remainder term.
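Continuing the sketch above, a hedged illustration of how the bootstrap quantiles translate into per-coordinate confidence intervals without estimating or inverting $\Sigma_\infty$ (the helper name is hypothetical):

```python
import numpy as np

def bootstrap_ci(avg, avg_b, n, coord=0, level=0.95):
    """Per-coordinate CI from the bootstrap law of sqrt(n) * (avg_b - avg).

    avg:   Polyak-Ruppert average of the main trajectory, shape (d,).
    avg_b: bootstrap averages, shape (n_boot, d).
    """
    reps = np.sqrt(n) * (avg_b[:, coord] - avg[coord])
    q_hi, q_lo = np.quantile(reps, [(1 + level) / 2, (1 - level) / 2])
    return avg[coord] - q_hi / np.sqrt(n), avg[coord] - q_lo / np.sqrt(n)
```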
3. Assumptions and Scope
The finite-sample validity holds under the following technical assumptions (summarized from Assumptions 2–4 and 7):
- The data $\{Z_k\}$ are i.i.d., and the randomness entering the recursion (the matrices $A(Z_k)$ and vectors $b(Z_k)$) is uniformly bounded.
- The averaged matrix $\bar{A} = \mathbb{E}[A(Z)]$ is such that $-\bar{A}$ is Hurwitz: every eigenvalue of $\bar{A}$ has strictly positive real part.
- The noise covariance has smallest eigenvalue bounded away from zero: $\lambda_{\min}(\Sigma_\varepsilon) > 0$.
- Step size: $\alpha_k = c_0 k^{-\gamma}$ with $c_0$ small enough for stability, and $n$ large.
- For generalizations to non-i.i.d. scenarios (e.g., the time series bootstrap of (Adamek et al., 2023, Chernozhukov et al., 2012)), additional dependence restrictions apply (e.g., mixing conditions or a VAR structure).
- Bootstrap weights are i.i.d., as sketched below (common choices: standard normal, Rademacher, exponential, Poisson).
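For concreteness, generating those weight choices (a sketch; which normalization applies depends on the construction — the LSA bootstrap above uses $\mathbb{E}[w] = 1$, $\operatorname{Var}(w) = 1$, while the mean-zero variants are standard in the high-dimensional wild bootstrap):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

w_exponential = rng.exponential(1.0, n)            # E[w] = 1, Var(w) = 1
w_poisson     = rng.poisson(1.0, n).astype(float)  # E[w] = 1, Var(w) = 1
w_gaussian    = rng.standard_normal(n)             # E[w] = 0, Var(w) = 1
w_rademacher  = rng.choice([-1.0, 1.0], n)         # E[w] = 0, Var(w) = 1
```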
4. Analytical Techniques and Key Methods
The core technical tool is a multivariate concentration inequality for nonlinear, non-i.i.d. or martingale-type statistics, together with a careful decomposition of the stochastic approximation error into martingale and negligible remainder terms.
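Schematically (a standard expansion written in this section's notation, not quoted verbatim from the source), the PR-averaged error splits as
$$
\sqrt{n}\,\big(\bar\theta_n - \theta^\star\big)
= \bar{A}^{-1} \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \big\{ b(Z_k) - A(Z_k)\,\theta^\star \big\} + R_n,
$$
where the leading term is a normalized sum of mean-zero (martingale-difference) increments handled by the Berry–Esseen machinery, and the remainder $R_n$ is what limits the rate to $n^{-1/4}$.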
The explicit expansion of the PR-averaged error is analyzed (using equations analogous to (9)–(15) in the source), ensuring both the Gaussian approximation and its bootstrap counterpart are comparable up to the given rate. The proofs combine Berry–Esseen bounds for martingale difference arrays with randomized Lindeberg and permutation arguments, yielding explicit coupling distances.
The multiplier bootstrap inherits all dependencies from the data, so it directly adapts to the nonlinearities and online nature of LSA (and, by extension, stochastic gradient descent—see (Sheshukova et al., 10 Feb 2025)).
5. Implementation, Applications, and Practical Impact
The multiplier bootstrap is natively implementable for online algorithms such as LSA and TD learning:
- Online Implementation: At each time step, alongside the main update, bootstrap replicates are advanced in parallel using new multipliers. Only the most recent iterates are stored per trajectory as averages are formed over a rolling window.
- Statistical Inference: Finite-sample confidence intervals and sets can be constructed by taking quantiles from the empirical conditional law of $\sqrt{n}\,(\bar\theta_n^{\mathsf{b}} - \bar\theta_n)$ across the bootstrap trajectories. No explicit estimation or inversion of an asymptotic covariance is required.
- Temporal Difference (TD) Learning: The results apply directly to TD-learning with linear function approximation for policy evaluation in reinforcement learning. The random trajectories encode state–action–next-state transitions; all conditions for the theoretical results are verifiable in standard RL setups (a minimal casting into LSA form is sketched after this list).
- Finite-Sample Calibration: Numerical experiments confirm that the Kolmogorov and other distances between the empirical and normal (as well as bootstrap) distributions decay at the rate $n^{-1/4}$ for the correct step size, providing precise quantification of statistical uncertainty from moderate sample sizes.
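The TD(0) casting mentioned above can be made explicit: with linear values $V_\theta(s) = \varphi(s)^\top \theta$, each observed transition yields an LSA pair, so the bootstrap recursion applies verbatim. A sketch (the helper name and the `discount` parameter are illustrative):

```python
import numpy as np

def td_lsa_pair(phi_s, phi_next, reward, discount):
    """Cast one TD(0) transition into the LSA pair (A(Z), b(Z)).

    With V(s) = phi(s)^T theta, the semi-gradient TD(0) update
        theta <- theta + alpha * (reward + discount * phi_next @ theta
                                  - phi_s @ theta) * phi_s
    equals theta <- theta - alpha * (A @ theta - b) for the pair below.
    """
    A = np.outer(phi_s, phi_s - discount * phi_next)
    b = reward * phi_s
    return A, b
```

Feeding these pairs into the LSA/bootstrap loop sketched earlier runs standard TD(0) on the main trajectory and its multiplier-perturbed replicates in parallel.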
6. Broader Relations and Theoretical Context
The established results are the first to deliver explicit non-asymptotic Berry–Esseen and multiplier bootstrap validity for Polyak-Ruppert-averaged LSA. They bridge a gap in the theory of online and stochastic approximation procedures by making inference feasible without large-sample or limiting normality assumptions.
Compared to concentration-based bounds, bootstrap-based confidence sets provide much sharper control, especially at the sample sizes typical of modern reinforcement learning or online data streams.
The same multiplier bootstrap structure extends to highly structured and high-dimensional settings, including Bures-Wasserstein space (Kroshnin et al., 2021), high-dimensional means and maxima (Chernozhukov et al., 2012), and non-linear statistics arising in U-statistics and empirical processes (Chen et al., 2017), always with sharp or nearly optimal error bounds.
Summary Table: Key Results in PR-Averaged LSA Multiplier Bootstrap
| Result | Formula | Commentary |
|---|---|---|
| LSA recursion | $\theta_k = \theta_{k-1} - \alpha_k\{A(Z_k)\theta_{k-1} - b(Z_k)\}$ | Linear stochastic approx. step |
| PR average | $\bar\theta_n = n^{-1}\sum_{k=0}^{n-1}\theta_k$ | Polyak-Ruppert averaging |
| Multiplier bootstrap | $\theta_k^{\mathsf{b}} = \theta_{k-1}^{\mathsf{b}} - \alpha_k w_k\{A(Z_k)\theta_{k-1}^{\mathsf{b}} - b(Z_k)\}$ | Random weights for each step |
| CLT | $\sqrt{n}(\bar\theta_n - \theta^\star) \xrightarrow{d} \mathcal{N}(0, \Sigma_\infty)$ | Standard asymptotic statement |
| Berry–Esseen bound | $\sup_B \lvert \mathbb{P}(\sqrt{n}(\bar\theta_n - \theta^\star) \in B) - \mathbb{P}(\Sigma_\infty^{1/2} Z \in B)\rvert \lesssim n^{-1/4}$ | Non-asymptotic normal approximation |
| Bootstrap approximation error | $\sup_B \lvert \mathbb{P}^{\mathsf{b}}(\sqrt{n}(\bar\theta_n^{\mathsf{b}} - \bar\theta_n) \in B) - \mathbb{P}(\sqrt{n}(\bar\theta_n - \theta^\star) \in B)\rvert \lesssim n^{-1/4}$ | Bootstrap-based CI validity |
These findings deliver robust, practically implementable, and theoretically optimal methods for finite-sample inference in stochastic approximation, temporal difference learning, and online regression, with the multiplier bootstrap as a cornerstone tool.