Richardson-Romberg Extrapolation
- Richardson-Romberg extrapolation is a numerical technique that combines estimators at different discretization or step sizes to cancel leading-order bias.
- It is widely applied in numerical integration, stochastic approximation, machine learning, and federated learning to improve convergence and computational efficiency.
- The method leverages known asymptotic error expansions and solves linear systems for optimal weights, achieving high-precision estimates with controlled bias and variance.
The Richardson-Romberg (RR) extrapolation procedure is a numerical method for accelerating convergence and reducing bias in computed estimators. Originally developed for deterministic numerical integration and later extended to a broad range of stochastic algorithms, RR extrapolation leverages a known error expansion of an estimator in terms of a discretization or tuning parameter (such as a time step, step size, or iteration count), combining results obtained at different parameter values to cancel leading-order error terms. The method is fundamental in numerical analysis, stochastic approximation, statistical estimation, machine learning, and scientific computing, enabling high-precision computation at reduced cost or under difficult statistical conditions.
1. Theoretical Basis and Asymptotic Error Expansions
At the core of RR extrapolation is the presence of a predictable asymptotic expansion for the estimator error as a function of a discretization parameter or hyperparameter. Consider an estimator $\hat{\theta}_h$ for a target quantity $\theta^\star$ with discretization parameter $h > 0$, admitting
$$\mathbb{E}[\hat{\theta}_h] = \theta^\star + c_1 h^{\alpha} + c_2 h^{2\alpha} + \cdots + c_k h^{k\alpha} + o(h^{k\alpha}),$$
for some positive exponent $\alpha$ and constants $c_1, \dots, c_k$. The bias is therefore dominated by the leading term $c_1 h^{\alpha}$ for small $h$.
RR extrapolation constructs two (or more) estimators at different values of $h$ (e.g., $h$ and $2h$) and forms a linear combination that cancels the lowest-order bias. For the two-level case,
$$\hat{\theta}^{\mathrm{RR}} = \frac{2^{\alpha}\,\hat{\theta}_h - \hat{\theta}_{2h}}{2^{\alpha} - 1},$$
so that, after substituting the expansion, the $h^{\alpha}$ bias term vanishes and the next order ($h^{2\alpha}$) dominates. Higher-order RR extrapolation generalizes this using linear systems (often Vandermonde matrices) to choose weights so that the first $k$ terms of the expansion, for any given order $k$, are eliminated.
This approach is applicable under mild regularity conditions: the existence of the expansion (i.e., sufficient smoothness of the simulated dynamics, or sufficient differentiability of the integrand in the case of numerical integration) and, in stochastic contexts, stability of the error moments.
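As a concrete illustration, the following minimal sketch applies the two-level combination to a first-order forward-difference derivative, whose error expands as $c_1 h + c_2 h^2 + \cdots$ (so $\alpha = 1$); the helper names `forward_diff` and `richardson_two_level` are illustrative, not from the cited literature.

```python
import numpy as np

def forward_diff(f, x, h):
    """First-order forward difference: error expands as c1*h + c2*h^2 + ..."""
    return (f(x + h) - f(x)) / h

def richardson_two_level(f, x, h, alpha=1.0):
    """Combine estimates at h and 2h to cancel the leading O(h^alpha) bias."""
    d_h = forward_diff(f, x, h)
    d_2h = forward_diff(f, x, 2 * h)
    return (2**alpha * d_h - d_2h) / (2**alpha - 1)

f, x, h = np.exp, 1.0, 1e-3
exact = np.exp(1.0)
print("plain error:", abs(forward_diff(f, x, h) - exact))          # roughly O(h)
print("RR    error:", abs(richardson_two_level(f, x, h) - exact))  # roughly O(h^2)
```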
2. RR Extrapolation in Stochastic Approximation and Markovian Algorithms
In stochastic approximation (SA) and stochastic optimization, RR extrapolation is instrumental for bias reduction, particularly when a constant step size or time discretization is used. In typical constant-step-size SA or linear stochastic approximation (LSA) algorithms under Markovian noise, the averaged iterates admit a bias expansion
$$\mathbb{E}\big[\bar{\theta}_n^{(\gamma)}\big] = \theta^\star + \gamma \Delta + O(\gamma^{2}),$$
where $\bar{\theta}_n^{(\gamma)}$ is the Polyak–Ruppert averaged iterate, $\gamma$ is the fixed step size, and $\Delta$ is a problem-specific bias coefficient. This nonzero bias persists even in large samples and cannot be eliminated merely by averaging (Huo et al., 2022, Levin et al., 7 Aug 2025, Zhang et al., 25 Jan 2024).
The RR estimator is generated via coupled runs with step sizes $\gamma$ and $2\gamma$:
$$\tilde{\theta}_n = 2\,\bar{\theta}_n^{(\gamma)} - \bar{\theta}_n^{(2\gamma)}.$$
Plugging in the bias expansions, the linear term in $\gamma$ cancels, yielding
$$\mathbb{E}\big[\tilde{\theta}_n\big] = \theta^\star + O(\gamma^{2}).$$
This reduction has an immediate effect on the mean-squared error, especially in practical regimes where bias dominates variance.
Further, the procedure generalizes to multiple step sizes $\gamma_1 < \gamma_2 < \cdots < \gamma_M$, with weights $w_1, \dots, w_M$ chosen such that
$$\sum_{m=1}^{M} w_m = 1, \qquad \sum_{m=1}^{M} w_m\, \gamma_m^{j} = 0 \quad \text{for } j = 1, \dots, M-1,$$
so that the bias after extrapolation is $O(\gamma_M^{M})$. In practice, geometric or equispaced step sizes are used, with explicit control of variance inflation via bounds on the weight norm (Huo et al., 2023).
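A minimal sketch of this weight computation, assuming a bias expansion in integer powers of the step size; `rr_weights` is a hypothetical helper name. The printed $\ell_1$ norms illustrate the variance-inflation point: in this configuration the geometric grid yields a noticeably smaller weight norm than the equispaced one.

```python
import numpy as np

def rr_weights(step_sizes):
    """Weights w with sum_m w_m = 1 and sum_m w_m * gamma_m^j = 0 for j = 1..M-1,
    so that the first M-1 bias terms cancel after extrapolation."""
    g = np.asarray(step_sizes, dtype=float)
    M = len(g)
    V = np.vander(g, M, increasing=True).T  # Vandermonde system, V[j, m] = gamma_m^j
    rhs = np.zeros(M)
    rhs[0] = 1.0                            # normalization constraint
    return np.linalg.solve(V, rhs)

geometric = 0.1 * 2.0 ** np.arange(4)   # 0.1, 0.2, 0.4, 0.8
equispaced = 0.1 * np.arange(1, 5)      # 0.1, 0.2, 0.3, 0.4
for name, g in [("geometric", geometric), ("equispaced", equispaced)]:
    w = rr_weights(g)
    # the l1 norm of the weights controls the variance inflation of the combination
    print(f"{name:10s} weights = {np.round(w, 3)}  l1 norm = {np.abs(w).sum():.2f}")
```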
A central consideration is that RR extrapolation does not alter the leading-order variance term, which (post-extrapolation) still achieves the minimax-optimal constant; e.g., for Polyak–Ruppert averaged LSA, the leading error term is governed by the limiting CLT covariance $\Sigma_\infty$ (Levin et al., 7 Aug 2025).
3. RR Extrapolation in Deterministic and Monte Carlo Numerical Integration
In numerical integration, RR extrapolation provides a key bias cancellation mechanism within both deterministic and Monte Carlo frameworks.
For example, in composite quadrature (trapezoidal or related projection methods), an asymptotic expansion of the error is available; for the composite trapezoidal rule the Euler–Maclaurin formula gives
$$I_h = I + a_1 h^{2} + a_2 h^{4} + \cdots,$$
where $I_h$ denotes the composite approximation with mesh size $h$. Classically, RR extrapolation uses approximations at $h$ and $2h$ (or more levels), with
$$I^{\mathrm{RR}} = \frac{4 I_h - I_{2h}}{3},$$
to estimate and cancel the leading error term.
Higher-order extrapolation steps are possible recursively, by combining multiple approximations and forming linear combinations that eliminate the first several bias terms. The method extends to flexible partitionings (not restricted to powers of two) and to alternative composite rules, and is accompanied by a full error analysis including precise formulas for the extrapolation weights and remainder terms (Youngberg, 2012, Rakshit et al., 2019).
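The recursive construction for the composite trapezoidal rule is the classical Romberg table; a minimal sketch under the even-power Euler–Maclaurin expansion above (function names are illustrative):

```python
import numpy as np

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n panels; error expands in even powers of h."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

def romberg(f, a, b, levels=5):
    """Romberg table: column k cancels the h^(2k) error term of the previous column."""
    R = np.zeros((levels, levels))
    for i in range(levels):
        R[i, 0] = trapezoid(f, a, b, 2**i)
        for k in range(1, i + 1):
            R[i, k] = (4**k * R[i, k - 1] - R[i - 1, k - 1]) / (4**k - 1)
    return R[levels - 1, levels - 1]

print(romberg(np.sin, 0.0, np.pi, levels=5), "vs exact", 2.0)  # integral of sin on [0, pi]
```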
In stochastic integration contexts, e.g., Monte Carlo simulation of SDEs or nested expectations, the multilevel Richardson–Romberg method achieves higher-order bias elimination analogous to deterministic settings, with an explicit tradeoff analysis between computational cost and RMSE (Lemaire et al., 2014).
4. RR Extrapolation in Machine Learning, Reinforcement Learning, and Federated Optimization
In modern machine learning and reinforcement learning algorithms, RR extrapolation is used to reduce estimator bias arising from fixed hyperparameters (stepsize, regularization strength, iteration budget). For example, in stochastic gradient descent (SGD) with constant stepsize on strongly convex/smooth objectives and Polyak–Ruppert averaging, the estimator has bias of the form
The RR estimator,
removes the linear-in- bias, with the resulting error expansion achieving the statistical efficiency floor set by the central limit theorem—i.e., the leading term matches the minimax-optimal asymptotic covariance (Sheshukova et al., 7 Oct 2024).
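A minimal sketch of the coupled two-step-size construction for constant-step-size SGD with tail (Polyak–Ruppert) averaging, on a synthetic ridge-regularized logistic regression problem; the data, step sizes, and helper names are illustrative assumptions rather than the setup of the cited paper. The two chains share the same sampling seed (common random numbers), which is one way to realize the coupling.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 2000, 5, 0.1                  # synthetic data; ridge term gives strong convexity
X = rng.normal(size=(n, d))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ rng.normal(size=d)))).astype(float)

def sgd_tail_average(gamma, n_iters, burn_in, seed):
    """Constant-step-size SGD with tail (Polyak-Ruppert) averaging."""
    idx = np.random.default_rng(seed).integers(n, size=n_iters)
    w, avg = np.zeros(d), np.zeros(d)
    for t in range(n_iters):
        i = idx[t]
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        w = w - gamma * ((p - y[i]) * X[i] + lam * w)   # stochastic gradient step
        if t >= burn_in:
            avg += w
    return avg / (n_iters - burn_in)

gamma, T, B = 0.1, 200_000, 20_000
w_g  = sgd_tail_average(gamma, T, B, seed=1)        # step size gamma
w_2g = sgd_tail_average(2 * gamma, T, B, seed=1)    # step size 2*gamma, coupled via shared seed
w_rr = 2.0 * w_g - w_2g                             # cancels the O(gamma) bias term

# high-accuracy deterministic reference via full-batch gradient descent
w_ref = np.zeros(d)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w_ref))
    w_ref -= 0.5 * ((p - y) @ X / n + lam * w_ref)

for name, w in [("gamma", w_g), ("2*gamma", w_2g), ("RR", w_rr)]:
    print(f"{name:8s} distance to optimum: {np.linalg.norm(w - w_ref):.4f}")
```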
In federated learning (FedAvg), global iterates retain a bias even after averaging, which decomposes into a stochastic component (due to local SGD noise) and a heterogeneity component (due to client differences). Both can be mitigated using RR extrapolation: by combining global estimates from runs with different global step sizes or local-update counts, the leading-order bias (linear in the step size) can be removed, yielding higher-order residuals (e.g., quadratic in the step size). Significantly, the RR correction demands no communication overhead and is implemented entirely at the server (Mangold et al., 2 Dec 2024).
In temporal-difference (TD) learning and Q-learning, constant-step-size algorithms run on Markovian data converge exponentially fast to a stationary distribution whose mean remains biased. RR extrapolation, via parallel runs with step sizes $\gamma$ and $2\gamma$, cancels the leading bias term through the combination $2\,\theta^{(\gamma)} - \theta^{(2\gamma)}$, yielding superior estimates; this is confirmed both theoretically and empirically (Huo et al., 2022, Zhang et al., 25 Jan 2024).
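A sketch of the same idea for tabular TD(0) on a small synthetic Markov reward process observed along a single Markovian trajectory; the chain, rewards, discount, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
S, disc = 4, 0.9
P = rng.dirichlet(np.ones(S), size=S)                # random row-stochastic transitions
r = rng.normal(size=S)                               # state rewards
V_true = np.linalg.solve(np.eye(S) - disc * P, r)    # exact value function

def td0_tail_average(gamma, n_steps, burn_in, seed):
    """Tabular TD(0) with constant step size gamma along one Markovian trajectory,
    returning the tail-averaged value estimate."""
    rng_local = np.random.default_rng(seed)
    V, avg, s = np.zeros(S), np.zeros(S), 0
    for t in range(n_steps):
        s_next = rng_local.choice(S, p=P[s])
        V[s] += gamma * (r[s] + disc * V[s_next] - V[s])   # TD(0) update
        if t >= burn_in:
            avg += V
        s = s_next
    return avg / (n_steps - burn_in)

gamma, T, B = 0.2, 200_000, 20_000
V_g  = td0_tail_average(gamma, T, B, seed=1)
V_2g = td0_tail_average(2 * gamma, T, B, seed=1)     # parallel run at 2*gamma
V_rr = 2.0 * V_g - V_2g                              # cancels the leading O(gamma) bias

for name, V in [("gamma", V_g), ("2*gamma", V_2g), ("RR", V_rr)]:
    print(f"{name:8s} error vs V_true: {np.linalg.norm(V - V_true):.4f}")
```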
5. Algorithmic Formulation, Error Dynamics, and High-Order Bounds
The RR extrapolation algorithm, in its general setting, requires parallel computation of estimators at multiple discretization or hyperparameter levels, followed by a weighted linear combination with weights determined to cancel the first $M-1$ terms of the expansion. This is done systematically via the following steps (see the sketch after this list):
- Constructing the estimator sequence $\hat{\theta}_{h_1}, \dots, \hat{\theta}_{h_M}$ for discretization levels $h_1 < h_2 < \cdots < h_M$
- Computing the weight vector $w = (w_1, \dots, w_M)$ by solving a Vandermonde system that sets the coefficients of $h^{\alpha}, h^{2\alpha}, \dots, h^{(M-1)\alpha}$ to zero and enforces the normalization $\sum_m w_m = 1$
- Combining: $\hat{\theta}^{\mathrm{RR}} = \sum_{m=1}^{M} w_m \hat{\theta}_{h_m}$

The procedure is provably optimal with respect to bias order, and the remaining estimator variance is, up to the weight-induced inflation, comparable to the lowest-variance constituent among the replicated runs.
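Putting these steps together, a general-purpose sketch (reusing the same Vandermonde construction as in the earlier weight example; `rr_extrapolate` is a hypothetical helper, and the toy expansion is chosen only so the cancellation is visible):

```python
import numpy as np

def rr_extrapolate(estimates, levels, alpha=1.0):
    """Combine estimators computed at the given discretization levels with weights
    that zero the coefficients of h^alpha, ..., h^((M-1)*alpha) and sum to one."""
    estimates = np.asarray(estimates, dtype=float)
    x = np.asarray(levels, dtype=float) ** alpha
    M = len(x)
    V = np.vander(x, M, increasing=True).T   # Vandermonde system, V[j, m] = x_m^j
    rhs = np.zeros(M)
    rhs[0] = 1.0                             # normalization: weights sum to one
    w = np.linalg.solve(V, rhs)
    return w @ estimates, w

# toy estimator with a known expansion: theta_h = 1 + 0.5*h + 0.2*h^2 + 0.1*h^3
theta = lambda h: 1.0 + 0.5 * h + 0.2 * h**2 + 0.1 * h**3
hs = [0.1, 0.2, 0.4]
est, w = rr_extrapolate([theta(h) for h in hs], hs)
print("weights:", np.round(w, 3), " extrapolated:", round(est, 5), " target: 1.0")
```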
In high-order regimes (e.g., for $p$-th moment bounds), the RR estimator matches the asymptotically optimal variance. Specifically, for LSA algorithms with Markovian noise, after RR extrapolation the error decomposes into a leading term of order $\sqrt{\operatorname{tr}(\Sigma_\infty)/n}$ together with fluctuation and transient terms that decay with $n$, where $\Sigma_\infty$ is the limiting optimal covariance (Levin et al., 7 Aug 2025). Fast geometric forgetting of the initialization is enabled by constant step sizes, and RR extrapolation is crucial for achieving optimal error scaling in the presence of nonzero bias.
6. Applications, Numerical Verification, and Implementation Guidance
Table: RR Extrapolation Applications in Key Domains
| Domain | Error Structure Needed | RR Implementation Mode |
|---|---|---|
| Numerical Integration | Expansion in powers of the mesh size $h$ | Multiple quadratures at varying mesh sizes; combine via analytic weights (Youngberg, 2012, Rakshit et al., 2019) |
| Markovian SA/LSA | Expansion in the step size $\gamma$ | Multiple SA runs at distinct step sizes; combine via weights from Vandermonde systems (Huo et al., 2022, Levin et al., 7 Aug 2025) |
| SGD/ML Optimization | Expansion in the step size $\gamma$ | Paired/coupled SGD chains at $\gamma, 2\gamma$; extrapolate after tail-averaging (Sheshukova et al., 7 Oct 2024) |
| RL (Q-learning, TD) | Expansion in the constant step size $\gamma$ | Parallel agent runs at different step sizes; combine as above (Zhang et al., 25 Jan 2024) |
| Federated Learning | Expansion in global/local step size or local-update count | Federated server collects estimates from different runs; forms extrapolated global estimate (Mangold et al., 2 Dec 2024) |
Empirical and theoretical studies demonstrate that the RR estimator systematically outperforms the baseline (Polyak–Ruppert, vanilla integration, etc.) estimator both in bias and, after accounting for minor variance inflation due to the weights, in root-mean-square convergence rate (Levin et al., 7 Aug 2025, Sheshukova et al., 7 Oct 2024).
Implementation requirements include careful balancing of variance and bias, choice of step sizes (geometric spacing or equidistant spacing per the constraints of the problem), and, in some cases, simultaneous simulation of multiple chains under shared randomness.
7. Limitations, Edge Cases, and Practical Considerations
- Expansion Validity: The method depends critically on the validity of the error expansion (smoothness of the target function, regularity of the noise). In irregular or highly non-smooth models, the expansion may fail, or higher-order terms may dominate for large step sizes, potentially invalidating the extrapolation (Mikkelsen et al., 12 Jun 2024).
- Numerical Stability: High-order extrapolation may suffer from ill-conditioned linear systems, causing large variance inflation in the resulting weights (Ascoli, 2018).
- Computational Cost: Each added extrapolation level incurs the cost of a new run, possibly offset by parallel execution but imposing memory and scheduling requirements.
- Variance-Bias Tradeoff: Extrapolation reduces bias at the expense of increased variance due to the weights (especially as the number of extrapolation nodes grows)—optimal node selection and step size scheduling are therefore crucial (Krebsbach et al., 2022).
- Zero-Bias Scenarios: In highly structured noise settings (independent additive noise, certain semi-simulated RL setups), the bias already vanishes and extrapolation is unnecessary (Huo et al., 2023).
8. Significance and Current Research Directions
The RR extrapolation procedure underpins state-of-the-art performance in multiple settings, particularly for stochastic optimization under Markovian noise, constant step sizes, and federated environments where bias otherwise cannot be eliminated by averaging. Its impact is seen in strong theoretical bounds—those that match known optimality thresholds (e.g., covariance structure in CLT settings)—and in empirical evidence for practical improvements in computational statistics and learning. Open directions include adaptive schedules for step size/node selection, finite-sample optimal allocation of computational resources (Krebsbach et al., 2022), and extensions to more complex non-linear or non-Markovian data generating processes.
In summary, RR extrapolation is a cornerstone of modern error reduction strategies, generalizing bias cancellation well beyond its origins in numerical integration to high-dimensional, stochastic, and distributed computation. By leveraging known error expansions and flexible combination strategies, it achieves accelerated convergence and statistically optimal performance across a breadth of applications.