Linear Stochastic Approximation Algorithms
- LSA algorithms are iterative methods for solving linear systems under stochastic noise, featuring both constant and diminishing step-size schemes for applications like reinforcement learning and regression.
- They utilize averaging techniques such as Polyak–Ruppert and bias reduction via Richardson–Romberg extrapolation to enhance convergence and minimize error.
- Non-asymptotic error bounds and high-order statistical guarantees enable robust inference and support distributed implementations in networked systems.
Linear Stochastic Approximation (LSA) algorithms constitute a central framework for the incremental, iterative solution of linear systems under stochastic uncertainty. In its prototypical form, an LSA algorithm maintains a parameter vector and updates it using noisy linear measurements, often modeled as arising from independent or Markovian sources. The method underpins a broad range of applications in machine learning, signal processing, control, and, most notably, reinforcement learning (RL), such as temporal-difference (TD) learning and stochastic linear regression. LSA theory provides both asymptotic and non-asymptotic guarantees on convergence, error moments, statistical inference, and algorithmic stability, with rigorous characterizations now available even under Markovian noise and distributed/networked architectures.
1. Core Algorithmic Structure and Recursion
The canonical LSA recursion for estimating the solution $\theta^*$ of the linear system $\bar{A}\theta = \bar{b}$ takes the form
$$\theta_{k+1} = \theta_k + \alpha_k\,(b_k - A_k \theta_k),$$
where $(A_k, b_k)$ are noisy measurements of $(\bar{A}, \bar{b})$ (possibly depending on an underlying Markov process) and $\alpha_k > 0$ is a step-size.
Two regimes are prevalent:
- Constant step-size: $\alpha_k \equiv \alpha$, often chosen for rapid initial progress and effective "forgetting" of initial conditions (Lakshminarayanan et al., 2017; Huo et al., 2023).
- Diminishing step-size: $\alpha_k \to 0$ (e.g., $\alpha_k = c\,k^{-\gamma}$ with $\gamma \in (1/2, 1]$), for diminishing variance and classical stochastic approximation asymptotics.
Averaged iterates are widely used to control variance, in particular Polyak–Ruppert (PR) averaging: $\bar{\theta}_n = \frac{1}{n}\sum_{k=1}^{n} \theta_k$. In RL, the same LSA structure arises in TD-learning, e.g., the TD(0) update $\theta_{k+1} = \theta_k + \alpha_k\,\phi(s_k)\big(r_k + \gamma\,\phi(s_{k+1})^\top\theta_k - \phi(s_k)^\top\theta_k\big)$, under linear value function parameterization (Mou et al., 2020).
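As a concrete illustration, the recursion and PR averaging above can be sketched in a few lines (a minimal NumPy simulation; the matrix, noise model, and step-size here are illustrative assumptions, not taken from any cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target linear system A_bar @ theta = b_bar, with -A_bar Hurwitz
# (eigenvalues of A_bar are 2 and 1, both positive).
A_bar = np.array([[2.0, 0.5],
                  [0.0, 1.0]])
b_bar = np.array([1.0, 1.0])
theta_star = np.linalg.solve(A_bar, b_bar)

def lsa(n_steps, alpha, noise=0.1):
    """Constant step-size LSA with Polyak-Ruppert averaging."""
    theta = np.zeros(2)
    running_sum = np.zeros(2)
    for _ in range(n_steps):
        # Noisy measurements (A_k, b_k) of (A_bar, b_bar), i.i.d. here
        A_k = A_bar + noise * rng.standard_normal((2, 2))
        b_k = b_bar + noise * rng.standard_normal(2)
        theta = theta + alpha * (b_k - A_k @ theta)   # LSA recursion
        running_sum += theta
    return theta, running_sum / n_steps               # last iterate, PR average

last, avg = lsa(n_steps=20_000, alpha=0.05)
print("last-iterate error:", np.linalg.norm(last - theta_star))
print("PR-average error: ", np.linalg.norm(avg - theta_star))
```

The PR average is typically much closer to $\theta^*$ than the last iterate, since averaging suppresses the stationary fluctuations of the constant-step chain.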
2. Error Analysis: Bias, Variance, and Convergence
The mean-squared error (MSE) of the averaged LSA iterates exhibits a bias–variance decomposition: a bias term reflecting the initial condition $\theta_0 - \theta^*$, which decays faster than the variance, plus a variance term of order $O(\sigma^2/n)$, valid for i.i.d. inputs, Hurwitz $-\bar{A}$, and suitably chosen step-size (Lakshminarayanan et al., 2017; Durmus et al., 2022). The variance term dominates for large $n$, yielding an optimal $O(1/n)$ decay rate in MSE.
In the Markovian noise setting, the steady-state error exhibits additional bias not present in the i.i.d. case; the bias is linear in the step-size (Huo et al., 2022; Levin et al., 7 Aug 2025):
$$\lim_{k\to\infty}\mathbb{E}[\theta_k] - \theta^* = \alpha B + O(\alpha^2),$$
for a vector $B$ determined by the noise correlation structure. This leading bias term persists even with PR averaging, and its norm scales proportionally to the mixing time of the underlying Markov process. The bias vanishes only in specific cases such as i.i.d. noise or certain semi-simulated settings (Huo et al., 2023).
The optimal trade-off between bias and variance is achieved by carefully balancing step-size decay rates, e.g., $\alpha_k \propto k^{-3/4}$ for Berry–Esseen-optimal normal approximation and finite-sample inference (Samsonov et al., 26 May 2024; Samsonov et al., 25 May 2025).
3. High-Order and Non-Asymptotic Bounds
Finite-time, non-asymptotic analysis provides explicit moment and concentration inequalities:
- Mean-square and higher-order moments: For constant step-size $\alpha$, only moments up to an order that grows as $\alpha$ decreases are finite, and these match Gaussian scaling; higher moments diverge, indicating polynomially heavy tails (Srikant et al., 2019).
- Non-asymptotic high-probability bounds: For any $\delta \in (0,1)$, the iterate error admits a bound holding with probability at least $1-\delta$ that grows polynomially (rather than logarithmically) in $1/\delta$, reflecting the fact that only finitely many moments of the random matrix products are bounded (Durmus et al., 2021).
- Berry–Esseen and Kolmogorov-rate bounds: For PR averages with decreasing step-size,
$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left(\sqrt{n}\,u^{\top}(\bar\theta_n - \theta^*) \le x\right) - \Phi_{\sigma_u}(x)\right| \lesssim n^{-1/4}$$
for i.i.d. noise (Samsonov et al., 26 May 2024), extending to the same rate, up to logarithmic factors, under Markovian noise (Samsonov et al., 25 May 2025).
The finite-sample performance saturates after an initial transient of length on the order of $1/\alpha$ iterations, with an error floor determined by the step-size and the Markov mixing time (Srikant et al., 2019).
4. Bias Removal via Richardson–Romberg Extrapolation
The bias induced by constant step-size and Markovian data can be effectively eliminated through Richardson–Romberg (RR) extrapolation:
- Run $M$ LSA trajectories with step-sizes $\alpha, 2\alpha, \ldots, 2^{M-1}\alpha$ and form the weighted combination $\bar\theta^{\mathrm{RR}}_n = \sum_{m=1}^{M} w_m\,\bar\theta^{(m)}_n$, with weights satisfying $\sum_m w_m = 1$, chosen so that the first $M-1$ powers of the step-size cancel (Huo et al., 2023; Huo et al., 2022; Levin et al., 7 Aug 2025).
- For $M = 2$, the standard RR estimator is $\bar\theta^{\mathrm{RR}}_n = 2\,\bar\theta^{(\alpha)}_n - \bar\theta^{(2\alpha)}_n$, which cancels the $O(\alpha)$ bias term. With $M$ step-sizes, the bias is reduced to $O(\alpha^M)$.
This extrapolation delivers estimators with optimal minimax error up to higher-order terms, both in mean-squared error and in high-probability metrics, aligning the error rate with the asymptotically efficient covariance (Levin et al., 7 Aug 2025).
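The two-step-size RR estimator can be sketched on a scalar example (the two-state Markov chain, measurement model, and step-sizes below are illustrative assumptions chosen so that the constant-step bias is visible):

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar LSA driven by a sticky 2-state Markov chain: a_k in {0.5, 1.5},
# b_k = a_k**2.  Slow mixing makes the O(alpha) bias visible.
STATES = np.array([0.5, 1.5])
STAY = 0.9                                        # prob. of keeping the state
theta_star = (STATES**2).mean() / STATES.mean()   # E[b]/E[a] = 1.25

def averaged_lsa(alpha, n_steps=400_000, burn_in=100_000):
    """Constant-step LSA on the Markov stream; returns the PR average."""
    state, theta, total = 0, 0.0, 0.0
    for k in range(n_steps):
        if rng.random() > STAY:
            state = 1 - state                     # Markov transition
        a_k = STATES[state]
        b_k = a_k**2
        theta += alpha * (b_k - a_k * theta)      # LSA recursion
        if k >= burn_in:
            total += theta
    return total / (n_steps - burn_in)

avg_a = averaged_lsa(alpha=0.05)
avg_2a = averaged_lsa(alpha=0.10)
rr = 2 * avg_a - avg_2a                           # cancels the O(alpha) bias
print("theta*:", theta_star, "| avg(a):", avg_a,
      "| avg(2a):", avg_2a, "| RR:", rr)
```

Both plain averages sit a roughly step-size-proportional distance from $\theta^*$; the extrapolated estimate removes most of that offset.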
5. Statistical Inference: Confidence Intervals and Bootstrap
Statistical inference for LSA, particularly under Markovian noise, is enabled by CLT-type results and valid covariance estimation:
- CLT and batch-mean inference: Under suitable step-size and mixing conditions, $\sqrt{n}\,(\bar\theta_n - \theta_\infty) \Rightarrow \mathcal{N}(0, \Sigma)$, where $\theta_\infty$ is the stationary mean of the iterates (possibly biased away from $\theta^*$) (Huo et al., 2023).
- Covariance estimation and batching: Fast Markov mixing (with mixing time $\tau_{\mathrm{mix}}$) ensures that the asymptotic covariance $\Sigma$ can be reliably estimated from batch means, at a polynomial rate in $n$ up to logarithmic terms (Samsonov et al., 25 May 2025).
- Multiplier and block bootstrap: Online multiplier resampling of the iterates provides confidence intervals with quantifiably correct coverage in finite samples (Samsonov et al., 26 May 2024; Samsonov et al., 25 May 2025).
These tools yield actionable finite-sample inference, outperforming classical bootstrapping in memory and computation.
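A generic batch-means construction can be sketched as follows (i.i.d. noise and a scalar system for simplicity; the batching scheme is a standard illustration, not the specific estimator of the cited papers):

```python
from math import sqrt

import numpy as np

rng = np.random.default_rng(2)

# Scalar LSA with i.i.d. noise: estimate theta* = b/a and build a
# batch-means confidence interval around the PR average.
a, b, alpha = 1.0, 2.0, 0.05
theta_star = b / a

n, theta = 100_000, 0.0
iterates = np.empty(n)
for k in range(n):
    a_k = a + 0.2 * rng.standard_normal()
    b_k = b + 0.2 * rng.standard_normal()
    theta += alpha * (b_k - a_k * theta)          # LSA recursion
    iterates[k] = theta

# Split the trajectory into batches much longer than the mixing/relaxation
# time; the spread of batch means estimates the asymptotic variance.
n_batches = 20
batch_means = iterates.reshape(n_batches, -1).mean(axis=1)
center = batch_means.mean()
se = batch_means.std(ddof=1) / sqrt(n_batches)
ci = (center - 1.96 * se, center + 1.96 * se)
print("estimate:", center, "95% CI:", ci)
```

The key design choice is the batch length: batches must dominate the autocorrelation time of the iterates so that batch means are approximately independent.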
6. Distributed and Networked LSA
Extension to networked and distributed architectures is achieved by developing "network-aware" LSA schemes, in which local updates at each agent are coupled via consensus dynamics:
- Consensus-based LSA: Agents update local parameters $\theta^i_k$ via stochastic mixing with neighbors combined with local stochastic approximation steps. Error bounds depend on both the step-size and the network mixing properties (Lin et al., 2021).
- Push-sum LSA for unidirectional communication: To achieve uniform averaging under only row-stochastic consensus, push-sum techniques are used, correcting the bias induced by directed/topologically imbalanced graphs. Finite-time error bounds for both consensus and averaging error are provided.
- Distributed TD-learning: Distributed TD and policy evaluation instantiate these LSA updates at each agent, delivering value function approximation guarantees with explicit network-dependent error rates.
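A toy consensus-based variant can be sketched as follows (a ring network with doubly stochastic weights and scalar local problems; all specifics are illustrative assumptions, not the schemes of the cited papers):

```python
import numpy as np

rng = np.random.default_rng(3)

# 4 agents on a ring, each holding a local estimate of theta* = b/a.
# Each step: consensus mixing with neighbors, then a local LSA update.
n_agents, alpha = 4, 0.05
a, b = 1.0, 2.0
theta_star = b / a

# Doubly stochastic mixing matrix for a ring (self + two neighbors)
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

theta = np.zeros(n_agents)
for _ in range(20_000):
    mixed = W @ theta                                 # consensus step
    a_k = a + 0.2 * rng.standard_normal(n_agents)     # local noisy data
    b_k = b + 0.2 * rng.standard_normal(n_agents)
    theta = mixed + alpha * (b_k - a_k * mixed)       # local LSA step

print("agent estimates:", theta)
print("disagreement:", theta.max() - theta.min())
```

For directed graphs where only row-stochastic weights are available, the mixing step would be replaced by a push-sum correction, as described above.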
7. Practical Implications and Extensions
LSA theory underpins methods for:
- Policy evaluation in reinforcement learning: All main RL value estimation methods—TD(0), GTD, LSTD—are instances of LSA, and their global convergence and error scaling are understood within this framework (Lakshminarayanan et al., 2017).
- Robust high-dimensional inference: LSA with PR averaging and/or RR extrapolation achieves optimal uncertainty quantification in statistical learning under both i.i.d. and Markovian sampling (Huo et al., 2023, Levin et al., 7 Aug 2025).
- Off-policy and eligibility-trace algorithms: Using the ODE method with asymptotic rate-of-change conditions, almost-sure stability of LSA-type algorithms under Markovian (and even non-i.i.d.) noise can be established without explicit projections (Liu et al., 15 Jan 2024).
- Control of bias-variance trade-off and algorithmic tuning: RR extrapolation provides a mechanism for bias reduction with minimal overhead, and theoretical results quantify the exact effect of step-size tuning and averaging on convergence and statistical error (Huo et al., 2022, Huo et al., 2023, Levin et al., 7 Aug 2025).
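To ground the policy-evaluation connection above, here is a minimal tabular TD(0) run on a small Markov reward process (the chain, rewards, and discount are illustrative assumptions); the true values come from the Bellman equation $v = (I - \gamma P)^{-1} r$, and TD(0) is the corresponding LSA instance:

```python
import numpy as np

rng = np.random.default_rng(4)

# 3-state Markov reward process with discount gamma
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)    # Bellman solution

# Tabular TD(0): an LSA instance whose mean dynamics are governed by
# the (state-visitation-weighted) matrix I - gamma * P.
v = np.zeros(3)
alpha, s = 0.05, 0
for _ in range(200_000):
    s_next = rng.choice(3, p=P[s])                    # sample a transition
    td_error = r[s] + gamma * v[s_next] - v[s]
    v[s] += alpha * td_error                          # TD(0) / LSA update
    s = s_next

print("TD estimate:", v)
print("true values:", v_true)
```

The constant step-size keeps the estimates fluctuating in a neighborhood of $v_{\mathrm{true}}$, exactly the error-floor behavior described in Section 3.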
Table 1: Key Theoretical Properties under Different Settings
| Setting | Error rate (MSE/CLT) | Bias scaling | Variance scaling | Remarks |
|---|---|---|---|---|
| i.i.d. | $O(1/n)$ MSE; PR averaging gives CLT | Zero | Optimal (Cramér–Rao bound) | Averaging optimal |
| Markovian | $O(1/n)$ MSE; PR averaging gives CLT around biased mean | $O(\alpha)$ (linear in step-size) | Optimal (after RR) | RR needed to eliminate bias |
| Decreasing step | $n^{-1/4}$ Kolmogorov rate (aggressive decay) | Vanishing | Minimax optimal | Used for Berry–Esseen/bootstrap CIs |
| Constant step | $O(\alpha)$ error floor | $O(\alpha)$ | Near-optimal (after averaging/RR) | Fast mixing; suitable for online use |
For all cases, detailed moment bounds and finite-sample probabilistic error control have been recently established (Levin et al., 7 Aug 2025, Durmus et al., 2022).
References
- (Lakshminarayanan et al., 2017): Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging.
- (Srikant et al., 2019): Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning.
- (Mou et al., 2020): On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration.
- (Durmus et al., 2021): Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize.
- (Huo et al., 2022): Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes.
- (Huo et al., 2023): Effectiveness of Constant Stepsize in Markovian LSA and Statistical Inference.
- (Durmus et al., 2022): Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation.
- (Samsonov et al., 26 May 2024): Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning.
- (Samsonov et al., 25 May 2025): Statistical inference for Linear Stochastic Approximation with Markovian Noise.
- (Levin et al., 7 Aug 2025): High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation.
- (Lin et al., 2021): Finite-Time Error Bounds for Distributed Linear Stochastic Approximation.
- (Liu et al., 15 Jan 2024): The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise.
LSA thus provides a mathematically sharp, algorithmically flexible, and practically crucial foundation for modern online learning and inference in high-dimensional, stochastic, and networked systems.