- The paper establishes explicit finite-time error bounds for projected TTSA using PR averaging by decomposing error into statistical and subspace-induced approximation components.
- The analysis uses constant step sizes and martingale-difference noise, establishing an O(1/T) decay of the statistical error that is validated on synthetic linear systems and GTD-based policy evaluation.
- The study provides actionable insights for selecting subspaces and tuning parameters in high-dimensional reinforcement learning and bilevel optimization settings.
Finite-Time Error Bounds for Projected Two-Time-Scale Stochastic Approximation
Introduction
This paper presents a rigorous finite-time analysis for projected linear two-time-scale stochastic approximation (TTSA) with constant step sizes and Polyak–Ruppert (PR) averaging. The projected variant constrains the iterates (xt,yt) to prescribed subspaces (X,Y), addressing scenarios where the ambient dimension prohibits full-space updates, as is common in policy evaluation with function approximation or in bilevel optimization. The analysis separates contributions from statistical error and bias due to the subspace constraint, providing explicit mean-square error bounds with interpretable constants tied to subspace geometry and system coupling.
TTSA considers coupled systems where variables x ("fast") and y ("slow") evolve on distinct time scales under noisy feedback, with the target (x*, y*) solving the coupled linear equations

g(x*, y*) = A_ff x* + A_fs y* − b_1 = 0,
h(x*, y*) = A_sf x* + A_ss y* − b_2 = 0.
The projected TTSA algorithm enforces x_t ∈ X ⊂ R^n and y_t ∈ Y ⊂ R^m via orthogonal projection, updates the variables with constant step sizes (α, β) under martingale-difference noise, and then applies PR averaging:

x̄_T = (1/T) Σ_{t=0}^{T−1} x_t,  ȳ_T = (1/T) Σ_{t=0}^{T−1} y_t.
The analysis is fundamentally concerned with the error relative to the unconstrained solution, which is decomposed as statistical error (from stochasticity and finite T) plus an inherent subspace-induced approximation error.
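This update loop is simple enough to sketch directly. The following is a minimal illustrative implementation (names like `projected_ttsa`, the orthonormal-basis representation of the subspaces, and all default values are assumptions of this sketch, not the paper's notation):

```python
import numpy as np

def projected_ttsa(A_ff, A_fs, A_sf, A_ss, b1, b2, U, V,
                   alpha, beta, T, noise_std=0.1, seed=0):
    """Projected linear TTSA with PR averaging (illustrative sketch).

    U, V: orthonormal bases whose columns span the subspaces X and Y,
    so the orthogonal projectors are P_X = U U^T and P_Y = V V^T.
    """
    rng = np.random.default_rng(seed)
    P_X, P_Y = U @ U.T, V @ V.T
    n, m = A_ff.shape[0], A_ss.shape[0]
    x, y = np.zeros(n), np.zeros(m)
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        # Noisy evaluations of g and h (martingale-difference noise)
        g = A_ff @ x + A_fs @ y - b1 + noise_std * rng.standard_normal(n)
        h = A_sf @ x + A_ss @ y - b2 + noise_std * rng.standard_normal(m)
        # Constant-step updates on the fast (x) and slow (y) time scales,
        # followed by orthogonal projection back onto the subspaces
        x = P_X @ (x - alpha * g)
        y = P_Y @ (y - beta * h)
        x_sum += x
        y_sum += y
    return x_sum / T, y_sum / T  # PR averages over the trajectory
```

With full-dimensional subspaces (U = V = I) and a stable decoupled system, the PR averages converge to the unconstrained solution; rank-reduced U, V introduce the approximation error analyzed below.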
Main Theoretical Results
The primary result is a non-asymptotic bound on the mean-square error of the PR averages, establishing that for suitable constant step sizes and under standard stability and martingale-difference assumptions,

E[ ‖x̄_T − x*‖² + ‖ȳ_T − y*‖² ] ≤ C_stat / T + C_app(X, Y),

where
- C_stat / T is the statistical error term, which vanishes at rate O(1/T);
- C_app(X, Y) is the subspace-induced approximation error, determined by the distance of the unconstrained solution (x*, y*) from the subspaces (X, Y).
The constants C_stat and C_app(X, Y) are explicit functions of the system matrices, restricted stability margins, and noise variances. The essential interpretation is:
- Approximation error (the C_app(X, Y) term): controlled entirely by the expressiveness of the chosen subspaces; it cannot be reduced by running more iterations.
- Statistical error (the C_stat / T term): vanishes as T → ∞, with rate and scaling determined by system coupling, subspace alignment, and noise.
The statistical terms delineate how noise propagates through the cross-coupled updates and how the stability margins amplify error. The constants also cleanly separate the influence of the coupling structure (through the off-diagonal blocks A_fs and A_sf) from projection error and noise.
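To make the bias–variance split concrete, a back-of-the-envelope calculation with purely hypothetical constants (illustrative values, not from the paper) locates the iteration count beyond which the statistical term is negligible:

```python
# Hypothetical constants in a bound of the form MSE(T) ≈ C_app + C_stat / T
# (illustrative values only, not from the paper).
C_app, C_stat = 0.001, 5.0

def mse(T):
    return C_app + C_stat / T

# Crossover point: statistical term equals the approximation floor.
T_star = C_stat / C_app
print(T_star)                    # ≈ 5000
print(mse(10 * T_star) / C_app)  # ≈ 1.1: within 10% of the irreducible floor
```

Past roughly T_star iterations, further iteration buys little; only a more expressive subspace can lower the error further.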
Numerical Validation
Theoretical results are validated in two settings: synthetic linear systems and gradient temporal-difference (GTD) learning for policy evaluation with feature constraints.
- Synthetic linear system: When iterates are projected onto rank-reduced subspaces, the mean-square error of the PR-averaged iterates exhibits an initial O(1/T) decay followed by a plateau at the approximation error of the subspace. Fast and slow variables exhibit different transients consistent with their respective time scales.
- GTD with feature mismatch: The effect of feature-subspace alignment is studied on a finite-state Markov decision process. For well-aligned features (those capturing most of the value-function mass), the statistical error decays rapidly to a small approximation floor. For poorly aligned subspaces, the approximation error dominates regardless of the statistical rate, but the O(1/T) statistical decay persists, demonstrating robustness of the statistical component across subspaces.
The off-diagonal coupling constants in the error bounds are empirically verified: even when one subspace approximates its variable well, coupling can induce error from the orthogonal component of the other. Overall, the empirical evidence supports the bound and its decomposition.
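The plateau behavior in the synthetic experiment can be reproduced qualitatively with a toy single-time-scale sketch; dimensions, step size, and noise level below are illustrative choices, not the paper's setup:

```python
import numpy as np

# Project a stable linear SA iteration onto a random rank-r subspace and
# track the error of the PR average: it first shrinks, then plateaus at
# the subspace approximation error (the irreducible floor).
rng = np.random.default_rng(1)
n, r = 10, 6
x_star = rng.standard_normal(n)
U = np.linalg.qr(rng.standard_normal((n, r)))[0]  # orthonormal basis of X
P = U @ U.T
approx_err = np.linalg.norm(x_star - P @ x_star)  # distance of x* from X

x = np.zeros(n)
x_sum = np.zeros(n)
errs = []
for t in range(1, 20001):
    # Noisy update pulling x toward x_star (gradient of ||x - x*||^2 / 2)
    g = x - x_star + 0.05 * rng.standard_normal(n)
    x = P @ (x - 0.1 * g)
    x_sum += x
    errs.append(np.linalg.norm(x_sum / t - x_star))

# errs decays early on (O(1/T) in mean-square, so ~1/sqrt(T) in norm)
# and flattens near approx_err; since the average stays in the subspace,
# it can never drop below approx_err.
```

The same Pythagorean argument explains the plateau: the PR average lies in the subspace, so its error splits exactly into an in-subspace statistical part and the fixed out-of-subspace component.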
Implications and Future Directions
The explicit O(1/T) finite-time rate for PR-averaged projected TTSA iterates enables theoretically principled guidance in subspace design, step-size selection, and error prediction for practical applications, including large-scale RL with function approximation, distributed/bilevel optimization, and primal-dual methods. The bias–variance decomposition sharpens the theoretical understanding of the fundamental approximation–estimation tradeoff in these constrained stochastic algorithms.
Potential extensions of this work include:
- Characterizing optimal subspace selection for a target error tradeoff.
- Extending the analysis to Markovian noise and nonlinear settings.
- Leveraging the decoupling of approximation and statistical errors for adaptive algorithms or in meta-learning scenarios.
- Integration with model-based or representation learning frameworks to minimize the subspace approximation error.
Conclusion
The paper establishes finite-time, explicit, and interpretable error bounds for projected linear TTSA with PR averaging, providing precise characterization of how subspace constraints and statistical variability affect convergence in high-dimensional or resource-constrained SA applications. The sharp bias–variance decomposition advances both the theory and practice of constrained stochastic approximation, with direct applications in reinforcement learning and optimization (2604.00179).