- The paper establishes explicit finite-time error bounds for projected TTSA using PR averaging by decomposing error into statistical and subspace-induced approximation components.
- The analysis uses constant step sizes and martingale-difference noise, establishing an O(1/T) decay of the statistical error that is validated on synthetic linear systems and GTD-based policy evaluation.
- The study provides actionable insights for selecting subspaces and tuning parameters in high-dimensional reinforcement learning and bilevel optimization settings.
Finite-Time Error Bounds for Projected Two-Time-Scale Stochastic Approximation
Introduction
This paper presents a rigorous finite-time analysis for projected linear two-time-scale stochastic approximation (TTSA) with constant step sizes and Polyak–Ruppert (PR) averaging. The projected variant constrains the iterates (xt,yt) to prescribed subspaces (X,Y), addressing scenarios where the ambient dimension prohibits full-space updates, as is common in policy evaluation with function approximation or in bilevel optimization. The analysis separates contributions from statistical error and bias due to the subspace constraint, providing explicit mean-square error bounds with interpretable constants tied to subspace geometry and system coupling.
TTSA considers coupled systems where variables x ("fast") and y ("slow") evolve on distinct time scales under noisy feedback, with the target (x*, y*) solving the coupled linear equations

g(x*, y*) = A_ff x* + A_fs y* − b_1 = 0,
h(x*, y*) = A_sf x* + A_ss y* − b_2 = 0.
The projected TTSA algorithm enforces x_t ∈ X ⊂ R^n and y_t ∈ Y ⊂ R^m via orthogonal projection, updates the variables with constant step sizes (α, β) under martingale-difference noise, and then applies PR averaging:

x̄_T = (1/T) Σ_{t=0}^{T−1} x_t,  ȳ_T = (1/T) Σ_{t=0}^{T−1} y_t.
The analysis is fundamentally concerned with the error relative to the unconstrained solution, which is decomposed as statistical error (from stochasticity and finite T) plus an inherent subspace-induced approximation error.
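This update loop is simple enough to sketch directly. The following is a minimal illustrative implementation (names like `projected_ttsa`, the orthonormal-basis representation of the subspaces, and all default values are assumptions of this sketch, not the paper's notation):

```python
import numpy as np

def projected_ttsa(A_ff, A_fs, A_sf, A_ss, b1, b2, U, V,
                   alpha, beta, T, noise_std=0.1, seed=0):
    """Projected linear TTSA with PR averaging (illustrative sketch).

    U, V: orthonormal bases whose columns span the subspaces X and Y,
    so the orthogonal projectors are P_X = U U^T and P_Y = V V^T.
    """
    rng = np.random.default_rng(seed)
    P_X, P_Y = U @ U.T, V @ V.T
    n, m = A_ff.shape[0], A_ss.shape[0]
    x, y = np.zeros(n), np.zeros(m)
    x_sum, y_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        # Noisy evaluations of g and h (martingale-difference noise)
        g = A_ff @ x + A_fs @ y - b1 + noise_std * rng.standard_normal(n)
        h = A_sf @ x + A_ss @ y - b2 + noise_std * rng.standard_normal(m)
        # Constant-step updates on the fast (x) and slow (y) time scales,
        # followed by orthogonal projection back onto the subspaces
        x = P_X @ (x - alpha * g)
        y = P_Y @ (y - beta * h)
        x_sum += x
        y_sum += y
    return x_sum / T, y_sum / T  # PR averages over the trajectory
```

With full-dimensional subspaces (U = V = I) and a stable decoupled system, the PR averages converge to the unconstrained solution; rank-reduced U, V introduce the approximation error analyzed below.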
Main Theoretical Results
The primary result is a non-asymptotic bound on the mean-square error of the PR averages, establishing that for suitable constant step sizes and under standard stability and martingale-difference assumptions,

E[ ‖x̄_T − x*‖² + ‖ȳ_T − y*‖² ] ≤ C_stat / T + C_app(X, Y),

where
- C_stat / T is the statistical error term, which vanishes at rate O(1/T);
- C_app(X, Y) is the subspace-induced approximation error, determined by the distance of the unconstrained solution (x*, y*) from the subspaces (X, Y).
The constants C_stat and C_app(X, Y) are explicit functions of the system matrices, restricted stability margins, and noise variances. The essential interpretation is:
- Approximation error (the C_app(X, Y) term): controlled entirely by the expressiveness of the chosen subspaces; it cannot be reduced by running more iterations.
- Statistical error (the C_stat / T term): vanishes as T → ∞, with rate and scaling determined by system coupling, subspace alignment, and noise.
The statistical terms delineate how noise propagates through the cross-coupled updates and how the stability margins amplify error. The constants also cleanly separate the influence of the coupling structure (through the off-diagonal blocks A_fs and A_sf) from projection error and noise.
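To make the bias–variance split concrete, a back-of-the-envelope calculation with purely hypothetical constants (illustrative values, not from the paper) locates the iteration count beyond which the statistical term is negligible:

```python
# Hypothetical constants in a bound of the form MSE(T) ≈ C_app + C_stat / T
# (illustrative values only, not from the paper).
C_app, C_stat = 0.001, 5.0

def mse(T):
    return C_app + C_stat / T

# Crossover point: statistical term equals the approximation floor.
T_star = C_stat / C_app
print(T_star)                    # ≈ 5000
print(mse(10 * T_star) / C_app)  # ≈ 1.1: within 10% of the irreducible floor
```

Past roughly T_star iterations, further iteration buys little; only a more expressive subspace can lower the error further.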
Numerical Validation
Theoretical results are validated in two settings: synthetic linear systems and gradient temporal-difference (GTD) learning for policy evaluation with feature constraints.
- Synthetic linear system: When iterates are projected onto rank-reduced subspaces, the mean-square error of the PR-averaged iterates exhibits an initial O(1/T) decay followed by a plateau at the approximation error of the subspace. Fast and slow variables exhibit different transients consistent with their respective time scales.
- GTD with feature mismatch: The effect of feature-subspace alignment is studied on a finite-state Markov decision process. For well-aligned features (those capturing most of the value-function mass), the statistical error decays rapidly to a small approximation floor. For poorly aligned subspaces, the approximation error dominates regardless of the statistical rate, but the O(1/T) statistical decay persists, demonstrating robustness of the statistical component across subspaces.
The off-diagonal coupling constants in the error bounds are empirically verified: even when one subspace approximates its variable well, coupling can induce error from the orthogonal component of the other. Overall, the empirical evidence supports the bound and its decomposition.
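The plateau behavior in the synthetic experiment can be reproduced qualitatively with a toy single-time-scale sketch; dimensions, step size, and noise level below are illustrative choices, not the paper's setup:

```python
import numpy as np

# Project a stable linear SA iteration onto a random rank-r subspace and
# track the error of the PR average: it first shrinks, then plateaus at
# the subspace approximation error (the irreducible floor).
rng = np.random.default_rng(1)
n, r = 10, 6
x_star = rng.standard_normal(n)
U = np.linalg.qr(rng.standard_normal((n, r)))[0]  # orthonormal basis of X
P = U @ U.T
approx_err = np.linalg.norm(x_star - P @ x_star)  # distance of x* from X

x = np.zeros(n)
x_sum = np.zeros(n)
errs = []
for t in range(1, 20001):
    # Noisy update pulling x toward x_star (gradient of ||x - x*||^2 / 2)
    g = x - x_star + 0.05 * rng.standard_normal(n)
    x = P @ (x - 0.1 * g)
    x_sum += x
    errs.append(np.linalg.norm(x_sum / t - x_star))

# errs decays early on (O(1/T) in mean-square, so ~1/sqrt(T) in norm)
# and flattens near approx_err; since the average stays in the subspace,
# it can never drop below approx_err.
```

The same Pythagorean argument explains the plateau: the PR average lies in the subspace, so its error splits exactly into an in-subspace statistical part and the fixed out-of-subspace component.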
Implications and Future Directions
The explicit O(1/T) finite-time rate for PR-averaged projected TTSA iterates enables theoretically principled guidance in subspace design, step-size selection, and error prediction for practical applications, including large-scale RL with function approximation, distributed/bilevel optimization, and primal-dual methods. The bias–variance decomposition sharpens the theoretical understanding of the fundamental approximation–estimation tradeoff in these constrained stochastic algorithms.
Potential extensions of this work include:
- Characterizing optimal subspace selection for a target error tradeoff.
- Extending the analysis to Markovian noise and nonlinear settings.
- Leveraging the decoupling of approximation and statistical errors for adaptive algorithms or in meta-learning scenarios.
- Integration with model-based or representation learning frameworks to minimize the subspace approximation error.
Conclusion
The paper establishes finite-time, explicit, and interpretable error bounds for projected linear TTSA with PR averaging, providing precise characterization of how subspace constraints and statistical variability affect convergence in high-dimensional or resource-constrained SA applications. The sharp bias–variance decomposition advances both the theory and practice of constrained stochastic approximation, with direct applications in reinforcement learning and optimization (2604.00179).