- The paper derives an improved finite-time bound of O(1/k) for non-linear two-time-scale stochastic approximation, surpassing previous O(1/k^(2/3)) results.
- The O(1/k) bound is achieved through an innovative analytical approach using an averaged noise sequence and induction on modified iterates.
- This advancement implies more efficient convergence for non-linear SA algorithms with applications in reinforcement learning and optimization.
O(1/k) Finite-Time Bound for Non-Linear Two-Time-Scale Stochastic Approximation
In this paper, Siddharth Chandak analyzes non-linear two-time-scale stochastic approximation schemes and establishes an improved finite-time bound of O(1/k) for these algorithms. This is a significant improvement over the prior O(1/k^(2/3)) bounds established in non-linear contractive settings. Two-time-scale stochastic approximation (SA) algorithms, characterized by two coupled iterations evolving at different rates, are integral to applications in reinforcement learning, optimization, games, and control.
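In standard notation, such a scheme couples two noisy updates driven by stepsizes α_k and β_k; the operators and noise terms below are generic stand-ins rather than the paper's exact symbols:

```latex
% Generic two-time-scale SA template; the operators f, g and noise terms
% \xi_k, \psi_k are illustrative stand-ins, not the paper's exact notation.
\begin{aligned}
x_{k+1} &= x_k + \alpha_k \bigl( f(x_k, y_k) + \xi_k \bigr)
  && \text{(fast time-scale)} \\
y_{k+1} &= y_k + \beta_k \bigl( g(x_k, y_k) + \psi_k \bigr)
  && \text{(slow time-scale)}
\end{aligned}
\qquad \text{with } \beta_k / \alpha_k \to 0.
```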
Main Contributions
The primary contribution of this paper is the derivation of an O(1/k) mean square error bound for non-linear two-time-scale SA algorithms. This bound applies to algorithms such as gradient descent-ascent and two-time-scale Lagrangian optimization. The improved bound stems from an innovative technique: the introduction and analysis of an averaged noise sequence that handles the noise on the slower time-scale. Notably, the algorithm itself requires no explicit averaging step; the averaging is purely an analytical device.
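As a concrete illustration of the gradient descent-ascent class covered by the result, here is a minimal sketch on a toy saddle-point problem; the objective, noise model, and stepsize constants are hypothetical choices, not taken from the paper:

```python
# Minimal sketch: noisy two-time-scale gradient descent-ascent on the toy
# saddle problem f(x, y) = x^2/2 + x*y - y^2/2 (saddle point at the origin).
# The objective, noise level, and stepsize constants are illustrative only.
import numpy as np

rng = np.random.default_rng(seed=0)

def noisy_grads(x, y, sigma=0.1):
    """Unbiased stochastic gradients of f at (x, y)."""
    gx = x + y + sigma * rng.standard_normal()  # df/dx plus noise
    gy = x - y + sigma * rng.standard_normal()  # df/dy plus noise
    return gx, gy

x, y = 5.0, -3.0
for k in range(1, 100_001):
    alpha_k = 1.0 / (k + 10)   # fast stepsize (descent on x)
    beta_k = 0.1 / (k + 10)    # slow stepsize (ascent on y)
    gx, gy = noisy_grads(x, y)
    x -= alpha_k * gx          # gradient descent on the min variable
    y += beta_k * gy           # gradient ascent on the max variable

print(f"final iterate: ({x:.4f}, {y:.4f}); saddle point is (0, 0)")
```

Note that the loop body contains no averaging step: the averaged noise sequence mentioned above lives only in the analysis, not in the algorithm.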
Furthermore, the paper establishes a more robust bound of O(1/k^a) for any a ∈ (0.5, 1) when the stepsize sequences are chosen as α_k = O(1/k^a) and β_k = O(1/k). This flexibility allows the stepsizes to be selected independently of certain system parameters, making the results practically applicable in diverse scenarios.
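A sketch of such schedules is below; the exponent, constants, and offset are illustrative placeholders, not values prescribed by the paper. With a ∈ (0.5, 1), the ratio β_k/α_k → 0, preserving the time-scale separation:

```python
# Stepsize schedules of the forms alpha_k = O(1/k^a) and beta_k = O(1/k);
# the exponent a, constants, and offset h are illustrative placeholders.
def stepsizes(k: int, a: float = 0.75, c_alpha: float = 1.0,
              c_beta: float = 0.5, h: int = 10) -> tuple[float, float]:
    alpha_k = c_alpha / (k + h) ** a  # faster time-scale, decays like 1/k^a
    beta_k = c_beta / (k + h)         # slower time-scale, decays like 1/k
    return alpha_k, beta_k
```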
Analytical Techniques
The key innovation lies in redefining the noise terms and employing an induction-based argument to show that the iterates remain bounded in expectation. By recasting the iterations in terms of noise sequences and modified iterates (x_k, z_k), the analysis shows that the mean squared error decays at a rate of O(1/k). Lemma 3 provides a recursive bound for the modified iterates, which ultimately simplifies the overall proof structure. This methodological advance circumvents the limitations of prior analyses, which struggled with the inherent complexity of non-linear, noisy updates.
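To see why a recursive bound of this general shape closes under induction at an O(1/k) rate, consider the following schematic argument; the paper's actual Lemma 3 involves coupled iterates and constants not reproduced here:

```latex
% Schematic induction step: suppose V_k (e.g., the mean squared error of the
% modified iterates) satisfies, for constants c > 1, C > 0, h > 0,
%   V_{k+1} \le (1 - c/(k+h)) V_k + C/(k+h)^2.
% Then the ansatz V_k \le D/(k+h) propagates from step k to step k+1:
V_{k+1} \le \Bigl(1 - \frac{c}{k+h}\Bigr)\frac{D}{k+h} + \frac{C}{(k+h)^2}
         \le \frac{D}{k+h+1}
\qquad \text{whenever } D \ge \frac{C}{c-1},
% giving V_k = O(1/k) by induction.
```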
Implications and Future Research
This advancement has substantial theoretical and practical implications. It promises faster guaranteed convergence in non-linear SA settings and broadens the applicability of two-time-scale algorithms in reinforcement learning and optimization tasks that depend on accurate state-action value estimates or saddle-point computations.
Future research directions outlined include extending these finite-time bounds to non-linear settings with Markovian noise and to contractions with respect to arbitrary norms. Relaxing the contractive assumptions to weaker non-expansive conditions remains an intriguing challenge. Moreover, developing high-probability bounds, as opposed to mean square error bounds, could broaden the applicability of the results in real-world scenarios where reliable convergence guarantees are critical.
Overall, Chandak's work on finite-time bounds in non-linear two-time-scale stochastic approximation is a crucial step toward more efficient algorithmic performance in complex stochastic environments. Its methodology provides a pathway for further explorations in stochastic optimization algorithms, promising refined techniques in the analytical domain of reinforcement learning and beyond.