A Simple Finite-Time Analysis of TD Learning with Linear Function Approximation (2403.02476v2)
Abstract: We study the finite-time convergence of TD learning with linear function approximation under Markovian sampling. Existing proofs for this setting either assume a projection step in the algorithm to simplify the analysis, or require a fairly intricate argument to ensure stability of the iterates. We ask: \textit{Is it possible to retain the simplicity of a projection-based analysis without actually performing a projection step in the algorithm?} Our main contribution is to show this is possible via a novel two-step argument. In the first step, we use induction to prove that under a standard choice of a constant step-size $\alpha$, the iterates generated by TD learning remain uniformly bounded in expectation. In the second step, we establish a recursion that mimics the steady-state dynamics of TD learning up to a bounded perturbation on the order of $O(\alpha^2)$ that captures the effect of Markovian sampling. Combining these pieces leads to an overall approach that considerably simplifies existing proofs. We conjecture that our inductive proof technique will find applications in the analyses of more complex stochastic approximation algorithms, and conclude by providing some examples of such applications.
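To make the analyzed setting concrete, below is a minimal sketch (not the paper's code) of TD(0) with linear function approximation, a constant step-size $\alpha$, and Markovian sampling along a single trajectory. The three-state Markov reward process, reward vector, feature matrix, and hyperparameter values are illustrative assumptions chosen for the example, not taken from the paper.

```python
# Minimal sketch of TD(0) with linear function approximation under Markovian
# sampling and a constant step-size alpha. The chain, rewards, and features
# below are hypothetical choices used only for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov reward process.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.3, 0.3, 0.4]])       # transition matrix (rows sum to 1)
r = np.array([1.0, 0.0, -1.0])        # expected reward in each state
Phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])          # feature vector phi(s), one row per state
gamma, alpha = 0.9, 0.05              # discount factor and constant step-size

theta = np.zeros(Phi.shape[1])        # TD iterate theta_t (no projection step)
s = 0                                 # start state of the Markovian trajectory
for t in range(20000):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update: theta_{t+1} = theta_t + alpha * delta_t * phi(s_t), where
    # delta_t = r(s_t) + gamma * phi(s_{t+1})^T theta_t - phi(s_t)^T theta_t.
    delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta = theta + alpha * delta * Phi[s]
    s = s_next

print("TD(0) estimate of theta:", theta)
```

Note that the iterates above are never projected onto a bounded set; the paper's point is that their boundedness (in expectation) can still be established directly by induction for a suitable constant step-size.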