Closing the gap between SVRG and TD-SVRG with Gradient Splitting (2211.16237v4)
Abstract: Temporal difference (TD) learning is a policy evaluation method in reinforcement learning whose performance can be enhanced by variance reduction techniques. Recently, multiple works have sought to fuse TD learning with the Stochastic Variance Reduced Gradient (SVRG) method to achieve a geometric rate of convergence. However, the resulting convergence rates are significantly weaker than what SVRG achieves in the setting of convex optimization. In this work we utilize a recent interpretation of TD learning as the splitting of the gradient of an appropriately chosen function, which simplifies the algorithm and its fusion with SVRG. Our main result is a geometric convergence bound with a predetermined learning rate of $1/8$, identical to the convergence bound available for SVRG in the convex setting. Our theoretical findings are supported by a set of experiments.
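To make the idea concrete, below is a minimal sketch of how an SVRG-style variance-reduced TD(0) update with linear function approximation and a fixed learning rate of 1/8 could look. This is an illustrative reconstruction from the abstract, not the paper's exact algorithm: the function name `td_svrg`, the `(phi_s, reward, phi_s_next)` data format, and the choice of inner-loop length equal to the dataset size are all assumptions made for the example.

```python
import numpy as np

def td_svrg(transitions, gamma, num_epochs=20, lr=1 / 8, seed=0):
    """Illustrative TD(0)-SVRG sketch with linear function approximation.

    `transitions` is a list of (phi_s, reward, phi_s_next) tuples built from
    a fixed batch of observed transitions (finite-sample setting); this data
    format is an assumption made for the example.
    """
    rng = np.random.default_rng(seed)
    n = len(transitions)
    d = transitions[0][0].shape[0]
    theta = np.zeros(d)

    def td_direction(theta, sample):
        phi, r, phi_next = sample
        delta = r + gamma * phi_next @ theta - phi @ theta  # TD error
        return delta * phi  # TD(0) semi-gradient direction for this sample

    for _ in range(num_epochs):
        theta_ref = theta.copy()
        # Full-batch TD direction at the reference point (the SVRG anchor).
        full_dir = np.mean(
            [td_direction(theta_ref, s) for s in transitions], axis=0
        )
        for _ in range(n):  # inner loop; length n chosen for simplicity
            s = transitions[rng.integers(n)]
            # Variance-reduced step: per-sample direction minus the same
            # direction at the reference point, plus the full-batch anchor.
            g = td_direction(theta, s) - td_direction(theta_ref, s) + full_dir
            theta = theta + lr * g
    return theta
```

The structure mirrors plain SVRG; the only substantive difference in this sketch is that the per-sample "gradient" is replaced by the TD(0) semi-gradient, which the gradient-splitting viewpoint interprets as (the negative of) a split gradient of an appropriately chosen function.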