
Closing the gap between SVRG and TD-SVRG with Gradient Splitting (2211.16237v4)

Published 29 Nov 2022 in cs.LG

Abstract: Temporal difference (TD) learning is a policy evaluation algorithm in reinforcement learning whose performance can be enhanced by variance reduction methods. Recently, multiple works have sought to fuse TD learning with the Stochastic Variance Reduced Gradient (SVRG) method to achieve a geometric rate of convergence. However, the resulting convergence rate is significantly weaker than what is achieved by SVRG in the setting of convex optimization. In this work we utilize a recent interpretation of TD learning as the splitting of the gradient of an appropriately chosen function, thus simplifying the algorithm and the fusion of TD with SVRG. Our main result is a geometric convergence bound with a predetermined learning rate of $1/8$, which is identical to the convergence bound available for SVRG in the convex setting. Our theoretical findings are supported by a set of experiments.
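The abstract describes combining an SVRG-style variance-reduction scheme with the TD(0) update under linear function approximation, using a fixed learning rate of $1/8$. Below is a minimal illustrative sketch of that general idea, not the authors' exact TD-SVRG algorithm: the function names, the epoch and inner-loop lengths, and the data layout (`phi`, `next_phi`, `rewards`) are assumptions made for illustration, while the SVRG-style correction term and the step size $1/8$ mirror what the abstract states.

```python
import numpy as np

def td_svrg_sketch(phi, rewards, next_phi, gamma=0.95, alpha=1/8, epochs=20):
    """Illustrative SVRG-style variance-reduced TD(0) with linear features.

    phi      : (N, d) feature vectors of sampled states
    next_phi : (N, d) feature vectors of the corresponding next states
    rewards  : (N,)   observed rewards
    alpha = 1/8 mirrors the fixed step size mentioned in the abstract; the
    epoch count and inner-loop length are illustrative assumptions.
    """
    N, d = phi.shape
    theta = np.zeros(d)
    rng = np.random.default_rng(0)

    def neg_td_update(w, i):
        # Negative expected TD(0) update direction for sample i, treated as
        # a "gradient" term in the SVRG-style correction.
        delta = rewards[i] + gamma * next_phi[i] @ w - phi[i] @ w
        return -delta * phi[i]

    for _ in range(epochs):
        w_ref = theta.copy()
        # Full-batch anchor: mean negative TD update at the reference point.
        full = np.mean([neg_td_update(w_ref, i) for i in range(N)], axis=0)
        for _ in range(N):  # inner loop of stochastic, variance-reduced steps
            i = rng.integers(N)
            g = neg_td_update(theta, i) - neg_td_update(w_ref, i) + full
            theta -= alpha * g
    return theta
```

As with other SVRG-type methods, the main tuning knobs in such a sketch are the inner-loop length and how often the full-batch anchor is recomputed; the abstract's claim is that, through the gradient-splitting interpretation, the predetermined step size $1/8$ yields a geometric convergence bound matching the one known for SVRG in the convex setting.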
