Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction (2401.01084v2)

Published 2 Jan 2024 in cs.LG and math.OC

Abstract: Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant coined NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve global last-iterate $\epsilon$-optimality with a sample complexity of $\mathcal{O}(\epsilon^{-2})$, which is the best known result for natural policy gradient-type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
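
The abstract describes NPG-HM only in words. The sketch below illustrates the general shape of such an update, under two assumptions not spelled out here: the Hessian-aided momentum estimator takes the recursive form $d_t = \beta g_t + (1-\beta)\,(d_{t-1} + H_t(\theta_t - \theta_{t-1}))$, and the natural-gradient direction is obtained by running SGD on the quadratic compatible function approximation sub-problem. The helper callables (`estimate_policy_gradient`, `estimate_hessian_vector_product`, `estimate_fisher_vector_product`) are hypothetical stand-ins for Monte Carlo estimators built from sampled trajectories, and the hyperparameters are placeholders; this is not the authors' released implementation.

```python
import numpy as np

# Minimal sketch of an NPG step with a Hessian-aided momentum (HM) gradient
# estimator.  The three helpers are hypothetical Monte Carlo estimators:
#   grad_est(theta)         -> stochastic gradient of the return J at theta
#   hvp_est(theta, theta_p) -> estimate of H(theta_mix) @ (theta - theta_p),
#                              with theta_mix on the segment [theta_p, theta]
#   fvp_est(theta, w)       -> estimate of the Fisher matrix F(theta) @ w

def hm_gradient(d_prev, theta, theta_prev, beta, grad_est, hvp_est):
    """Hessian-aided momentum estimator:
    d_t = beta * g_t + (1 - beta) * (d_prev + H_t @ (theta - theta_prev))."""
    g = grad_est(theta)
    correction = hvp_est(theta, theta_prev)
    return beta * g + (1.0 - beta) * (d_prev + correction)

def solve_subproblem_sgd(d, theta, fvp_est, n_steps=100, lr=1e-2):
    """Approximate the natural-gradient direction w ~ F(theta)^{-1} d by SGD
    on the quadratic sub-problem min_w 0.5 * w^T F(theta) w - d^T w,
    using only stochastic Fisher-vector products."""
    w = np.zeros_like(d)
    for _ in range(n_steps):
        w -= lr * (fvp_est(theta, w) - d)
    return w

def npg_hm(theta0, n_iters, eta, beta, grad_est, hvp_est, fvp_est):
    """Outer loop: momentum-corrected gradient estimate, then a
    natural-gradient ascent step theta_{t+1} = theta_t + eta * w_t."""
    theta = np.array(theta0, dtype=float)
    d = grad_est(theta)                      # plain stochastic gradient at t = 0
    for _ in range(n_iters):
        w = solve_subproblem_sgd(d, theta, fvp_est)
        theta_prev, theta = theta, theta + eta * w
        d = hm_gradient(d, theta, theta_prev, beta, grad_est, hvp_est)
    return theta
```

In this style of estimator, the Hessian-vector correction tracks how the gradient changes between consecutive iterates, so the momentum term can stay close to an unbiased gradient estimate without resorting to large batches or restart points; that is the variance-reduction mechanism the abstract refers to, sketched here under the stated assumptions.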

Authors (3)
  1. Jie Feng (103 papers)
  2. Ke Wei (40 papers)
  3. Jinchi Chen (17 papers)
