Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction (2401.01084v2)
Abstract: Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, a new NPG variant coined NPG-HM is developed in this paper, which utilizes the Hessian-aided momentum technique for variance reduction, while the sub-problem is solved via the stochastic gradient descent method. It is shown that NPG-HM can achieve global last-iterate $\epsilon$-optimality with a sample complexity of $\mathcal{O}(\epsilon^{-2})$, which is the best known result for natural policy gradient type methods under generic Fisher non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored to NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
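To make the two-step structure described in the abstract concrete, below is a minimal Python sketch of one NPG-HM iteration: a Hessian-aided momentum update of the gradient estimate, followed by a stochastic gradient descent solve of the natural-gradient sub-problem. This is an illustrative reading of the abstract, not the paper's exact algorithm; the callables `stoch_grad`, `stoch_hvp`, and `fisher_vec` and all step sizes are hypothetical placeholders for trajectory-based estimators of the policy gradient, a Hessian-vector product, and a Fisher-vector product.

```python
# Hedged sketch of one NPG-HM step (assumptions: the estimator callables and
# hyper-parameters below are illustrative, not the authors' implementation).
import numpy as np

def npg_hm_step(theta, theta_prev, v_prev, stoch_grad, stoch_hvp, fisher_vec,
                beta=0.1, alpha=1.0, sgd_lr=0.01, sgd_iters=100):
    # Hessian-aided momentum: correct the previous estimate along the
    # parameter displacement with a stochastic Hessian-vector product,
    # then blend in a fresh stochastic gradient (variance reduction).
    delta = theta - theta_prev
    v = beta * stoch_grad(theta) + (1.0 - beta) * (v_prev + stoch_hvp(theta, delta))

    # Sub-problem: approximate the natural-gradient direction d solving
    # F(theta) d = v by running SGD on the quadratic 0.5 d^T F d - v^T d
    # (compatible function approximation); fisher_vec(theta, d) is a
    # stochastic estimate of F(theta) d.
    d = np.zeros_like(theta)
    for _ in range(sgd_iters):
        d -= sgd_lr * (fisher_vec(theta, d) - v)  # stochastic gradient of the quadratic

    # For the first iteration one would typically set theta_prev = theta and
    # v_prev = stoch_grad(theta), so the momentum term reduces to a plain
    # stochastic gradient.
    return theta + alpha * d, v
```

Solving $F(\theta)d = v$ only approximately with SGD avoids ever forming or inverting the Fisher matrix, which keeps the per-iteration cost comparable to first-order policy gradient methods; the error incurred by this inexact solve is what the paper's decomposition argument controls.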
Authors: Jie Feng, Ke Wei, Jinchi Chen