Momentum-Based Policy Gradient with Second-Order Information (2205.08253v3)

Published 17 May 2022 in cs.LG and cs.AI

Abstract: Variance-reduced gradient estimators for policy gradient methods have been a main focus of research in reinforcement learning in recent years, as they accelerate the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. The SHARP algorithm is parameter-free, reaching an $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ trajectories while using a batch size of $O(1)$ at each iteration. Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of the variance-reduction process. Moreover, the variance of the estimation error decays at the fast rate of $O(1/t^{2/3})$, where $t$ is the number of iterations. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.
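The abstract sketches SHARP's core idea: a momentum-based gradient estimator that corrects the previous iterate with second-order (Hessian-vector product) information instead of importance weights, using a batch size of $O(1)$ and time-varying step sizes. Below is a minimal, hypothetical Python sketch of one such update step. The oracle names `grad_estimator` and `hvp_estimator`, the specific $1/t^{2/3}$ and $1/t^{1/3}$ schedules, and the toy usage are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

def sharp_style_step(theta, theta_prev, d_prev, t,
                     grad_estimator, hvp_estimator, c=0.1):
    """One momentum-based policy-gradient step with a Hessian-vector-product
    (HVP) correction, as a rough illustration of the idea in the abstract.

    grad_estimator(theta)   -> stochastic policy-gradient estimate (one trajectory)
    hvp_estimator(theta, v) -> stochastic Hessian-vector product in direction v
    Both oracles and the schedules below are assumptions for illustration.
    """
    alpha_t = 1.0 / (t + 1) ** (2.0 / 3.0)   # time-varying momentum weight (assumed)
    gamma_t = c / (t + 1) ** (1.0 / 3.0)     # time-varying learning rate (assumed)

    g_t = grad_estimator(theta)
    # The HVP transports the previous estimate to the new parameters,
    # playing the role that importance weights play in earlier estimators.
    correction = hvp_estimator(theta, theta - theta_prev)
    d_t = alpha_t * g_t + (1.0 - alpha_t) * (d_prev + correction)

    theta_next = theta + gamma_t * d_t        # ascent step on the expected return
    return theta_next, d_t


# Toy usage with dummy quadratic "oracles", only to show the call pattern.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = lambda th: -th + 0.01 * rng.standard_normal(th.shape)  # dummy gradient oracle
    hvp = lambda th, v: -v                                        # dummy HVP oracle
    theta_prev = np.zeros(4)
    theta = rng.standard_normal(4)
    d = grad(theta)
    for t in range(1, 200):
        new_theta, d = sharp_style_step(theta, theta_prev, d, t, grad, hvp)
        theta_prev, theta = theta, new_theta
    print("final ||theta|| =", np.linalg.norm(theta))  # shrinks toward 0 in this toy problem
```

In this reading, the Hessian-vector product (computable at roughly the cost of a gradient via Pearlmutter's trick) is what lets the estimator reuse past information without the importance-sampling ratios that can blow up the variance.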
