Fisher-Rao Gradient Flows of Linear Programs and State-Action Natural Policy Gradients (2403.19448v2)
Abstract: Kakade's natural policy gradient method has been studied extensively in recent years, showing linear convergence with and without regularization. We study another natural gradient method based on the Fisher information matrix of the state-action distributions, which has received little attention from the theoretical side. Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential. Therefore, we study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program. Equivalently, this yields an estimate on the error induced by entropic regularization of the linear program, which improves existing results. We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error. In particular, these general results cover the case of state-action natural policy gradients.
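To make the abstract's central object concrete, here is a minimal worked example, assumed for illustration rather than taken from the paper: the Fisher-Rao gradient flow of a linear potential over the probability simplex, the simplest linear program over a polytope. In this toy setting the flow reduces to the classical replicator equation, its solution is available in closed form, and the linear convergence rate is visibly governed by the geometry of the program, here the gap between the best and second-best coefficients of the cost vector.

```latex
% Illustrative sketch (simplex setting assumed for simplicity; the paper
% works with the general state-action polytope). Consider the linear program
\[
  \max_{\mu \in \Delta_n} \langle c, \mu \rangle,
  \qquad
  \Delta_n = \Bigl\{ \mu \in \mathbb{R}^n_{\ge 0} : \sum_{x} \mu(x) = 1 \Bigr\}.
\]
% The Fisher-Rao gradient flow of the potential \mu \mapsto \langle c, \mu \rangle
% is the classical replicator equation
\[
  \dot{\mu}_t(x) = \mu_t(x) \bigl( c(x) - \langle c, \mu_t \rangle \bigr),
\]
% which, for an interior initialization \mu_0, is solved in closed form by
\[
  \mu_t(x) = \frac{\mu_0(x)\, e^{t c(x)}}{\sum_{y} \mu_0(y)\, e^{t c(y)}}.
\]
% If the maximizer x^\star of c is unique, the suboptimality gap decays linearly,
\[
  \langle c, \mu^\star \rangle - \langle c, \mu_t \rangle
  \;\le\; \frac{\max_x \bigl( c(x^\star) - c(x) \bigr)}{\mu_0(x^\star)}\, e^{-t \Delta},
  \qquad
  \Delta = c(x^\star) - \max_{x \ne x^\star} c(x),
\]
% at a rate \Delta determined by the geometry of the linear program.
```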
- OpenAI. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774, 2023.
- Alekh Agarwal, Sham M Kakade, Jason D Lee, and Gaurav Mahajan. On the theory of policy gradient methods: Optimality, approximation, and distribution shift. The Journal of Machine Learning Research, 22(1):4431–4506, 2021.
- Carlo Alfano and Patrick Rebeschini. Linear convergence for natural policy gradient with log-linear policy parametrization. arXiv preprint arXiv:2209.15382, 2022.
- Carlo Alfano, Rui Yuan, and Patrick Rebeschini. A novel framework for policy mirror descent with general parameterization and linear convergence. Advances in Neural Information Processing Systems, 36, 2023.
- Felipe Alvarez, Jérôme Bolte, and Olivier Brahic. Hessian Riemannian gradient flows in convex programming. SIAM Journal on Control and Optimization, 43(2):477–501, 2004.
- Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998.
- Shun-ichi Amari. Information geometry and its applications, volume 194. Springer, 2016.
- Nihat Ay, Jürgen Jost, Hông Vân Lê, and Lorenz Schwachhöfer. Information geometry, volume 64. Springer, 2017.
- J Andrew Bagnell and Jeff Schneider. Covariant policy search. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI’03, pages 1019–1024, San Francisco, CA, USA, 2003. Morgan Kaufmann Publishers Inc.
- On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. Journal of Optimization Theory and Applications, 182:1068–1087, 2019.
- Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.
- Jalaj Bhandari and Daniel Russo. On the linear convergence of policy gradient methods for finite MDPs. In International Conference on Artificial Intelligence and Statistics, pages 2386–2394. PMLR, 2021.
- A geometric embedding approach to multiple games and multiple populations. arXiv preprint arXiv:2401.05918, 2024.
- Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
- Semih Cayci, Niao He, and R Srikant. Linear convergence of entropy-regularized natural policy gradient with linear function approximation. arXiv preprint arXiv:2106.04096, 2021.
- Shicong Cen, Chen Cheng, Yuxin Chen, Yuting Wei, and Yuejie Chi. Fast global convergence of natural policy gradient methods with entropy regularization. Operations Research, 2021.
- NN Čencov. Algebraic foundation of mathematical statistics. Statistics: A Journal of Theoretical and Applied Statistics, 9(2):267–276, 1978.
- Roberto Cominetti and Jaime San Martín. Asymptotic analysis of the exponential penalty trajectory in linear programming. Mathematical Programming, 67:169–187, 1994.
- Cyrus Derman. Finite state Markovian decision processes. Academic Press, Inc., 1970.
- Travis Dick, András György, and Csaba Szepesvári. Online learning in Markov decision processes with changing cost sequences. In International Conference on Machine Learning, pages 512–520. PMLR, 2014.
- Dongsheng Ding, Kaiqing Zhang, Tamer Başar, and Mihailo Jovanović. Natural policy gradient primal-dual method for constrained Markov decision processes. Advances in Neural Information Processing Systems, 33:8378–8390, 2020.
- Dongsheng Ding, Kaiqing Zhang, Tamer Başar, and Mihailo Jovanović. Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs. arXiv preprint arXiv:2206.02346, 2022.
- Onésimo Hernández-Lerma and Jean B Lasserre. Discrete-time Markov control processes: basic optimality criteria, volume 30. Springer Science & Business Media, 2012.
- Sham M Kakade. A natural policy gradient. Advances in Neural Information Processing Systems, 14, 2001.
- Lodewijk CM Kallenberg. Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory. Zeitschrift für Operations Research, 40:1–42, 1994.
- A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces. arXiv preprint arXiv:2310.02951, 2023.
- On linear and super-linear convergence of natural policy gradient algorithm. Systems & Control Letters, 164:105214, 2022.
- Guanghui Lan. Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes. Mathematical Programming, pages 1–48, 2022.
- On the occupancy measure of non-Markovian policies in continuous MDPs. In International Conference on Machine Learning, pages 18548–18562. PMLR, 2023.
- Approximate Newton policy gradient algorithms. SIAM Journal on Scientific Computing, 45(5):A2585–A2609, 2023.
- Haihao Lu, Robert M Freund, and Yurii Nesterov. Relatively smooth convex optimization by first-order methods, and applications. SIAM Journal on Optimization, 28(1):333–354, 2018.
- Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvári, and Dale Schuurmans. Escaping the gravitational pull of softmax. Advances in Neural Information Processing Systems, 33:21130–21140, 2020.
- Jincheng Mei, Chenjun Xiao, Csaba Szepesvári, and Dale Schuurmans. On the global convergence rates of softmax policy gradient methods. In International Conference on Machine Learning, pages 6820–6829. PMLR, 2020.
- Guido Montúfar, Johannes Rauh, and Nihat Ay. On the Fisher metric of conditional probability polytopes. Entropy, 16(6):3207–3233, 2014.
- A new natural policy gradient by stationary distribution metric. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2008, Antwerp, Belgium, September 15-19, 2008, Proceedings, Part II, pages 82–97. Springer, 2008.
- A generalized natural actor-critic algorithm. Advances in Neural Information Processing Systems, 22, 2009.
- Johannes Müller. Geometry of Optimization in Markov Decision Processes and Neural Network Based PDE Solvers. PhD thesis, University of Leipzig, 2023.
- Johannes Müller and Guido Montúfar. The geometry of memoryless stochastic policy optimization in infinite-horizon POMDPs. In International Conference on Learning Representations, 2022.
- Johannes Müller and Guido Montúfar. Geometry and convergence of natural policy gradient methods. Information Geometry, 7(1):485–523, 2024.
- Johannes Müller and Marius Zeinhofer. Achieving high accuracy with PINNs via energy natural gradient descent. In International Conference on Machine Learning, pages 25471–25485. PMLR, 2023.
- Gergely Neu, Anders Jonsson, and Vicenç Gómez. A unified view of entropy-regularized Markov decision processes. arXiv preprint arXiv:1705.07798, 2017.
- Jan Peters, Katharina Mülling, and Yasemin Altun. Relative entropy policy search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24, pages 1607–1612, 2010.
- Jan Peters and Stefan Schaal. Natural actor-critic. Neurocomputing, 71(7-9):1180–1190, 2008.
- Gabriel Peyré and Marco Cuturi. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
- C Radhakrishna Rao. Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37(3):81–91, 1945.
- C Radhakrishna Rao. Differential metrics in probability spaces. In Differential geometry in statistical inference, volume 10, pages 217–241. Institute of Mathematical Statistics, 1987.
- Johannes Rauh. Finding the Maximizers of the Information Divergence from an Exponential Family. PhD thesis, University of Leipzig, 2011.
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897. PMLR, 2015.
- John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Bernd Sturmfels, Simon Telen, François-Xavier Vialard, and Max von Renesse. Toric geometry of entropic regularization. Journal of Symbolic Computation, 120:102221, 2024.
- Felipe Suárez Colmenares. Perspectives on Geometry and Optimization: from Measures to Neural Networks. PhD thesis, Massachusetts Institute of Technology, 2023.
- Jesse van Oostrum, Johannes Müller, and Nihat Ay. Invariance properties of the natural gradient in overparametrised systems. Information Geometry, 6(1):51–67, 2023.
- Li Wang and Ming Yan. Hessian informed mirror descent. Journal of Scientific Computing, 92(3):90, 2022.
- Lex Weaver and Nigel Tao. The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, UAI’01, pages 538–545, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.
- Jonathan Weed. An explicit analysis of the entropic penalty in linear programming. In Conference On Learning Theory, pages 1841–1855. PMLR, 2018.
- Lin Xiao. On the convergence rates of policy gradient methods. Journal of Machine Learning Research, 23(282):1–36, 2022.
- Rui Yuan, Simon S Du, Robert M Gower, Alessandro Lazaric, and Lin Xiao. Linear convergence of natural policy gradient methods with log-linear policies. In The Eleventh International Conference on Learning Representations, 2023.
- Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D Lee, and Yuejie Chi. Policy mirror descent for regularized reinforcement learning: A generalized framework with linear convergence. SIAM Journal on Optimization, 33(2):1061–1091, 2023.
- Günter M Ziegler. Lectures on polytopes, volume 152. Springer Science & Business Media, 2012.
- Günter M Ziegler. Lecture notes: Discrete Geometry I, 2013.
- Alexander Zimin and Gergely Neu. Online learning in episodic Markovian decision processes by relative entropy policy search. Advances in Neural Information Processing Systems, 26, 2013.