Sample-Efficient Constrained Reinforcement Learning with General Parameterization (2405.10624v3)
Abstract: We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is to maximize the expected discounted sum of rewards over an infinite horizon while ensuring that the expected discounted sum of costs exceeds a certain threshold. Building on the idea of momentum-based acceleration, we develop the Primal-Dual Accelerated Natural Policy Gradient (PD-ANPG) algorithm that ensures an $\epsilon$ global optimality gap and $\epsilon$ constraint violation with $\tilde{\mathcal{O}}((1-\gamma)^{-7}\epsilon^{-2})$ sample complexity for general parameterized policies where $\gamma$ denotes the discount factor. This improves the state-of-the-art sample complexity in general parameterized CMDPs by a factor of $\mathcal{O}((1-\gamma)^{-1}\epsilon^{-2})$ and achieves the theoretical lower bound in $\epsilon^{-1}$.
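To make the primal-dual structure described above concrete, the snippet below is a minimal, hypothetical sketch of a Lagrangian-based policy-gradient loop with a momentum-smoothed gradient estimate on a tiny tabular CMDP. It is not the authors' PD-ANPG algorithm: the softmax parameterization, the exact-gradient oracle, the step sizes `eta` and `xi`, the momentum factor `beta`, and the threshold `b` are all illustrative assumptions, and the natural-gradient preconditioning is replaced here by a plain ascent step on the smoothed gradient.

```python
# Hypothetical sketch of a primal-dual policy-gradient loop with momentum
# smoothing on a tiny tabular CMDP. NOT the authors' PD-ANPG algorithm;
# all constants and the exact-gradient oracle are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))      # transition kernel P[s, a, s']
r = rng.uniform(size=(S, A))                    # reward signal
c = rng.uniform(size=(S, A))                    # cost/utility signal for the constraint
b = 0.5 / (1 - gamma)                           # constraint threshold (assumed)
rho = np.full(S, 1.0 / S)                       # initial state distribution

def policy(theta):
    """Softmax policy pi(a|s) from a parameter table theta[s, a]."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def value_and_grad(theta, signal):
    """Exact discounted value rho^T V and its policy gradient for a given signal."""
    pi = policy(theta)
    P_pi = np.einsum('sab,sa->sb', P, pi)                    # state kernel under pi
    r_pi = (pi * signal).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)      # Bellman equation
    Q = signal + gamma * P @ V                                # Q[s, a]
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho) * (1 - gamma)  # occupancy
    adv = Q - (pi * Q).sum(axis=1, keepdims=True)             # advantage
    # Policy-gradient theorem with softmax score grad log pi(a|s) = e_a - pi(.|s)
    grad = (d[:, None] * pi * adv) / (1 - gamma)
    return rho @ V, grad

theta = np.zeros((S, A))
lam, momentum = 0.0, np.zeros_like(theta)
eta, xi, beta, lam_max = 0.5, 0.1, 0.9, 10.0    # step sizes / momentum (assumed)

for t in range(200):
    Jr, gr = value_and_grad(theta, r)
    Jc, gc = value_and_grad(theta, c)
    grad_L = gr + lam * gc                                # Lagrangian gradient
    momentum = beta * momentum + (1 - beta) * grad_L      # momentum smoothing (assumed rule)
    theta += eta * momentum                               # primal ascent step
    lam = np.clip(lam - xi * (Jc - b), 0.0, lam_max)      # projected dual descent

Jr, _ = value_and_grad(theta, r)
Jc, _ = value_and_grad(theta, c)
print(f"reward value {Jr:.3f}, constraint value {Jc:.3f} (threshold {b:.3f}), lambda {lam:.2f}")
```

The dual variable is updated by projected descent on the constraint slack $J_c(\theta) - b$, consistent with the utility-style constraint in the abstract (the discounted sum of costs must exceed the threshold); the momentum term is only a stand-in for the acceleration idea the paper builds on.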
Authors: Washim Uddin Mondal, Vaneet Aggarwal