Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes (2506.05953v1)

Published 6 Jun 2025 in cs.LG

Abstract: Constrained Reinforcement Learning (CRL) addresses sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints. In this setting, policy-based methods are widely used thanks to their advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or a parameter-based exploration strategy, depending on whether they learn the parameters of a stochastic policy or those of a stochastic hyperpolicy. We introduce an exploration-agnostic algorithm, called C-PG, which enjoys global last-iterate convergence guarantees under gradient domination assumptions. Furthermore, under specific noise models where the (hyper)policy is expressed as a stochastic perturbation of the actions or of the parameters of an underlying deterministic policy, we additionally establish global last-iterate convergence guarantees of C-PG to the optimal deterministic policy. This holds when learning a stochastic (hyper)policy and subsequently switching off the stochasticity at the end of training, thereby deploying a deterministic policy. Finally, we empirically validate both the action-based (C-PGAE) and parameter-based (C-PGPE) variants of C-PG on constrained control tasks, and compare them against state-of-the-art baselines, demonstrating their effectiveness, in particular when deploying deterministic policies after training.
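
To make the two exploration strategies in the abstract concrete, here is a minimal sketch. It is not the paper's C-PG algorithm. It only illustrates how a stochastic policy or hyperpolicy can be written as a perturbation of an underlying deterministic policy, and how setting the noise to zero at deployment recovers that deterministic policy. The linear policy, the Gaussian noise, and all function names are assumptions chosen for illustration.

```python
import random

def deterministic_policy(theta, state):
    # Hypothetical underlying deterministic policy: a linear map to a scalar action.
    return sum(t * s for t, s in zip(theta, state))

def action_based_sample(theta, state, sigma, rng):
    # Action-based exploration: add noise to the action of the deterministic policy.
    # Gaussian noise is an assumption made for this sketch.
    return deterministic_policy(theta, state) + sigma * rng.gauss(0.0, 1.0)

def parameter_based_sample(theta, state, sigma, rng):
    # Parameter-based exploration: add noise to the parameters,
    # then act deterministically with the perturbed parameters.
    noisy_theta = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
    return deterministic_policy(noisy_theta, state)

rng = random.Random(0)
theta = [0.5, -1.0]   # illustrative parameters
state = [2.0, 1.0]    # illustrative state

# During training, sigma > 0 and both variants produce noisy actions.
train_action_ae = action_based_sample(theta, state, sigma=0.1, rng=rng)
train_action_pe = parameter_based_sample(theta, state, sigma=0.1, rng=rng)

# At deployment, switching off the stochasticity (sigma = 0) makes both
# variants coincide with the underlying deterministic policy.
deploy_action_ae = action_based_sample(theta, state, sigma=0.0, rng=rng)
deploy_action_pe = parameter_based_sample(theta, state, sigma=0.0, rng=rng)
```

With `sigma=0.0`, both sampling functions return exactly `deterministic_policy(theta, state)`, which mirrors the deployment step described in the abstract.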

Authors (5)
