Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints

Published 28 Nov 2024 in cs.LG, cs.AI, math.OC, math.PR, and stat.ML | (2411.19193v1)

Abstract: This paper studies reinforcement learning (RL) in infinite-horizon dynamic decision processes with almost-sure safety constraints. Such safety-constrained decision processes are central to applications in autonomous systems, finance, and resource management, where policies must satisfy strict, state-dependent constraints. We consider a doubly-regularized RL framework that combines reward and parameter regularization to address these constraints within continuous state-action spaces. Specifically, we formulate the problem as a convex regularized objective with parametrized policies in the mean-field regime. Our approach leverages recent developments in mean-field theory and Wasserstein gradient flows to model policies as elements of an infinite-dimensional statistical manifold, with policy updates evolving via gradient flows on the space of parameter distributions. Our main contributions include establishing solvability conditions for safety-constrained problems, defining smooth and bounded approximations that facilitate gradient flows, and demonstrating exponential convergence towards global solutions under sufficient regularization. We provide general conditions on regularization functions, encompassing standard entropy regularization as a special case. The results also enable a particle method implementation for practical RL applications. The theoretical insights and convergence guarantees presented here offer a robust framework for safe RL in complex, high-dimensional decision-making problems.

Abstract PDF HTML Upgrade to Chat

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (5)

Collections

Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (5)

Collections