
Penalized Proximal Policy Optimization for Safe Reinforcement Learning (2205.11814v2)

Published 24 May 2022 in cs.LG, cs.AI, and math.OC

Abstract: Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle to achieve efficient policy updates with hard constraint satisfaction. In this paper, we propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem. Specifically, P3O utilizes a simple-yet-effective penalty function to eliminate the cost constraints and removes the trust-region constraint via the clipped surrogate objective. We theoretically prove the exactness of the proposed method with a finite penalty factor and provide a worst-case analysis of the approximation error when the objective is evaluated on sampled trajectories. Moreover, we extend P3O to the more challenging multi-constraint and multi-agent scenarios, which have received less attention in prior work. Extensive experiments show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
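The abstract describes replacing the constrained policy iteration with a single unconstrained minimization: a clipped PPO-style reward surrogate plus an exact-penalty (ReLU) term on the clipped cost surrogate. Below is a minimal PyTorch sketch of such an objective, reconstructed from the abstract alone; the function and argument names (`p3o_loss`, `kappa`, `cost_limit`, etc.) are illustrative assumptions, not identifiers from the paper's code.

```python
import torch

def p3o_loss(ratio, adv_r, adv_c, cost_so_far, cost_limit,
             kappa=20.0, clip_eps=0.2):
    """Sketch of a penalized clipped surrogate in the spirit of P3O.

    ratio:        pi_theta(a|s) / pi_theta_old(a|s), shape [batch]
    adv_r, adv_c: reward / cost advantage estimates, shape [batch]
    cost_so_far:  estimated expected cumulative cost J_C of the old policy
    cost_limit:   constraint threshold d
    kappa:        finite penalty factor (per the paper's exactness result,
                  a sufficiently large finite value suffices)
    """
    # Clipped reward surrogate (standard PPO; maximized, hence negated below).
    surr_r = torch.min(
        ratio * adv_r,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv_r)

    # Clipped cost surrogate; clipping is pessimistic (max) for costs.
    surr_c = torch.max(
        ratio * adv_c,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv_c)

    # ReLU exact-penalty term: active only when the constraint is violated.
    constraint = cost_so_far - cost_limit + surr_c.mean()
    penalty = torch.relu(constraint)

    # Single unconstrained minimization objective.
    return -surr_r.mean() + kappa * penalty
```

The ReLU penalty leaves the objective untouched while the cost estimate stays under the limit, so updates reduce to plain PPO there; once the constraint is violated, the penalty gradient pushes the policy back toward feasibility without requiring a dual-variable update.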

Authors (7)
  1. Linrui Zhang (12 papers)
  2. Li Shen (363 papers)
  3. Long Yang (54 papers)
  4. Shixiang Chen (18 papers)
  5. Bo Yuan (151 papers)
  6. Xueqian Wang (99 papers)
  7. Dacheng Tao (829 papers)
Citations (55)
