Absolute Policy Optimization (2310.13230v5)
Abstract: In recent years, trust-region on-policy reinforcement learning has achieved impressive results on complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms in this category primarily emphasize improvement in expected performance and lack control over worst-case performance outcomes. To address this limitation, we introduce a novel objective function whose optimization guarantees monotonic improvement in the lower probability bound of performance with high confidence. Building on this theoretical advancement, we further introduce a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach on challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO, as well as its efficient variant Proximal Absolute Policy Optimization (PAPO), significantly outperforms state-of-the-art policy gradient algorithms, yielding substantial improvements in worst-case as well as expected performance.
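The abstract's notion of a "lower probability bound of performance with high confidence" can be illustrated with a concentration-inequality sketch. The snippet below is an assumption on our part, not the paper's actual objective: it uses Cantelli's (one-sided Chebyshev) inequality to bound how far a return can fall below its mean, so that a policy with equal expected return but lower return variance gets a higher worst-case bound. APO's true objective is defined over the policy's return distribution during optimization, not over an empirical sample like this.

```python
import numpy as np

def absolute_performance_bound(returns, delta=0.1):
    """Lower probability bound on performance via Cantelli's inequality.

    With probability >= 1 - delta, a return drawn from the same
    distribution exceeds mean - k * std, where k = sqrt((1 - delta) / delta).
    Illustrative sketch only; not the objective defined in the paper.
    """
    returns = np.asarray(returns, dtype=float)
    mean, std = returns.mean(), returns.std()
    k = np.sqrt((1.0 - delta) / delta)
    return mean - k * std

# Two hypothetical policies with equal mean return but different variance:
# the low-variance one receives the better (higher) worst-case bound.
rng = np.random.default_rng(0)
stable = rng.normal(100.0, 5.0, size=1000)   # low-variance returns
risky = rng.normal(100.0, 40.0, size=1000)   # high-variance returns
assert absolute_performance_bound(stable) > absolute_performance_bound(risky)
```

This captures the trade-off the abstract describes: optimizing only the mean ignores the spread, whereas a mean-minus-deviation bound rewards policies whose bad episodes are not much worse than their typical ones.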
Authors: Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu