A Prospect-Theoretic Policy Gradient Algorithm for Behavioral Alignment in Reinforcement Learning
Abstract: Classical reinforcement learning (RL) typically assumes rational decision-making based on expected utility theory. However, this model has been shown to be empirically inconsistent with actual human preferences, as documented in psychology and behavioral economics. Cumulative Prospect Theory (CPT) provides a more nuanced model of human decision-making, capturing diverse attitudes and perceptions toward risk, gains, and losses. While prior work has integrated CPT with RL to solve a CPT policy optimization problem, the understanding and practical impact of this formulation remain limited. We revisit the CPT-RL framework, offering new theoretical insights into the nature of optimal policies. We further derive a novel policy gradient theorem for CPT objectives, generalizing the foundational result in standard RL. Building on this theorem, we design a model-free policy gradient algorithm for solving the CPT-RL problem and demonstrate its performance through simulations. Notably, our algorithm scales better to larger state spaces than existing zeroth-order methods. This work advances the integration of behavioral decision-making into RL.
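To make the CPT objective concrete, the following is a minimal sketch (not the paper's implementation) of the standard Tversky–Kahneman CPT value of a discrete lottery: an S-shaped utility that is concave for gains, convex and steeper for losses, combined with rank-dependent probability weighting. The parameter values (`alpha=0.88`, `lam=2.25`, `gamma=0.61`, `delta=0.69`) are the classic Tversky–Kahneman estimates, not values taken from this paper.

```python
import numpy as np

def tk_weight(p, c):
    # Tversky-Kahneman inverse-S probability weighting function
    return p**c / (p**c + (1.0 - p) ** c) ** (1.0 / c)

def cpt_value(outcomes, probs, alpha=0.88, lam=2.25, gamma=0.61, delta=0.69):
    """CPT value of a discrete lottery (outcomes measured relative to a
    reference point of 0). gamma/delta weight gains/losses respectively."""
    x = np.asarray(outcomes, dtype=float)
    p = np.asarray(probs, dtype=float)
    order = np.argsort(x)
    x, p = x[order], p[order]  # sorted worst to best
    gains, losses = x >= 0, x < 0

    value = 0.0
    # Losses: cumulate decision weights from the worst outcome upward
    cum = 0.0
    for xi, pi in zip(x[losses], p[losses]):
        value += (tk_weight(cum + pi, delta) - tk_weight(cum, delta)) * (
            -lam * (-xi) ** alpha
        )
        cum += pi
    # Gains: cumulate decision weights from the best outcome downward
    cum = 0.0
    for xi, pi in zip(x[gains][::-1], p[gains][::-1]):
        value += (tk_weight(cum + pi, gamma) - tk_weight(cum, gamma)) * xi**alpha
        cum += pi
    return value
```

For example, a fair coin flip between +100 and -100 has expected value 0, yet `cpt_value([100, -100], [0.5, 0.5])` is negative: loss aversion makes the agent reject the gamble, which is exactly the kind of behavior an expected-utility RL objective cannot express.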