Mean-Variance Optimization in Markov Decision Processes

Published 29 Apr 2011 in cs.LG and cs.AI | (1104.5601v1)

Abstract: We consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can outperform deterministic Markov policies. We prove that computing a policy that maximizes the mean reward under a variance constraint is NP-hard in some cases and strongly NP-hard in others. Finally, we offer pseudopolynomial exact and approximation algorithms.
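Why randomization can help under a variance constraint is easiest to see on a tiny instance. The sketch below uses a hypothetical one-step toy problem that is not taken from the paper. The action "safe" pays 0 with certainty, and "risky" pays 0 or 2 with probability 1/2 each. The variance bound V_MAX = 0.25 and all function names are illustrative assumptions. If "risky" is chosen with probability p, the mean reward is p and the variance is 2p - p^2. Under the bound, the only feasible deterministic choice is "safe", with mean 0. The best randomized choice is p = 1 - sqrt(1 - V_MAX), which is about 0.134 and meets the bound exactly.

```python
import math

# Hypothetical one-step toy problem (illustrative only, not from the paper):
# "safe" pays 0 surely; "risky" pays 0 or 2 with probability 1/2 each.
REWARDS = {
    "safe": {0: 1.0},
    "risky": {0: 0.5, 2: 0.5},
}
V_MAX = 0.25  # assumed variance bound for the example


def reward_distribution(p_risky):
    """Reward distribution when 'risky' is played with probability p_risky."""
    dist = {}
    for action, weight in (("safe", 1.0 - p_risky), ("risky", p_risky)):
        for reward, prob in REWARDS[action].items():
            dist[reward] = dist.get(reward, 0.0) + weight * prob
    return dist


def mean_var(dist):
    """Mean and variance of a finite reward distribution {reward: probability}."""
    mean = sum(r * q for r, q in dist.items())
    second_moment = sum(r * r * q for r, q in dist.items())
    return mean, second_moment - mean * mean


def best_deterministic(v_max):
    """Best mean over the two deterministic choices (p = 0 or p = 1) meeting the bound."""
    feasible_means = []
    for p in (0.0, 1.0):
        m, var = mean_var(reward_distribution(p))
        if var <= v_max + 1e-12:  # small tolerance for floating-point error
            feasible_means.append(m)
    return max(feasible_means)


def best_randomized(v_max):
    """Largest p whose variance 2p - p^2 stays within v_max (variance rises on [0, 1])."""
    p = 1.0 if v_max >= 1.0 else 1.0 - math.sqrt(1.0 - v_max)
    return p, mean_var(reward_distribution(p))


if __name__ == "__main__":
    print("deterministic:", best_deterministic(V_MAX))
    p, (m, var) = best_randomized(V_MAX)
    print(f"randomized: p={p:.4f} mean={m:.4f} variance={var:.4f}")
```

The toy only shows the benefit of randomization. The abstract's separate claim about history-based policies needs a multi-step instance, which this sketch does not attempt.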