Constrained Upper Confidence Reinforcement Learning (2001.09377v1)

Published 26 Jan 2020 in cs.LG and stat.ML

Abstract: Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning for settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret ($O(T^{\frac{3}{4}}\sqrt{\log(T/\delta)})$) with respect to the reward while satisfying the constraints even while learning with probability $1-\delta$. Illustrative examples are provided.
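
To make the "sub-linear" claim concrete, the following is a minimal numerical sketch, not the paper's algorithm. It only evaluates the stated bound $T^{\frac{3}{4}}\sqrt{\log(T/\delta)}$ and divides by $T$ to show that the per-step regret bound shrinks as the horizon grows. The function name `regret_bound`, the constant `c`, and the chosen values of `delta` and `horizons` are assumptions made for illustration; the abstract does not give the hidden constant.

```python
import math


def regret_bound(T, delta, c=1.0):
    """Evaluate c * T^(3/4) * sqrt(log(T / delta)).

    c is a hypothetical constant standing in for the factor hidden by O(.).
    """
    return c * T ** 0.75 * math.sqrt(math.log(T / delta))


delta = 0.05  # assumed failure probability, so the guarantee holds w.p. 0.95
horizons = [10**3, 10**4, 10**5, 10**6]  # assumed horizons T

# Per-step bound: regret_bound(T) / T should decrease if the bound is sub-linear.
avg = [regret_bound(T, delta) / T for T in horizons]

for T, a in zip(horizons, avg):
    print(f"T={T:>8d}  bound/T={a:.4f}")
```

The ratio falls because $T^{\frac{3}{4}}/T = T^{-\frac{1}{4}}$ decays faster than $\sqrt{\log(T/\delta)}$ grows.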

Citations (63)
