A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints (2009.11348v1)

Published 23 Sep 2020 in cs.LG, cs.AI, cs.SY, eess.SY, and stat.ML

Abstract: Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finite-horizon CMDP for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an $\epsilon$-optimal policy, i.e., with resulting objective value within $\epsilon$ of the optimal value and satisfying the constraints within $\epsilon$-tolerance, with probability at least $1-\delta$. The number of episodes needed is shown to be of the order $\tilde{\mathcal{O}}\big(\frac{|S||A|C^{2}H^{2}}{\epsilon^{2}}\log\frac{1}{\delta}\big)$, where $C$ is the upper bound on the number of possible successor states for a state-action pair. Therefore, if $C \ll |S|$, the number of episodes needed has a linear dependence on the state and action space sizes $|S|$ and $|A|$, respectively, and quadratic dependence on the time horizon $H$.
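To make the scaling in the bound concrete, the following is a minimal Python sketch that evaluates only the leading term $\frac{|S||A|C^{2}H^{2}}{\epsilon^{2}}\log\frac{1}{\delta}$. The function name `episode_bound_scaling` and the example parameter values are illustrative assumptions, not taken from the paper, and the constants and polylogarithmic factors hidden by $\tilde{\mathcal{O}}$ are omitted, so the returned number shows how the bound grows rather than an actual episode count.

```python
import math


def episode_bound_scaling(num_states, num_actions, max_successors,
                          horizon, epsilon, delta):
    """Leading term |S||A|C^2 H^2 / eps^2 * log(1/delta) of the stated bound.

    Hypothetical helper: constants and polylog factors hidden by the
    O-tilde notation are NOT included, so only ratios are meaningful.
    """
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    size_term = num_states * num_actions * max_successors ** 2 * horizon ** 2
    return size_term / epsilon ** 2 * math.log(1.0 / delta)


# Illustrative values only (assumed, not from the paper).
base = episode_bound_scaling(100, 5, 3, 10, epsilon=0.1, delta=0.05)
double_states = episode_bound_scaling(200, 5, 3, 10, epsilon=0.1, delta=0.05)
double_horizon = episode_bound_scaling(100, 5, 3, 20, epsilon=0.1, delta=0.05)

# With C held fixed, doubling |S| doubles the term (linear dependence),
# while doubling H quadruples it (quadratic dependence).
print(double_states / base)
print(double_horizon / base)
```

Holding $C$ fixed while $|S|$ grows corresponds to the sparse-transition regime $C \ll |S|$ described in the abstract.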

Citations (49)
