A Low-rank Approximation for MDPs via Moment Coupling (2009.08966v2)

Published 18 Sep 2020 in math.OC, cs.DS, and math.PR

Abstract: We introduce a framework to approximate a Markov Decision Process that stands on two pillars: state aggregation, as the algorithmic infrastructure; and central-limit-theorem-type approximations, as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work of Braverman et al. (2020) that relates the solution of the Bellman equation to that of a PDE where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is $\textit{not}$ required by our method. Instead, we construct a "sister" (controlled) Markov chain whose two local transition moments are approximately identical with those of the focal chain. Because of this $\textit{moment matching}$, the original chain and its "sister" are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into standard soft aggregation algorithms, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. The computational gains arise from the reduction of the effective state space from $N$ to $N^{\frac{1}{2}+\epsilon}$, which is as one might intuitively expect from approximations grounded in the central limit theorem.
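To make "local first and second moments" concrete, the following is a minimal sketch, not the paper's algorithm. It assumes an uncontrolled chain on a one-dimensional grid and computes, for each state $x$, the drift $\sum_y P(x,y)(y-x)$ and the second moment $\sum_y P(x,y)(y-x)^2$. The function name `local_moments` and the lazy random walk used as input are illustrative assumptions, not taken from the source.

```python
import numpy as np

def local_moments(P, states):
    """Local first and second transition moments of a chain on a 1-D grid.

    P[i, j] is the probability of moving from states[i] to states[j].
    Hypothetical helper for illustration only.
    """
    P = np.asarray(P, dtype=float)
    x = np.asarray(states, dtype=float)
    disp = x[None, :] - x[:, None]       # disp[i, j] = states[j] - states[i]
    drift = (P * disp).sum(axis=1)       # first moment of the one-step displacement
    second = (P * disp ** 2).sum(axis=1) # second moment of the one-step displacement
    return drift, second

# Assumed example: lazy random walk on {0, ..., 4}; moves that would
# leave the grid are replaced by staying put.
N = 5
P = np.zeros((N, N))
for i in range(N):
    P[i, max(i - 1, 0)] += 0.25
    P[i, min(i + 1, N - 1)] += 0.25
    P[i, i] += 0.5

drift, second = local_moments(P, np.arange(N))
print(drift)   # zero in the interior, pointing inward at the two ends
print(second)
```

In the abstract's terms, a "sister" chain would be one whose `drift` and `second` arrays approximately agree with those of the focal chain. How the paper builds such a chain from aggregation and disaggregation probabilities is not shown here.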
