Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes (2006.13405v1)

Published 24 Jun 2020 in cs.LG, math.OC, and stat.ML

Abstract: We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure-dependent lower bounds on regret for FMDPs that reveal the difficulty hiding in the intricacy of the structures.
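To make "conditionally independent transition components" concrete, here is a minimal sketch of a factored transition model, in which the probability of a next state is the product of per-component probabilities and each component depends only on a small subset (its scope) of the state-action variables. It is an illustration of the structure only, not the paper's algorithms. The names `SCOPES`, `FACTORS`, and `transition_prob`, and all numbers, are hypothetical.

```python
# Toy factored transition model (all names and numbers here are hypothetical).
# The joint vector x = (s[0], s[1], a) holds two binary state components
# and one binary action.
# SCOPES[i] lists the positions of x that next-state component i depends on.
SCOPES = [(0, 2), (1,)]

# FACTORS[i] maps the values of x on SCOPES[i] to a distribution over the
# next value (0 or 1) of state component i.
FACTORS = [
    {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8], (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]},
    {(0,): [0.7, 0.3], (1,): [0.4, 0.6]},
]


def transition_prob(state, action, next_state):
    """Return P(next_state | state, action) as a product of per-component factors."""
    x = tuple(state) + (action,)
    prob = 1.0
    for i, scope in enumerate(SCOPES):
        parents = tuple(x[j] for j in scope)  # restrict x to the scope of component i
        prob *= FACTORS[i][parents][next_state[i]]
    return prob
```

In this toy case the factored form stores 12 probabilities (8 for the first component, 4 for the second), whereas an unfactored table over 8 state-action pairs and 4 next states would store 32.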

Citations (25)
