Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need (2309.15737v1)

Published 27 Sep 2023 in cs.LG

Abstract: We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDPs) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of $\tilde{O}(HS\sqrt{AT})$ for any communicating CMDP with $S$ states, $A$ actions, and a bound $H$ on the hitting time. This regret bound matches the lower bound in order of the time horizon $T$ and is the best-known regret bound for communicating CMDPs in the infinite-horizon undiscounted setting. Empirical results show that, despite its simplicity, our posterior sampling algorithm outperforms the existing algorithms for constrained reinforcement learning.
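The abstract describes the general pattern of posterior sampling for CMDPs: maintain a posterior over the unknown transition dynamics, sample one model per epoch, plan a constrained policy in the sampled model, and act under it while updating the posterior. Below is a minimal, hypothetical Python sketch of that pattern for a small tabular CMDP. The Dirichlet posterior, the fixed-length epoch schedule, the discounted occupancy-measure LP (a stand-in for the undiscounted planner analyzed in the paper), and all names (`solve_cmdp_lp`, `posterior_sampling_cmdp`, `budget`, `epoch_len`) are illustrative assumptions, not the paper's exact construction.

```python
# Sketch of posterior sampling for a tabular CMDP with known reward r[s,a]
# and cost c[s,a] tables; only the transition kernel is learned.
import numpy as np
from scipy.optimize import linprog


def solve_cmdp_lp(P, r, c, budget, gamma=0.99):
    """Plan in a sampled CMDP via the occupancy-measure linear program.

    Maximizes expected reward subject to a per-step cost budget. A discounted
    LP is used here as a simple stand-in for the paper's undiscounted setting.
    """
    S, A = r.shape
    obj = -r.flatten()  # linprog minimizes, so negate the reward objective
    # Flow constraints: for every next state s',
    #   sum_a mu(s',a) - gamma * sum_{s,a} P(s'|s,a) mu(s,a) = (1-gamma) rho(s')
    rho = np.full(S, 1.0 / S)  # uniform initial state distribution (assumed)
    A_eq = np.zeros((S, S * A))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] = float(sp == s) - gamma * P[s, a, sp]
    b_eq = (1.0 - gamma) * rho
    # Cost constraint: expected per-step cost under mu stays within budget.
    res = linprog(obj, A_ub=c.flatten()[None, :], b_ub=[budget],
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (S * A))
    if not res.success:
        raise RuntimeError("constrained planning LP infeasible")
    mu = res.x.reshape(S, A)
    # Extract a stochastic policy by normalizing the occupancy measure.
    return mu / np.maximum(mu.sum(axis=1, keepdims=True), 1e-12)


def posterior_sampling_cmdp(env_step, S, A, r, c, budget,
                            n_epochs=50, epoch_len=200, seed=0):
    """Posterior sampling loop: sample a model, plan, act, update."""
    rng = np.random.default_rng(seed)
    counts = np.ones((S, A, S))  # Dirichlet(1,...,1) prior over transitions
    s = 0
    for _ in range(n_epochs):
        # Draw one transition model from the posterior (Thompson sampling).
        P = np.array([[rng.dirichlet(counts[si, ai]) for ai in range(A)]
                      for si in range(S)])
        pi = solve_cmdp_lp(P, r, c, budget)
        # Act under the planned policy for one epoch and update the counts.
        for _ in range(epoch_len):
            a = rng.choice(A, p=pi[s])
            s_next = env_step(s, a)  # environment callback: (s, a) -> s'
            counts[s, a, s_next] += 1
            s = s_next
    return counts
```

Note that exploration here comes entirely from sampling the model rather than from explicit optimism bonuses: while the posterior is uncertain, sampled models occasionally make under-visited state-action pairs look attractive, which is the mechanism behind the simplicity advantage the abstract claims.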

Authors (4)
  1. Danil Provodin (7 papers)
  2. Pratik Gajane (19 papers)
  3. Mykola Pechenizkiy (118 papers)
  4. Maurits Kaptein (18 papers)
