
Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes (2306.16394v1)

Published 28 Jun 2023 in cs.LG

Abstract: We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov Decision Processes (MDPs). We consider both the online setting and the setting with access to a simulator. In the online setting, we propose model-free RL algorithms based on reference-advantage decomposition. Our algorithm achieves $\widetilde{O}(S^5A^2\mathrm{sp}(h^*)\sqrt{T})$ regret after $T$ steps, where $S\times A$ is the size of the state-action space, and $\mathrm{sp}(h^*)$ is the span of the optimal bias function. Our results are the first to achieve optimal dependence in $T$ for weakly communicating MDPs. In the simulator setting, we propose a model-free RL algorithm that finds an $\epsilon$-optimal policy using $\widetilde{O}\left(\frac{SA\mathrm{sp}^2(h^*)}{\epsilon^2}+\frac{S^2A\mathrm{sp}(h^*)}{\epsilon}\right)$ samples, whereas the minimax lower bound is $\Omega\left(\frac{SA\mathrm{sp}(h^*)}{\epsilon^2}\right)$. Our results are based on two new techniques that are unique in the average-reward setting: 1) better discounted approximation by value-difference estimation; 2) efficient construction of a confidence region for the optimal bias function with space complexity $O(SA)$.
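To make the simulator-setting bounds in the abstract concrete, the following is a minimal sketch that evaluates the two rate expressions numerically. It is not the paper's algorithm. It assumes the standard definition of the span, $\mathrm{sp}(h) = \max_s h(s) - \min_s h(s)$, which the abstract does not spell out, and it drops all constants and logarithmic factors hidden by $\widetilde{O}$ and $\Omega$. The function names and the example numbers are hypothetical.

```python
def span(h):
    """Span of a bias vector h (assumed definition: max minus min over states)."""
    return max(h) - min(h)


def upper_bound_rate(S, A, sp, eps):
    """Rate inside the O-tilde upper bound: S*A*sp^2/eps^2 + S^2*A*sp/eps.

    Constants and log factors are omitted, so this is a scaling
    expression, not an actual sample count.
    """
    return S * A * sp**2 / eps**2 + S**2 * A * sp / eps


def lower_bound_rate(S, A, sp, eps):
    """Rate inside the Omega lower bound: S*A*sp/eps^2 (constants omitted)."""
    return S * A * sp / eps**2


if __name__ == "__main__":
    # Hypothetical bias vector and problem sizes, chosen only for illustration.
    h = [0.0, 1.5, -0.5]
    sp = span(h)
    S, A, eps = 10, 4, 0.5
    print("span:", sp)
    print("upper-bound rate:", upper_bound_rate(S, A, sp, eps))
    print("lower-bound rate:", lower_bound_rate(S, A, sp, eps))
```

For small $\epsilon$ the first term of the upper bound dominates, and its ratio to the lower-bound rate is $\mathrm{sp}(h^*)$, which is the gap between the two expressions as stated in the abstract.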

Authors (2)
  1. Zihan Zhang (121 papers)
  2. Qiaomin Xie (41 papers)
Citations (14)
