
Policy Zooming: Adaptive Discretization-based Infinite-Horizon Average-Reward Reinforcement Learning (2405.18793v3)

Published 29 May 2024 in cs.LG

Abstract: We study Lipschitz MDPs in the infinite-horizon average-reward reinforcement learning (RL) setup in which an agent can play policies from a given set $\Phi$. The proposed algorithms "zoom" into "promising" regions of the policy space, thereby achieving adaptivity gains. We upper bound their regret as $\tilde{\mathcal{O}}\big(T^{1 - d_{\text{eff.}}^{-1}}\big)$, where $d_{\text{eff.}} = d^\Phi_z + 2$ for the model-free algorithm~\textit{PZRL-MF} and $d_{\text{eff.}} = 2d_\mathcal{S} + d^\Phi_z + 3$ for the model-based algorithm~\textit{PZRL-MB}. Here, $d_\mathcal{S}$ is the dimension of the state space, and $d^\Phi_z$ is the zooming dimension. $d^\Phi_z$ is a problem-dependent quantity that depends not only on the underlying MDP, but also on the class $\Phi$. This yields low regret when the agent competes against a low-complexity $\Phi$ (one that has a small $d^\Phi_z$). We note that the preexisting notions of zooming dimension are ill-suited to non-episodic RL and do not yield adaptivity gains. The current work shows how to capture adaptivity gains for infinite-horizon average-reward RL in terms of $d^\Phi_z$. When specialized to the case of a finite-dimensional policy space, we obtain that $d_{\text{eff.}}$ scales as the dimension of this space under mild technical conditions; we also obtain $d_{\text{eff.}} = 2$, or equivalently $\tilde{\mathcal{O}}(\sqrt{T})$ regret for \textit{PZRL-MF}, under a curvature condition on the average reward function that is commonly used in the multi-armed bandit (MAB) literature. Simulation experiments validate the gains arising from adaptivity.
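
To make the regret bound concrete, the sketch below evaluates the exponent $1 - 1/d_{\text{eff.}}$ of $T$ from the two effective-dimension formulas stated in the abstract. The function names and the sample values of the zooming dimension and state-space dimension are hypothetical, chosen only for illustration; logarithmic factors hidden by $\tilde{\mathcal{O}}$ are ignored.

```python
def effective_dimension(d_z, d_s=None, model_based=False):
    """Effective dimension d_eff as given in the abstract.

    d_z: zooming dimension of the policy class (problem-dependent).
    d_s: state-space dimension, needed only for the model-based bound.
    These helper names are illustrative, not taken from the paper.
    """
    if model_based:
        if d_s is None:
            raise ValueError("d_s is required for the model-based bound")
        return 2 * d_s + d_z + 3  # PZRL-MB: d_eff = 2 d_S + d_z + 3
    return d_z + 2  # PZRL-MF: d_eff = d_z + 2


def regret_exponent(d_eff):
    """Exponent of T in the bound T^(1 - 1/d_eff), up to log factors."""
    return 1 - 1 / d_eff


# Illustrative values only: zooming dimension 0, and 1 with a 2-D state space.
mf_exponent = regret_exponent(effective_dimension(0))
mb_exponent = regret_exponent(effective_dimension(1, d_s=2, model_based=True))
print(mf_exponent, mb_exponent)
```

With a zooming dimension of 0, the model-free formula gives an effective dimension of 2 and hence an exponent of 1/2, which matches the $\tilde{\mathcal{O}}(\sqrt{T})$ regime mentioned in the abstract.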
