Rotting Infinitely Many-armed Bandits (2201.12975v3)

Published 31 Jan 2022 in cs.LG, cs.DS, math.OC, and stat.ML

Abstract: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound, where $T$ is the horizon time. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.
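
The algorithmic idea in the abstract — keep pulling the current arm while its optimistic (UCB) index stays above a threshold, otherwise discard it and sample a fresh arm from the infinite reservoir — can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: `sample_arm`, `pull`, the sliding window, the confidence width, and `threshold` are all placeholder assumptions. In the paper, the index and threshold are set as functions of the rotting rate $\varrho$ and horizon $T$ when $\varrho$ is known, and chosen adaptively when it is not, which is what yields the stated regret bounds.

```python
import math
from collections import deque


def ucb_threshold_policy(sample_arm, pull, horizon, threshold, window=50, confidence=2.0):
    """Illustrative UCB-index-with-threshold policy for rotting infinite-armed bandits.

    `sample_arm()` draws a fresh arm from the infinite reservoir; `pull(arm)` returns a
    noisy reward whose mean may decrease with each pull of that arm. The sliding window,
    confidence width, and `threshold` are placeholder choices, not the paper's settings.
    """
    total_reward = 0.0
    arm = sample_arm()
    recent = deque(maxlen=window)  # recent rewards of the currently held arm

    for _ in range(horizon):
        r = pull(arm)
        recent.append(r)
        total_reward += r

        n = len(recent)
        mean = sum(recent) / n                                  # recent empirical mean
        bonus = math.sqrt(confidence * math.log(horizon) / n)   # optimism term
        index = mean + bonus                                    # UCB index of the arm

        # If even the optimistic index falls below the threshold, the arm is judged
        # too rotten to keep pulling: discard it and sample a fresh arm.
        if index < threshold:
            arm = sample_arm()
            recent.clear()

    return total_reward
```

The sliding window here is only a crude way to keep the index responsive to rotting; the paper's algorithms instead tie the index construction and the removal threshold to $\varrho$ (or to adaptive estimates of it) to match the lower bound up to poly-logarithmic factors.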

Authors (3)
  1. Jung-hun Kim (9 papers)
  2. Milan Vojnovic (25 papers)
  3. Se-Young Yun (114 papers)
Citations (5)