Multi-Armed Bandits with Metric Movement Costs (1710.08997v1)

Published 24 Oct 2017 in cs.LG

Abstract: We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde{O}(\max\{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}\})$, and show that this is the best possible. Our regret bound generalizes previously known regret bounds for some special cases: (i) the unit-switching cost regret $\widetilde{\Theta}(\max\{k^{1/3}T^{2/3},\sqrt{kT}\})$ where $\mathcal{C}=\Theta(k)$, and (ii) the interval metric with regret $\widetilde{\Theta}(\max\{T^{2/3},\sqrt{kT}\})$ where $\mathcal{C}=\Theta(1)$. For infinite metric spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$ where $d \ge 1$ is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.
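
To make the setting concrete, here is a minimal runnable sketch (not the paper's algorithm) of bandit learning with metric movement costs: a learner on the interval metric whose total loss is the sum of per-round losses and the metric distances between consecutive actions. The blocked EXP3-style baseline, the i.i.d. Bernoulli losses, and all parameter choices are illustrative assumptions; the paper studies the non-stochastic (adversarial) case and gives an algorithm matching the bounds above.

```python
# Illustrative sketch only: the problem setting of bandits with metric
# movement costs, plus a simple "blocked" EXP3-style baseline that limits
# how often it switches arms. All names and parameters are assumptions
# made for this example, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

k, T = 5, 20000
# A fixed, known metric on the action space: distances on a line,
# i.e. the "interval metric" special case mentioned in the abstract.
points = np.linspace(0.0, 1.0, k)
metric = np.abs(points[:, None] - points[None, :])

# Stand-in for the adversary: fixed Bernoulli loss means (the paper is
# non-stochastic; i.i.d. losses here just give something to simulate).
means = rng.uniform(0.2, 0.8, size=k)

def play(T, block_len):
    """Blocked EXP3-style learner: commit to one arm per block, so the
    number of switches (hence the movement cost) is at most T / block_len."""
    num_blocks = max(T // block_len, 1)
    eta = np.sqrt(np.log(k) / (k * num_blocks))
    cum_loss = np.zeros(k)          # cumulative importance-weighted losses
    loss_total, move_total = 0.0, 0.0
    prev_arm = None
    for start in range(0, T, block_len):
        probs = np.exp(-eta * (cum_loss - cum_loss.min()))
        probs /= probs.sum()
        arm = rng.choice(k, p=probs)
        if prev_arm is not None:
            move_total += metric[prev_arm, arm]   # movement-cost component
        prev_arm = arm
        block = min(block_len, T - start)
        block_loss = rng.binomial(1, means[arm], size=block).sum()
        loss_total += block_loss
        # importance-weighted update from the block's average loss
        cum_loss[arm] += (block_loss / block) / probs[arm]
    return loss_total, move_total

loss, move = play(T, block_len=int(T ** (1 / 3)))
best = means.min() * T
print(f"learner loss {loss:.0f} + movement {move:.1f}  vs  best fixed arm {best:.0f}")
```

Committing to one arm per block of length roughly $T^{1/3}$ caps the number of switches at about $T^{2/3}$, which is the standard way $T^{2/3}$-type rates arise once movement is costly; the paper's contribution is the tight dependence on the metric's complexity measure $\mathcal{C}$, which this toy baseline does not capture.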

Authors (3)
  1. Tomer Koren (79 papers)
  2. Roi Livni (35 papers)
  3. Yishay Mansour (158 papers)
Citations (21)

Summary

We haven't generated a summary for this paper yet.