Regret Minimization in Heavy-Tailed Bandits (2102.03734v1)

Published 7 Feb 2021 in cs.LG and stat.ML

Abstract: We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings, where the reward distributions either have bounded support or belong to a single-parameter exponential family. We work under the much weaker assumption that the moments of order $(1+\epsilon)$ are uniformly bounded by a known constant $B$, for some given $\epsilon > 0$. We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.
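
For readers unfamiliar with the truncated empirical mean that the abstract uses as a point of comparison, the following is a minimal Python sketch of that baseline estimator in its commonly cited form. It is not the index proposed in this paper. The function name `truncated_mean`, the confidence parameter `delta`, and the specific threshold $(B t / \log(1/\delta))^{1/(1+\epsilon)}$ are assumptions taken from the standard heavy-tailed bandit literature, not from the abstract itself.

```python
import math


def truncated_mean(samples, eps, B, delta):
    """Truncated empirical mean (baseline estimator, not the paper's index).

    Assumes E|X|^(1+eps) <= B. The t-th sample is kept only if
    |x_t| <= (B * t / log(1/delta)) ** (1 / (1 + eps)); otherwise it
    contributes 0 to the sum. The sum is divided by the full sample count.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        # The threshold grows with t, so later samples are truncated less aggressively.
        threshold = (B * t / log_term) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    return total / len(samples)


# Hypothetical data: three small rewards and one extreme outlier.
rewards = [0.5, 0.5, 1000.0, 0.5]
plain = sum(rewards) / len(rewards)
robust = truncated_mean(rewards, eps=1.0, B=2.0, delta=0.1)
```

On this toy input the outlier exceeds its threshold and is zeroed out, so `robust` stays near the bulk of the data while `plain` is dominated by the single large value.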

Authors (3)

Shubhada Agrawal (10 papers)
Sandeep Juneja (24 papers)
Wouter M. Koolen (25 papers)

Citations (28)

