Batched Thompson Sampling for Multi-Armed Bandits (2108.06812v1)

Published 15 Aug 2021 in cs.LG

Abstract: We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, where the goal is to minimize regret over a sequence of arm pulls while using only a small number of policy changes (batches). We propose two algorithms and demonstrate their effectiveness in experiments on both synthetic and real datasets. We also analyze the proposed algorithms theoretically and obtain almost tight regret-batches tradeoffs for the two-arm case.
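
To make the batched setting concrete, here is a minimal sketch of Thompson Sampling in which the posterior is refreshed only at batch boundaries. It is an illustration of the setting, not either of the two algorithms proposed in the paper. The Bernoulli rewards, the Beta(1, 1) priors, the doubling batch grid, and the names `doubling_grid` and `batched_thompson_sampling` are all assumptions chosen for this example.

```python
import random


def doubling_grid(horizon):
    """Hypothetical batch schedule: batches end at 1, 2, 4, ... and at the horizon."""
    ends = []
    t = 1
    while t < horizon:
        ends.append(t)
        t *= 2
    ends.append(horizon)
    return ends


def batched_thompson_sampling(means, horizon, batch_ends, seed=0):
    """Simulate Bernoulli arms; the sampling posterior changes only at batch ends.

    Returns (pseudo-regret, number of policy updates).
    """
    rng = random.Random(seed)
    k = len(means)
    # Beta posterior parameters used for sampling (assumed Beta(1, 1) priors).
    alpha = [1] * k
    beta = [1] * k
    # Outcomes seen in the current batch, not yet folded into the posterior.
    pending_succ = [0] * k
    pending_fail = [0] * k
    boundaries = set(batch_ends)
    best = max(means)
    regret = 0.0
    policy_updates = 0

    for t in range(1, horizon + 1):
        # Draw one posterior sample per arm and pull the arm with the largest draw.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        if rng.random() < means[arm]:
            pending_succ[arm] += 1
        else:
            pending_fail[arm] += 1
        regret += best - means[arm]

        # Feedback is incorporated only when a batch ends (one policy change).
        if t in boundaries:
            for i in range(k):
                alpha[i] += pending_succ[i]
                beta[i] += pending_fail[i]
                pending_succ[i] = 0
                pending_fail[i] = 0
            policy_updates += 1

    return regret, policy_updates


if __name__ == "__main__":
    grid = doubling_grid(1000)
    regret, updates = batched_thompson_sampling([0.3, 0.7], 1000, grid)
    print(f"pseudo-regret: {regret:.1f}, policy updates: {updates}")
```

With this schedule the policy changes a logarithmic number of times in the horizon, rather than once per pull as in fully sequential Thompson Sampling. Other batch grids can be passed through `batch_ends` to explore how regret varies with the number of batches.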

Authors (2)

Nikolai Karpov
Qin Zhang

Citations (3)

Related Papers

Batched Multi-armed Bandits Problem (2019)
Regret Bounds for Batched Bandits (2019)
Thompson Sampling with Virtual Helping Agents (2022)
Batched Thompson Sampling (2021)
Thompson Sampling for Budgeted Multi-armed Bandits (2015)