MOTS: Minimax Optimal Thompson Sampling (2003.01803v3)

Published 3 Mar 2020 in cs.LG, math.ST, stat.ML, and stat.TH

Abstract: Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can match the minimax lower bound $\Omega(\sqrt{KT})$ for $K$-armed bandit problems, where $T$ is the total time horizon. In this paper, we solve this long-standing open problem by proposing a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step. We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound $O(\sqrt{KT})$ for finite time horizon $T$, as well as the asymptotically optimal regret bound for Gaussian rewards when $T$ approaches infinity. To our knowledge, MOTS is the first Thompson sampling-type algorithm that achieves minimax optimality for multi-armed bandit problems.
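The clipping idea described in the abstract can be sketched in code. The following is a minimal, illustrative Python loop for a clipped Thompson sampling strategy on Gaussian-reward bandits: the sampling variance scaling 1/(rho * n_i), the threshold form mu_hat_i + sqrt(alpha * log^+(T / (K n_i)) / n_i), and the constants alpha and rho are assumptions made for this sketch, not details stated in the abstract, so this should not be read as the paper's exact MOTS specification.

```python
import numpy as np

def clipped_thompson_sampling(means, T, alpha=4.0, rho=0.75, seed=0):
    """Illustrative clipped Thompson sampling loop for Gaussian rewards.
    Threshold form and constants are assumptions, not the paper's exact spec."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K, dtype=int)   # pulls per arm
    sums = np.zeros(K)                # running reward sums per arm
    best = max(means)
    regret = 0.0

    # Pull each arm once so empirical means are defined.
    for i in range(K):
        counts[i] += 1
        sums[i] += rng.normal(means[i], 1.0)
        regret += best - means[i]

    for _ in range(K, T):
        mu_hat = sums / counts
        # Posterior-style sample with assumed variance 1/(rho * n_i).
        theta = rng.normal(mu_hat, np.sqrt(1.0 / (rho * counts)))
        # Confidence-style clipping threshold with log^+(x) = max(log x, 0).
        log_plus = np.maximum(np.log(T / (K * counts)), 0.0)
        tau = mu_hat + np.sqrt(alpha * log_plus / counts)
        # Clip samples from above, then play the arm with the largest sample.
        arm = int(np.argmax(np.minimum(theta, tau)))
        counts[arm] += 1
        sums[arm] += rng.normal(means[arm], 1.0)
        regret += best - means[arm]
    return regret

# Example: 10-armed bandit, one good arm, horizon of 10,000 steps.
print(clipped_thompson_sampling(means=[0.0] * 9 + [0.5], T=10_000))
```

The clipping caps over-optimistic samples for under-explored arms, which is the mechanism the abstract credits for closing the gap to the $\Omega(\sqrt{KT})$ minimax lower bound.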

Authors (5)
  1. Tianyuan Jin (14 papers)
  2. Pan Xu (68 papers)
  3. Jieming Shi (21 papers)
  4. Xiaokui Xiao (90 papers)
  5. Quanquan Gu (198 papers)
Citations (29)
