
Diffusion Approximations for Thompson Sampling (2105.09232v4)

Published 19 May 2021 in cs.LG, math.ST, and stat.TH

Abstract: We study the behavior of Thompson sampling from the perspective of weak convergence. In the regime with small $\gamma > 0$, where the gaps between arm means scale as $\sqrt{\gamma}$ and time horizons scale as $1/\gamma$, we show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and stochastic ODEs. As $\gamma \downarrow 0$, we show that the dynamics converge weakly to solutions of the corresponding SDEs and stochastic ODEs. Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem, and can be easily adapted to analyze other sampling-based bandit algorithms. In this regime, we also show that the weak limits of the dynamics of many sampling-based algorithms -- including Thompson sampling designed for single-parameter exponential family rewards, and algorithms using bootstrap-based sampling to balance exploration and exploitation -- coincide with those of Gaussian Thompson sampling. Moreover, in this regime, these algorithms are generally robust to model misspecification.
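To make the scaling regime in the abstract concrete, the following is a minimal simulation sketch, not code from the paper. It runs two-armed Gaussian Thompson sampling with a gap between arm means of order $\sqrt{\gamma}$ over a horizon of order $1/\gamma$. The function name, the parameters `gap_scale`, `horizon_scale`, and `prior_var`, the unit reward variance, and the zero-mean Gaussian prior are all assumptions chosen for illustration. The returned quantities are rescaled (pull counts by $\gamma$, reward sums by $\sqrt{\gamma}$) as one natural choice for inspecting behavior as $\gamma$ shrinks; the paper's exact normalization may differ.

```python
import numpy as np


def gaussian_thompson_sampling(gamma, gap_scale=1.0, horizon_scale=1.0,
                               prior_var=1.0, seed=0):
    """Simulate two-armed Gaussian Thompson sampling in a small-gap regime.

    Illustrative assumptions (not taken from the paper): unit-variance
    Gaussian rewards, independent N(0, prior_var) priors on each arm mean,
    and arm means (gap_scale * sqrt(gamma), 0).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon_scale / gamma))  # horizon scales as 1/gamma
    means = np.array([gap_scale * np.sqrt(gamma), 0.0])  # gap ~ sqrt(gamma)

    pulls = np.zeros(2)  # number of times each arm was played
    sums = np.zeros(2)   # running sum of rewards for each arm

    for _ in range(n_steps):
        # Conjugate Gaussian posterior for each arm mean (known unit variance).
        precision = pulls + 1.0 / prior_var
        post_mean = sums / precision
        post_std = 1.0 / np.sqrt(precision)

        # Thompson sampling: draw one posterior sample per arm, play the argmax.
        sampled = rng.normal(post_mean, post_std)
        arm = int(np.argmax(sampled))

        reward = rng.normal(means[arm], 1.0)
        pulls[arm] += 1
        sums[arm] += reward

    # Rescale: pull counts by gamma, reward sums by sqrt(gamma).
    return gamma * pulls, np.sqrt(gamma) * sums


if __name__ == "__main__":
    for gamma in (0.1, 0.01, 0.001):
        scaled_pulls, scaled_sums = gaussian_thompson_sampling(gamma, seed=1)
        print(gamma, scaled_pulls, scaled_sums)
```

Running the loop over several seeds and decreasing values of `gamma` gives an empirical feel for how the distribution of the rescaled quantities behaves as $\gamma$ shrinks. A single run is one random sample path, so it does not by itself demonstrate convergence.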

Citations (20)

Authors (2)
