Diffusion Approximations for Thompson Sampling (2105.09232v4)
Abstract: We study the behavior of Thompson sampling from the perspective of weak convergence. In the regime with small $\gamma > 0$, where the gaps between arm means scale as $\sqrt{\gamma}$ and the time horizon scales as $1/\gamma$, we show that the dynamics of Thompson sampling evolve according to discrete versions of SDEs and stochastic ODEs. As $\gamma \downarrow 0$, we show that the dynamics converge weakly to solutions of the corresponding SDEs and stochastic ODEs. Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem, and can be easily adapted to analyze other sampling-based bandit algorithms. In this regime, we also show that the weak limits of the dynamics of many sampling-based algorithms (including Thompson sampling designed for single-parameter exponential family rewards, and algorithms using bootstrap-based sampling to balance exploration and exploitation) coincide with those of Gaussian Thompson sampling. Moreover, in this regime, these algorithms are generally robust to model misspecification.
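The scaling regime described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's construction. It assumes a two-armed bandit with unit-variance Gaussian rewards, independent N(0, 1) priors on each arm mean, a gap of `gap_scale * sqrt(gamma)`, and a horizon of `horizon_scale / gamma` steps. The function name and both scale parameters are hypothetical and chosen only for illustration. It records the rescaled pull count `gamma * N_1(t)` of the better arm, which is the kind of quantity whose path one would expect to stabilize in distribution as `gamma` shrinks.

```python
import numpy as np


def simulate_gaussian_ts(gamma, gap_scale=1.0, horizon_scale=1.0, seed=0):
    """Two-armed Gaussian Thompson sampling in the small-gap regime.

    Assumptions (illustrative, not taken from the paper):
    unit-variance Gaussian rewards and an N(0, 1) prior on each arm mean.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon_scale / gamma))       # horizon scales as 1/gamma
    means = np.array([0.0, gap_scale * np.sqrt(gamma)])  # gap scales as sqrt(gamma)

    counts = np.zeros(2)   # number of pulls per arm
    sums = np.zeros(2)     # sum of observed rewards per arm
    rescaled_pulls = np.empty(n_steps)

    for t in range(n_steps):
        # Conjugate Gaussian posterior under an N(0, 1) prior and unit noise.
        post_mean = sums / (1.0 + counts)
        post_std = 1.0 / np.sqrt(1.0 + counts)

        # Thompson sampling: draw one sample per arm, play the argmax.
        samples = rng.normal(post_mean, post_std)
        arm = int(np.argmax(samples))

        reward = rng.normal(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward

        # Rescaled pull count of the better arm, in [0, horizon_scale].
        rescaled_pulls[t] = gamma * counts[1]

    return rescaled_pulls, counts


if __name__ == "__main__":
    for gamma in (1e-2, 1e-3, 1e-4):
        path, counts = simulate_gaussian_ts(gamma, seed=1)
        print(f"gamma={gamma:g}: final rescaled pulls of better arm = {path[-1]:.3f}")
```

Running this over many seeds for each `gamma` and comparing histograms of the final rescaled pull count is one simple way to see the distribution settle as `gamma` decreases. A single run per `gamma`, as in the usage above, only shows one sample path.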