Efficient and Adaptive Posterior Sampling Algorithms for Bandits (2405.01010v1)
Abstract: We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance between utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$\alpha$) and Thompson Sampling with Timestamp Duelling (TS-TD-$\alpha$), where $\alpha \in [0,1]$ controls the trade-off between utility and computation. Both algorithms achieve an $O\left(K \ln^{\alpha+1}(T)/\Delta\right)$ regret bound, where $K$ is the number of arms, $T$ is the finite learning horizon, and $\Delta$ denotes the single-round performance loss incurred when pulling a sub-optimal arm.
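For scale, $288 e^{64} \approx 1.8 \times 10^{30}$, so a regret bound that is vacuous below that horizon says nothing about any realistic $T$; tightening the leading coefficient to $1270$ is what makes the guarantee usable. The abstract does not describe TS-MA-$\alpha$ or TS-TD-$\alpha$ in enough detail to reconstruct them, so the sketch below shows only the Gaussian-prior Thompson Sampling baseline analyzed by Agrawal and Goyal [2017], in which each arm $i$ is scored by a sample from $N(\hat{\mu}_i, 1/(n_i+1))$. The function name and the Bernoulli reward model are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

def thompson_sampling_gaussian(means, T, seed=0):
    """Gaussian-prior Thompson Sampling for rewards bounded in [0, 1],
    following Agrawal and Goyal [2017]: score arm i with a sample from
    N(mu_hat_i, 1 / (n_i + 1)) and pull the arm with the highest sample.
    Returns the cumulative pseudo-regret over T rounds."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(means, dtype=float)   # true (unknown) arm means
    K = len(mu)
    n = np.zeros(K)                       # pull counts per arm
    mu_hat = np.zeros(K)                  # empirical mean rewards
    regret = 0.0
    for _ in range(T):
        # Posterior sample per arm: variance shrinks as 1 / (n_i + 1).
        theta = rng.normal(mu_hat, 1.0 / np.sqrt(n + 1))
        i = int(np.argmax(theta))
        r = rng.binomial(1, mu[i])        # Bernoulli reward in {0, 1}
        n[i] += 1
        mu_hat[i] += (r - mu_hat[i]) / n[i]  # incremental mean update
        regret += mu.max() - mu[i]
    return regret

# Example: two arms with gap Delta = 0.1.
print(thompson_sampling_gaussian([0.5, 0.6], T=10_000))
```

This baseline draws one posterior sample per arm per round; the paper's parameterized variants target exactly this per-round sampling cost, with $\alpha$ trading extra $\ln(T)$ factors in regret for reduced computation.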
- Shipra Agrawal and Navin Goyal. Near-optimal regret bounds for Thompson Sampling. http://www.columbia.edu/~sa3305/papers/j3-corrected.pdf, 2017.
- Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. Tuning bandit algorithms in stochastic environments. In International Conference on Algorithmic Learning Theory, pages 150–165. Springer, 2007.
- Peter Auer and Ronald Ortner. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1-2):55–65, 2010.
- Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47:235–256, 2002.
- Dorian Baudry, Patrick Saux, and Odalric-Ambrym Maillard. From optimality to robustness: Adaptive re-sampling strategies in stochastic bandits. Advances in Neural Information Processing Systems, 34:14029–14041, 2021.
- Jie Bian and Kwang-Sung Jun. Maillard sampling: Boltzmann exploration done optimally. In International Conference on Artificial Intelligence and Statistics, pages 54–72. PMLR, 2022.
- Aurélien Garivier and Olivier Cappé. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proceedings of the 24th Annual Conference on Learning Theory, pages 359–376. JMLR Workshop and Conference Proceedings, 2011.
- Junya Honda and Akimichi Takemura. An asymptotically optimal bandit algorithm for bounded support models. In COLT, pages 67–79. Citeseer, 2010.
- Junya Honda and Akimichi Takemura. Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards. Journal of Machine Learning Research, 16:3721–3756, 2015.
- Tianyuan Jin, Pan Xu, Jieming Shi, Xiaokui Xiao, and Quanquan Gu. MOTS: Minimax optimal Thompson Sampling. In International Conference on Machine Learning, pages 5074–5083. PMLR, 2021.
- Tianyuan Jin, Pan Xu, Xiaokui Xiao, and Quanquan Gu. Finite-time regret of Thompson Sampling algorithms for exponential family multi-armed bandits. Advances in Neural Information Processing Systems, 35:38475–38487, 2022.
- Thompson Sampling with less exploration is fast and optimal. In International Conference on Machine Learning. PMLR, 2023.
- Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier. On Bayesian upper confidence bounds for bandit problems. In Artificial Intelligence and Statistics, pages 592–600. PMLR, 2012a.
- Emilie Kaufmann, Nathaniel Korda, and Rémi Munos. Thompson Sampling: An asymptotically optimal finite-time analysis. In Algorithmic Learning Theory, pages 199–213. Springer, 2012b.
- Tor Lattimore. Refining the confidence level for optimistic bandit strategies. Journal of Machine Learning Research, 19(1):765–796, 2018.
- Pierre Ménard and Aurélien Garivier. A minimax and asymptotically optimal algorithm for stochastic bandits. In International Conference on Algorithmic Learning Theory, pages 223–237. PMLR, 2017.
- Charles Riou and Junya Honda. Bandit algorithms based on Thompson Sampling for bounded reward distributions. In Algorithmic Learning Theory, pages 777–826. PMLR, 2020.
- Bingshan Hu
- Zhiming Huang
- Tianyue H. Zhang
- Nidhi Hegde
- Mathias Lécuyer