Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret (2505.02383v2)

Published 5 May 2025 in cs.LG

Abstract: We address differentially private stochastic bandit problems by exploring the deep connections among Thompson Sampling with Gaussian priors, Gaussian mechanisms, and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private bandit algorithm that enables a trade-off between privacy and regret. DP-TS-UCB satisfies $\tilde{O}\left(T^{0.25(1-\alpha)}\right)$-GDP and enjoys an $O\left(K\ln^{\alpha+1}(T)/\Delta\right)$ regret bound, where $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds for Gaussian distributions and links the exploration mechanisms of Thompson Sampling-based and Upper Confidence Bound-based algorithms, which may be of independent interest.
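
To make the role of $\alpha$ concrete, the following minimal sketch evaluates only the leading-order terms of the two bounds quoted in the abstract. It is an illustration under stated assumptions, not the paper's algorithm: the hidden constants and the polylogarithmic factors inside $\tilde{O}$ are dropped, the function names `gdp_order` and `regret_order` are hypothetical, and $T$, $K$, and $\Delta$ are read in the usual bandit sense (horizon, number of arms, and suboptimality gap), which the abstract itself does not spell out.

```python
import math


def gdp_order(T: float, alpha: float) -> float:
    """Leading-order GDP parameter T^(0.25 * (1 - alpha)).

    Assumption: constants and polylog factors hidden by O-tilde are ignored.
    """
    return T ** (0.25 * (1.0 - alpha))


def regret_order(T: float, K: int, delta: float, alpha: float) -> float:
    """Leading-order regret term K * ln(T)^(alpha + 1) / delta.

    Assumption: the constant hidden by big-O is ignored.
    """
    return K * math.log(T) ** (alpha + 1.0) / delta


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    T, K, delta = 10**6, 10, 0.1
    for alpha in (0.0, 0.5, 1.0):
        mu = gdp_order(T, alpha)
        reg = regret_order(T, K, delta, alpha)
        print(f"alpha={alpha:.1f}  GDP order={mu:.3f}  regret order={reg:.1f}")
```

Because a smaller GDP parameter corresponds to a stronger privacy guarantee, moving $\alpha$ from 0 to 1 shrinks the privacy term from order $T^{0.25}$ to order 1, while the regret term picks up an extra factor of $\ln T$.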