Randomized Exploration in Generalized Linear Bandits (1906.08947v3)
Abstract: We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
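The GLM-FPL idea described above — fit a GLM to a randomly perturbed history of past rewards, then act greedily — can be sketched as follows. This is a minimal illustration for the logistic-bandit case, not the authors' implementation: the function name `glm_fpl_round`, the perturbation scale `sigma`, and the use of a few regularized Newton (IRLS) steps for the fit are all illustrative assumptions.

```python
import numpy as np

def glm_fpl_round(X_hist, y_hist, arms, sigma=0.5, n_iter=20, rng=None):
    """One round of GLM-FPL (sketch): fit a logistic GLM to a
    Gaussian-perturbed reward history, then pull the greedy arm.

    X_hist : (t, d) feature vectors of past pulls
    y_hist : (t,) observed rewards
    arms   : (K, d) feature vectors of the K arms
    """
    rng = np.random.default_rng() if rng is None else rng
    # Perturb past rewards with i.i.d. Gaussian noise -- this is the
    # randomization that drives exploration in GLM-FPL.
    y_pert = y_hist + sigma * rng.standard_normal(len(y_hist))
    # Fit a logistic GLM to the perturbed history with a few
    # regularized Newton (IRLS) steps.
    d = X_hist.shape[1]
    theta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X_hist @ theta))   # mean function
        w = p * (1.0 - p) + 1e-6                    # IRLS weights
        H = (X_hist * w[:, None]).T @ X_hist + 1e-3 * np.eye(d)
        g = X_hist.T @ (y_pert - p)                 # quasi-likelihood gradient
        theta += np.linalg.solve(H, g)
    # Act greedily with respect to the perturbed estimate.
    return int(np.argmax(arms @ theta))
```

Each round resamples the noise, so repeated calls on the same history explore different plausible models; GLM-TSL would instead sample `theta` from a Laplace approximation to the posterior.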