Randomized Exploration in Generalized Linear Bandits (1906.08947v3)

Published 21 Jun 2019 in cs.LG and stat.ML

Abstract: We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
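
The abstract's description of GLM-FPL (fit a GLM to a Gaussian-perturbed reward history, then act greedily on the fitted model) translates naturally into code. Below is a minimal sketch for a K-armed logistic bandit; the noise scale `a`, the initial round-robin pulls, and the gradient-based quasi-MLE solver are illustrative assumptions, not the paper's exact algorithmic choices or tuned constants.

```python
# Minimal sketch of GLM-FPL on a K-armed logistic bandit.
# Assumptions (not from the paper): noise scale a, learning rate,
# number of fitting steps, and round-robin initialization.
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def glm_fpl(arms, theta_star, n_rounds, a=0.5, seed=0):
    """arms: (K, d) feature matrix; rewards ~ Bernoulli(sigmoid(x^T theta_star))."""
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    xs, ys = [], []
    best_mean = sigmoid(arms @ theta_star).max()
    regret = 0.0
    for t in range(n_rounds):
        if t < K:
            k = t  # pull each arm once before fitting anything
        else:
            X = np.array(xs)
            # Perturb the reward history with fresh Gaussian pseudo-noise.
            y_tilde = np.array(ys) + a * rng.standard_normal(len(ys))
            # Fit the GLM to the perturbed history by (approximately) solving
            # the quasi-MLE estimating equation
            #   sum_s (y~_s - sigmoid(x_s^T theta)) x_s = 0
            # with plain gradient steps; IRLS/Newton would also work.
            theta = np.zeros(d)
            for _ in range(100):
                theta += 0.1 * X.T @ (y_tilde - sigmoid(X @ theta)) / len(ys)
            k = int(np.argmax(arms @ theta))  # greedy on the fitted model
        mean_k = sigmoid(arms[k] @ theta_star)
        y = float(rng.random() < mean_k)  # Bernoulli reward
        xs.append(arms[k]); ys.append(y)
        regret += best_mean - mean_k
    return regret

# Example: 20 arms in d = 5 dimensions.
rng = np.random.default_rng(1)
arms = rng.standard_normal((20, 5)) / np.sqrt(5)
theta_star = rng.standard_normal(5)
print("cumulative regret:", glm_fpl(arms, theta_star, n_rounds=2000))
```

GLM-TSL, per the abstract, would differ only in where the randomness enters: rather than perturbing rewards, it would sample the parameter vector from the Laplace approximation to the posterior, i.e. a Gaussian centered at the MLE with covariance given by the inverse Hessian of the negative log-likelihood, then act greedily on that sample.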
