Incentivizing Exploration with Linear Contexts and Combinatorial Actions (2306.01990v3)
Published 3 Jun 2023 in cs.GT and cs.LG
Abstract: We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible. Recent work has shown under certain independence assumptions that after collecting enough initial samples, the popular Thompson sampling algorithm becomes incentive compatible. We give an analog of this result for linear bandits, where the independence of the prior is replaced by a natural convexity condition. This opens up the possibility of efficient and regret-optimal incentivized exploration in high-dimensional action spaces. In the semibandit model, we also improve the sample complexity for the pre-Thompson sampling phase of initial data collection.
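To make the abstract's central object concrete, here is a minimal sketch of Thompson sampling for a linear bandit with a separate initial data-collection phase, mirroring the "pre-Thompson sampling" structure the abstract describes. It assumes a standard Gaussian prior and Gaussian noise; all names and parameters (`n_init`, `prior_var`, `noise_var`) are illustrative stand-ins, not taken from the paper, and the uniform initial phase is only a placeholder for the paper's incentive-compatible data collection.

```python
import numpy as np

def linear_thompson_sampling(arms, reward_fn, horizon, n_init=50,
                             prior_var=1.0, noise_var=1.0, rng=None):
    """Illustrative linear Thompson sampling with an initial
    data-collection phase, under a Gaussian prior N(0, prior_var * I)
    and Gaussian reward noise with variance noise_var.

    arms      : (K, d) array of feature vectors, one row per action
    reward_fn : callable arm_index -> noisy scalar reward (environment)
    horizon   : total number of rounds
    n_init    : rounds of initial sampling before Thompson sampling
    """
    rng = np.random.default_rng(rng)
    K, d = arms.shape
    # Gaussian posterior tracked via precision-like matrix B and vector f:
    # theta | data ~ N(B^{-1} f, noise_var * B^{-1}).
    B = np.eye(d) * (noise_var / prior_var)
    f = np.zeros(d)

    rewards = []
    for t in range(horizon):
        if t < n_init:
            # Stand-in for the pre-Thompson-sampling data collection.
            k = int(rng.integers(K))
        else:
            # Draw a parameter from the posterior, act greedily on it.
            cov = noise_var * np.linalg.inv(B)
            theta = rng.multivariate_normal(np.linalg.solve(B, f), cov)
            k = int(np.argmax(arms @ theta))
        r = reward_fn(k)
        x = arms[k]
        # Rank-one posterior update with the observed (feature, reward) pair.
        B += np.outer(x, x)
        f += r * x
        rewards.append(r)
    return np.asarray(rewards)
```

In the paper's incentivized setting, the point of the initial phase is that once enough samples have been collected, each Thompson sampling recommendation is itself Bayesian incentive compatible, so agents willingly follow it; the sketch above only reproduces the two-phase shape of that scheme, not its incentive analysis.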