Incentivized Learning in Principal-Agent Bandit Games (2403.03811v1)
Abstract: This work considers a repeated principal-agent bandit game in which the principal can interact with her environment only through the agent. The principal and the agent have misaligned objectives, and the choice of action is left solely to the agent. The principal can, however, influence the agent's decisions by offering incentives that are added to his rewards. The principal aims to iteratively learn an incentive policy that maximizes her own total utility. This framework extends the usual bandit setting and is motivated by several practical applications, such as healthcare or ecological taxation, where commonly used mechanism-design theory often overlooks the learning aspect of the problem. We present nearly optimal (with respect to the horizon $T$) learning algorithms for the principal's regret in both the multi-armed and the linear contextual settings. Finally, we support our theoretical guarantees with numerical experiments.
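The interaction protocol described in the abstract can be made concrete with a small simulation. The sketch below is purely illustrative and is not the paper's algorithm: it assumes Gaussian arm rewards, a myopic agent who best-responds to the offered incentives, and a naive explore-then-commit incentive policy; all names and parameters (`K`, `theta`, `s`, `n_explore`, the unit incentive) are hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical environment (not from the paper) ---
K = 4                               # number of arms
theta = rng.uniform(0.0, 1.0, K)    # principal's mean reward per arm (unknown to her)
s = rng.uniform(0.0, 1.0, K)        # agent's mean reward per arm (unknown to the principal)
T = 5_000                           # horizon

def agent_choice(incentives: np.ndarray) -> int:
    """Myopic agent: picks the arm maximizing his own mean reward plus the offered incentive."""
    return int(np.argmax(s + incentives))

# --- Naive explore-then-commit incentive policy (illustration only) ---
principal_utility = 0.0
reward_sums, pulls = np.zeros(K), np.zeros(K)
n_explore = 200                     # forced-exploration rounds per arm

for t in range(T):
    incentives = np.zeros(K)
    if t < K * n_explore:
        target = t % K              # cycle through arms during exploration
    else:
        # Commit: steer the agent toward the empirically best arm for the principal.
        theta_hat = reward_sums / np.maximum(pulls, 1)
        target = int(np.argmax(theta_hat))
    # A unit incentive is always enough here since s lies in [0, 1];
    # a real algorithm would also learn the *minimal* incentive that steers the agent.
    incentives[target] = 1.0

    a = agent_choice(incentives)            # agent best-responds; principal only sees a
    x = rng.normal(theta[a], 0.1)           # principal observes a noisy reward
    reward_sums[a] += x
    pulls[a] += 1
    principal_utility += x - incentives[a]  # utility = reward minus incentive paid

print(f"average principal utility: {principal_utility / T:.3f}")
```

The key difficulty the paper addresses, which this sketch sidesteps by always paying a full unit, is that overpaying incurs regret: the principal must learn both which arm she prefers and the smallest incentive that makes the agent choose it.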
Authors: Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine, El Mahdi El Mhamdi, Eric Moulines, Michael I. Jordan, Alain Durmus