
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents (2312.07929v1)

Published 13 Dec 2023 in cs.GT and cs.LG

Abstract: We consider a variant of the stochastic multi-armed bandit problem in which the arms are strategic agents who can improve their rewards or absorb them. An agent's utility increases if she is pulled more often or absorbs more of her rewards, but decreases with the effort she spends improving them. Agents are heterogeneous: they have different means and can improve their rewards up to different levels. Further, a non-empty subset of agents are "honest" and, even in the worst case, always pass on their rewards without absorbing any part. The principal wishes to obtain high revenue (cumulative reward) by designing a mechanism that incentivizes top-level performance at equilibrium. At the same time, the principal wishes to be robust, obtaining revenue at least at the level of the honest agent with the highest mean under any non-equilibrium behaviour. We identify a class of MAB algorithms, which we call performance-incentivizing, that satisfy a collection of properties, and we show that they lead to mechanisms that incentivize top-level performance at equilibrium and are robust under any strategy profile. Interestingly, we show that UCB is an example of such a MAB algorithm. Further, when the top performance level is unknown, we show that combining second-price-auction ideas with performance-incentivizing algorithms achieves performance at least at the second-highest level while remaining robust.
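The abstract's claim that UCB is performance-incentivizing rests on a familiar property of its index: an arm that passes on more of its reward raises its empirical mean and is pulled more often. The following is a minimal sketch of the standard UCB1 rule, not the paper's mechanism; the arm means below are illustrative assumptions standing in for how much reward each strategic arm chooses to pass on.

```python
import math
import random

def ucb1(pull, n_arms, horizon, seed=0):
    """Run UCB1: pull each arm once, then pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_i). Returns total reward and pull counts."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        r = pull(arm, rng)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total, counts

# Illustrative strategic arms: each arm i yields a Bernoulli reward whose
# mean reflects how much of its reward it passes on (assumed values).
means = [0.9, 0.5, 0.3]
revenue, pulls = ucb1(
    lambda i, rng: 1.0 if rng.random() < means[i] else 0.0,
    n_arms=3,
    horizon=5000,
)
print(pulls)  # the arm passing on the most reward receives most pulls
```

Because pulls are the currency of an agent's utility here, this feedback loop is what makes "give more reward" the attractive strategy under an index policy of this form.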

Authors (3)
  1. Seyed A. Esmaeili (13 papers)
  2. Suho Shin (15 papers)
  3. Aleksandrs Slivkins (67 papers)
Citations (3)