
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents

Published 13 Dec 2023 in cs.GT and cs.LG (arXiv:2312.07929v2)

Abstract: Motivated by applications such as online labor markets, we consider a variant of the stochastic multi-armed bandit problem in which the arms represent strategic agents with different performance characteristics. The platform (principal) chooses an agent in each round to complete a task. Unlike the standard setting, when an arm is pulled it can modify its reward, either absorbing part of it or improving it at the expense of a higher cost. The principal must solve a mechanism design problem to incentivize the arms to give their best performance. However, since even under an effective mechanism agents may deviate from rational behavior, the principal also wants a robust algorithm that gives a non-vacuous guarantee on the total accumulated reward under non-equilibrium behavior. In this paper, we introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously. We do this by identifying a collection of intuitive properties that a bandit algorithm must satisfy to achieve these objectives. Finally, we show that settings in which the principal has no information about the arms' performance characteristics can be handled by combining ideas from second-price auctions with our algorithms.
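The setting in the abstract can be made concrete with a minimal simulation. This is an illustrative sketch, not the paper's algorithm or mechanism: the arm strategies (`honest`, `absorb`, `improve`), the reward model, and the explore-then-commit principal are all assumptions chosen to show how strategic reward modification distorts what the principal observes.

```python
import random

random.seed(0)

class StrategicArm:
    """Hypothetical strategic agent: has a true mean performance and a
    strategy for modifying the reward it delivers when pulled."""
    def __init__(self, mean, strategy="honest"):
        self.mean = mean          # true performance characteristic
        self.strategy = strategy  # how the agent modifies its realized reward

    def pull(self):
        base = random.gauss(self.mean, 0.1)
        if self.strategy == "absorb":   # withhold part of the reward
            return 0.5 * base
        if self.strategy == "improve":  # boost output at a private cost
            return base + 0.2
        return base                     # honest: deliver reward as-is

def run_principal(arms, horizon=1000, explore_rounds=10):
    """Naive explore-then-commit sketch: the principal estimates each
    arm's *observed* performance (which the arms control), then commits
    to the best-looking arm. Returns the average accumulated reward."""
    counts = [0] * len(arms)
    totals = [0.0] * len(arms)
    cumulative = 0.0
    for t in range(horizon):
        if t < explore_rounds * len(arms):
            i = t % len(arms)  # round-robin exploration
        else:
            i = max(range(len(arms)), key=lambda j: totals[j] / counts[j])
        r = arms[i].pull()
        counts[i] += 1
        totals[i] += r
        cumulative += r
    return cumulative / horizon

arms = [StrategicArm(0.6, "honest"),
        StrategicArm(0.8, "absorb"),   # strongest arm, but it withholds effort
        StrategicArm(0.5, "improve")]
avg = run_principal(arms)
```

Note how the arm with the best true performance (mean 0.8) looks mediocre once it absorbs half its reward, so a strategy-unaware principal commits elsewhere; this is the gap that the paper's incentivizing and robust algorithms are designed to close.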
