Replication-proof Bandit Mechanism Design (2312.16896v1)

Published 28 Dec 2023 in cs.GT, cs.AI, and cs.DS

Abstract: We study the problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms to maximize their payoff. We consider Bayesian agents who are unaware of the ex-post realization of their own arms' mean rewards; to our knowledge, this is the first Bayesian extension of Shin et al. (2022). This extension presents significant challenges in analyzing equilibrium, in contrast to the fully informed setting of Shin et al. (2022), in which the problem reduces to the case where each agent has only a single arm. With Bayesian agents, analyzing the replication-proofness of an algorithm becomes complicated even in the single-agent setting. Remarkably, we first show that the algorithm proposed by Shin et al. (2022), called H-UCB, is no longer replication-proof for any choice of exploration parameters. We then provide necessary and sufficient conditions for an algorithm to be replication-proof in the single-agent setting. These results center on several analytical results comparing the expected regret of multiple bandit instances, which may be of independent interest. We further prove that the explore-then-commit (ETC) algorithm satisfies these conditions, whereas UCB does not, which in turn explains its failure to be replication-proof. We extend this result to the multi-agent setting and provide a replication-proof algorithm for any problem instance. The proof mainly relies on the single-agent result, structural properties of ETC, and the novel introduction of a restarting round, which greatly simplifies the analysis while leaving the regret unchanged up to a polylogarithmic factor. We conclude by proving a sublinear regret upper bound for the algorithm, which matches that of H-UCB.
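
The abstract refers to the standard explore-then-commit (ETC) and UCB algorithms. As a point of reference only, the following is a minimal sketch of a textbook ETC policy run on two hypothetical Bernoulli instances, one with and one without a replicated arm. The reward means, horizon, and exploration budget are illustrative assumptions; this is not the paper's mechanism or its proof, only an example of the kind of expected-regret comparison across bandit instances that the analysis concerns.

```python
import numpy as np

def etc_pseudo_regret(means, horizon, explore_per_arm, seed=0):
    """Run a standard explore-then-commit (ETC) policy on a Bernoulli bandit
    and return its cumulative pseudo-regret against the best mean."""
    rng = np.random.default_rng(seed)
    k = len(means)
    best = max(means)
    pulls = np.zeros(k)
    sums = np.zeros(k)
    regret = 0.0
    for t in range(horizon):
        if t < k * explore_per_arm:
            arm = t % k                          # round-robin exploration phase
        else:
            arm = int(np.argmax(sums / pulls))   # commit to the empirically best arm
        reward = rng.binomial(1, means[arm])
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

# Hypothetical instances: an agent's two arms, and the same arms after the
# agent replicates its better arm twice. The means are illustrative only.
original = [0.6, 0.4]
replicated = [0.6, 0.4, 0.6, 0.6]

for label, means in [("original", original), ("replicated", replicated)]:
    avg = np.mean([etc_pseudo_regret(means, horizon=2000, explore_per_arm=50, seed=s)
                   for s in range(20)])
    print(f"{label:10s} average pseudo-regret over 20 runs: {avg:.1f}")
```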

References (39)
  1. Analysis of thompson sampling for the multi-armed bandit problem. In Conference on learning theory, pages 39–1. JMLR Workshop and Conference Proceedings, 2012.
  2. Best arm identification in multi-armed bandits. In COLT, pages 41–53, 2010.
  3. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47:235–256, 2002.
  4. Resourceful contextual bandits. In Conference on Learning Theory, pages 1109–1134. PMLR, 2014.
  5. Bandits with knapsacks. Journal of the ACM (JACM), 65(3):1–55, 2018.
  6. Learning from neighbours. The review of economic studies, 65(3):595–621, 1998.
  7. Abhijit V Banerjee. A simple model of herd behavior. The quarterly journal of economics, 107(3):797–817, 1992.
  8. Bandit social learning: Exploration under myopic behavior. arXiv preprint arXiv:2302.07425, 2023.
  9. Introduction to bandits in recommender systems. In Proceedings of the 14th ACM Conference on Recommender Systems, pages 748–750, 2020.
  10. Bandit problems with infinitely many arms. The Annals of Statistics, pages 2103–2116, 1997.
  11. Multi-armed bandit problems with strategic arms. In Conference on Learning Theory, pages 383–416. PMLR, 2019.
  12. Simple regret for infinitely many armed bandits. In International Conference on Machine Learning, pages 1133–1141. PMLR, 2015.
  13. Recommender systems as mechanisms for social learning. The Quarterly Journal of Economics, 133(2):871–925, 2018.
  14. Incentivizing exploration by heterogeneous users. In Conference On Learning Theory, pages 798–818. PMLR, 2018.
  15. A survey on practical applications of multi-armed and contextual bandits. arXiv preprint arXiv:1904.10040, 2019.
  16. Robust and performance incentivizing algorithms for multi-armed bandits with strategic agents. arXiv preprint arXiv:2312.07929, 2023.
  17. The intrinsic robustness of stochastic bandits to strategic manipulation. In International Conference on Machine Learning, pages 3092–3101. PMLR, 2020.
  18. Incentivizing exploration. In Proceedings of the fifteenth ACM conference on Economics and computation, pages 5–22, 2014.
  19. On explore-then-commit strategies. Advances in Neural Information Processing Systems, 29, 2016.
  20. Learning and incentives in user-generated content: Multi-armed bandits with endogenous arms. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 233–246, 2013.
  21. Regret analysis of repeated delegated choice. arXiv preprint arXiv:2310.04884, 2023.
  22. Incentivizing exploration with heterogeneous value of money. In Web and Internet Economics: 11th International Conference, WINE 2015, Amsterdam, The Netherlands, December 9-12, 2015, Proceedings 11, pages 370–383. Springer, 2015.
  23. Jason D Hartline et al. Bayesian mechanism design. Foundations and Trends® in Theoretical Computer Science, 8(3):143–263, 2013.
  24. Incentivizing exploration with selective data disclosure. arXiv preprint arXiv:1811.06026, 2018.
  25. Delegated search approximates efficient search. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 287–302, 2018.
  26. Implementing the “wisdom of the crowd”. Journal of Political Economy, 122(5):988–1012, 2014.
  27. The network structure of exploration and exploitation. Administrative science quarterly, 52(4):667–694, 2007.
  28. Competing bandits in matching markets. In International Conference on Artificial Intelligence and Statistics, pages 1618–1628. PMLR, 2020.
  29. Bandit learning in decentralized matching markets. The Journal of Machine Learning Research, 22(1):9612–9645, 2021.
  30. Bayesian incentive-compatible bandit exploration. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, pages 565–582, 2015.
  31. Algorithmic game theory. Cambridge University Press, 2007.
  32. The price of incentivizing exploration: A characterization via thompson sampling and sample complexity. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 795–796, 2021.
  33. Multi-armed bandit algorithm against strategic replication. In International Conference on Artificial Intelligence and Statistics, pages 403–431. PMLR, 2022.
  34. Aleksandrs Slivkins et al. Introduction to multi-armed bandits. Foundations and Trends® in Machine Learning, 12(1-2):1–286, 2019.
  35. Pathological outcomes of observational learning. Econometrica, 68(2):371–398, 2000.
  36. Ensemble contextual bandits for personalized recommendation. In Proceedings of the 8th ACM Conference on Recommender Systems, pages 73–80, 2014.
  37. Infinitely many-armed bandits. In Advances in Neural Information Processing Systems, 2008.
  38. Yaming Yu. Stochastic ordering of exponential family distributions and their mixtures. Journal of Applied Probability, 46(1):244–254, 2009.
  39. On regret with multiple best arms. arXiv preprint arXiv:2006.14785, 2020.