Improved Bandits in Many-to-one Matching Markets with Incentive Compatibility (2401.01528v2)

Published 3 Jan 2024 in cs.LG and cs.GT

Abstract: Two-sided matching markets have been widely studied in the literature due to their rich applications. Since participants are usually uncertain about their preferences, online algorithms have recently been adopted to learn them through iterative interactions. An existing work initiates the study of this problem in a many-to-one setting with responsiveness. However, its results are far from optimal and lack guarantees of incentive compatibility. We first extend an existing algorithm for the one-to-one setting to this more general setting and show that it achieves a near-optimal bound for player-optimal regret. Nevertheless, because it requires substantial collaboration, a single player's deviation can greatly increase its own cumulative rewards while imposing linear regret on others. In this paper, we aim to improve the regret bound in many-to-one markets while ensuring incentive compatibility. We first propose the adaptively explore-then-deferred-acceptance (AETDA) algorithm for the responsive setting, derive an upper bound on its player-optimal stable regret, and show that it is incentive-compatible. To the best of our knowledge, it constitutes the first polynomial player-optimal guarantee in matching markets that offers such robust assurances without requiring knowledge of $\Delta$, a preference gap between players and arms. We also consider the broader class of substitutable preferences, one of the most general conditions that ensures the existence of a stable matching and covers responsiveness. We devise an online DA (ODA) algorithm and establish an upper bound on the player-pessimal stable regret for this setting.
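The deferred-acceptance (DA) procedure underlies both algorithms named in the abstract. As background, here is a minimal sketch of many-to-one DA (the college-admissions variant of Gale-Shapley) with players proposing to capacity-constrained arms; it assumes fully known preferences, whereas the paper's algorithms must learn them from bandit feedback, and all names here are illustrative rather than taken from the paper.

```python
def deferred_acceptance(player_prefs, arm_prefs, capacities):
    """Player-proposing deferred acceptance in a many-to-one market.

    player_prefs: for each player, a list of arm indices, best first.
    arm_prefs:    for each arm, a ranking of players (arm_prefs[a][p] is
                  the rank of player p at arm a; lower is better).
    capacities:   quota of each arm.
    Returns the arm matched to each player (None if unmatched).
    """
    n = len(player_prefs)
    next_choice = [0] * n              # next index to try in each pref list
    matched = [None] * n               # arm currently holding each player
    held = [[] for _ in arm_prefs]     # players tentatively held by each arm
    free = list(range(n))
    while free:
        p = free.pop()
        if next_choice[p] >= len(player_prefs[p]):
            continue                   # p exhausted its list; stays unmatched
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        held[a].append(p)
        held[a].sort(key=lambda q: arm_prefs[a][q])   # arm's favorites first
        if len(held[a]) > capacities[a]:
            rejected = held[a].pop()   # drop the least-preferred proposer
            matched[rejected] = None
            free.append(rejected)
            if rejected != p:
                matched[p] = a
        else:
            matched[p] = a
    return matched
```

The bandit versions in the paper replace the known `player_prefs` with preference estimates built from observed rewards, which is where the exploration phases and the regret analysis come in.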

