
Impact of Decentralized Learning on Player Utilities in Stackelberg Games (2403.00188v2)

Published 29 Feb 2024 in cs.LG and cs.GT

Abstract: When deployed in the world, a learning agent such as a recommender system or a chatbot often repeatedly interacts with another learning agent (such as a user) over time. In many such two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned. To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective. We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks (such as Stackelberg equilibrium payoffs) result in worst-case linear regret for at least one player. To better capture these systems, we construct a relaxed regret benchmark that is tolerant to small learning errors by agents. We show that standard learning algorithms fail to provide sublinear regret, and we develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks. We further design relaxed environments under which faster learning ($O(\sqrt{T})$) is possible. Altogether, our results take a step towards assessing how two-agent interactions in sequential and decentralized learning environments affect the utility of both agents.
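The $O(T^{2/3})$ rate in the abstract is characteristic of explore-then-commit-style schedules. As a rough illustration of the setting (not the paper's construction — the payoff matrix, exploration schedule, and algorithm choices below are all assumptions), the following sketch simulates two decentralized explore-then-commit learners in a repeated 2x2 Stackelberg game, where each agent observes only its own stochastic rewards, and compares the leader's average reward to the Stackelberg benchmark payoff:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 general-sum game (illustrative numbers, not from the paper):
# entry [a, b] is the mean Bernoulli reward when the leader plays a and the
# follower responds with b.
leader_mean = np.array([[0.3, 0.1],
                        [0.7, 0.6]])
follower_mean = np.array([[0.2, 0.8],
                          [0.7, 0.3]])

T = 30_000
n_explore = int(T ** (2 / 3))  # exploration budget of order T^{2/3}

# Decentralized state: each agent only tracks its own realized rewards.
f_sum = np.zeros((2, 2)); f_cnt = np.zeros((2, 2))  # follower stats per leader action
l_sum = np.zeros(2);      l_cnt = np.zeros(2)       # leader stats per own action

leader_total = 0.0
for t in range(T):
    # Leader: uniform exploration, then commit to the empirically best action.
    a = t % 2 if t < n_explore else int(np.argmax(l_sum / np.maximum(l_cnt, 1)))
    # Follower observes a, then explores/commits separately per leader action.
    if f_cnt[a].sum() < n_explore / 2:
        b = int(f_cnt[a].argmin())
    else:
        b = int(np.argmax(f_sum[a] / np.maximum(f_cnt[a], 1)))
    # Bernoulli feedback, drawn independently for each agent.
    r_l = float(rng.random() < leader_mean[a, b])
    r_f = float(rng.random() < follower_mean[a, b])
    l_sum[a] += r_l; l_cnt[a] += 1
    f_sum[a, b] += r_f; f_cnt[a, b] += 1
    leader_total += r_l

# Stackelberg benchmark: leader's payoff if the follower best-responded exactly.
best_resp = follower_mean.argmax(axis=1)
stackelberg_value = max(leader_mean[a, best_resp[a]] for a in range(2))
print(f"avg leader reward {leader_total / T:.3f} vs benchmark {stackelberg_value:.3f}")
```

In this toy instance the leader's average reward approaches the benchmark because both agents' empirical gaps are large; the paper's point is that against general payoffs and standard benchmarks this need not happen, motivating the relaxed, error-tolerant benchmark.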
