Pareto-Optimal Algorithms for Learning in Games (2402.09549v1)

Published 14 Feb 2024 in cs.GT

Abstract: We study the problem of characterizing optimal learning algorithms for playing repeated games against an adversary with unknown payoffs. In this problem, the first player (called the learner) commits to a learning algorithm against a second player (called the optimizer), and the optimizer best-responds by choosing the optimal dynamic strategy for their (unknown but well-defined) payoff. Classic learning algorithms (such as no-regret algorithms) provide some counterfactual guarantees for the learner, but might perform much more poorly than other learning algorithms against particular optimizer payoffs. In this paper, we introduce the notion of asymptotically Pareto-optimal learning algorithms. Intuitively, if a learning algorithm is Pareto-optimal, then there is no other algorithm which performs asymptotically at least as well against all optimizers and performs strictly better (by at least $\Omega(T)$) against some optimizer. We show that well-known no-regret algorithms such as Multiplicative Weights and Follow The Regularized Leader are Pareto-dominated. However, while no-regret is not enough to ensure Pareto-optimality, we show that a strictly stronger property, no-swap-regret, is a sufficient condition for Pareto-optimality. Proving these results requires us to address various technical challenges specific to repeated play, including the fact that there is no simple characterization of how optimizers who are rational in the long-term best-respond against a learning algorithm over multiple rounds of play. To address this, we introduce the idea of the asymptotic menu of a learning algorithm: the convex closure of all correlated distributions over strategy profiles that are asymptotically implementable by an adversary. We show that all no-swap-regret algorithms share the same asymptotic menu, implying that all no-swap-regret algorithms are "strategically equivalent".
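
The abstract contrasts ordinary (external) no-regret with the strictly stronger no-swap-regret property. As a concreteness aid, the sketch below computes both regret notions for a fixed play history using the standard textbook definitions, not code or notation from the paper itself; the matching-pennies payoff matrix, the random stand-in history, and all function names are illustrative assumptions.

```python
import numpy as np

def external_regret(U, actions, opp_actions):
    """External regret: payoff gain from the best fixed action in hindsight.

    U[i, j] is the learner's payoff when playing row i against column j;
    actions and opp_actions are the realized plays over T rounds.
    """
    realized = sum(U[a, b] for a, b in zip(actions, opp_actions))
    best_fixed = max(sum(U[i, b] for b in opp_actions) for i in range(U.shape[0]))
    return best_fixed - realized

def swap_regret(U, actions, opp_actions):
    """Swap regret: gain from the best deviation map phi applied in
    hindsight, replacing every round where action i was played with phi(i)."""
    n = U.shape[0]
    total = 0.0
    for i in range(n):
        faced = [b for a, b in zip(actions, opp_actions) if a == i]
        if faced:
            # Best replacement j for action i (the choice j = i contributes 0,
            # so each action's term is nonnegative).
            total += max(sum(U[j, b] - U[i, b] for b in faced) for j in range(n))
    return total

# Tiny worked example: matching-pennies payoffs for the learner.
U = np.array([[1.0, -1.0], [-1.0, 1.0]])
rng = np.random.default_rng(0)
T = 1000
learner = rng.integers(0, 2, size=T)   # stand-in history, not an actual learning algorithm
opponent = rng.integers(0, 2, size=T)
print("external regret:", external_regret(U, learner, opponent))
print("swap regret:    ", swap_regret(U, learner, opponent))
```

Because the constant deviation maps are among the candidates maximized over, swap regret always upper-bounds external regret; driving it to $o(T)$ is therefore the stronger guarantee, which is the gap the paper exploits in separating Pareto-dominated no-regret algorithms from Pareto-optimal no-swap-regret ones.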

Authors (3)
  1. Eshwar Ram Arunachaleswaran (14 papers)
  2. Natalie Collina (14 papers)
  3. Jon Schneider (50 papers)
Citations (4)
