
Is Learning in Games Good for the Learners? (2305.19496v2)

Published 31 May 2023 in cs.GT and cs.LG

Abstract: We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of $\textit{generalized equilibrium}$ which allows for asymmetric regret constraints, and yields polytopes of feasible values for each agent and pair of regret constraints, where we show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents. As a central example, we highlight the case where one agent is no-swap and the other's regret is unconstrained. We show that this captures an extension of $\textit{Stackelberg}$ equilibria with a matching optimal value, and that there exists a wide class of games where a player can significantly increase their utility by deviating from a no-swap-regret algorithm against a no-swap learner (in fact, almost any game without pure Nash equilibria is of this form). Additionally, we make use of generalized equilibria to consider tradeoffs in terms of the opponent's algorithm choice. We give a tight characterization for the maximal reward obtainable against $\textit{some}$ no-regret learner, yet we also show a class of games in which this is bounded away from the value obtainable against the class of common "mean-based" no-regret algorithms. Finally, we consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown. Again we show tradeoffs depending on the opponent's learning algorithm: the Stackelberg strategy is learnable in exponential time with any no-regret agent (and in polynomial time with any no-$\textit{adaptive}$-regret agent) for any game where it is learnable via queries, and there are games where it is learnable in polynomial time against any no-swap-regret agent but requires exponential time against a mean-based no-regret agent.
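As a quick reference for the regret notions the abstract contrasts (these are the standard definitions, with our own notation rather than the paper's): for a learner playing actions $a_t$ against opponent actions $b_t$ under utility $u$,

$$\mathrm{Reg}_T = \max_{a^* \in \mathcal{A}} \sum_{t=1}^{T} u(a^*, b_t) - \sum_{t=1}^{T} u(a_t, b_t), \qquad \mathrm{SwapReg}_T = \max_{\phi:\, \mathcal{A} \to \mathcal{A}} \sum_{t=1}^{T} u(\phi(a_t), b_t) - \sum_{t=1}^{T} u(a_t, b_t).$$

An algorithm is no-regret (respectively no-swap-regret) if the corresponding quantity is $o(T)$ against any opponent, and "mean-based" algorithms choose actions approximately according to cumulative past payoffs. Below is a minimal sketch, not taken from the paper, of the repeated-play setting the abstract studies: a mean-based no-regret learner (multiplicative weights) facing an optimizer that commits to a fixed mixed strategy, Stackelberg-style. The payoff matrices, horizon, step size, and commitment distribution are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative bimatrix game: A[i, j] is the learner's payoff and B[i, j]
# the optimizer's payoff when the learner plays row i, optimizer column j.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

T = 5000                        # horizon (illustrative)
eta = np.sqrt(np.log(2) / T)    # standard multiplicative-weights step size

cum = np.zeros(2)               # learner's cumulative payoff per action
commit = np.array([0.3, 0.7])   # optimizer's fixed commitment (hypothetical)

learner_total = 0.0
optimizer_total = 0.0

for _ in range(T):
    # Mean-based: the learner's distribution depends only on cumulative payoffs.
    w = np.exp(eta * (cum - cum.max()))   # subtract max for numerical stability
    p = w / w.sum()
    i = rng.choice(2, p=p)                # learner samples its action
    j = rng.choice(2, p=commit)           # optimizer samples from its commitment
    learner_total += A[i, j]
    optimizer_total += B[i, j]
    cum += A[:, j]                        # full-information payoff update

# External regret: best fixed action in hindsight minus realized payoff.
avg_regret = (cum.max() - learner_total) / T
print(f"learner avg payoff:   {learner_total / T:.3f}")
print(f"optimizer avg payoff: {optimizer_total / T:.3f}")
print(f"learner avg regret:   {avg_regret:.4f}  (vanishes as T grows)")
```

In this matching-pennies-style game the learner concentrates on whichever action has done better on average, and that predictability is what the abstract's comparison between mean-based learners and no-swap-regret learners is about: the reward an optimizer can extract depends on which class of learning algorithm it faces.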

Authors (3)
  1. William Brown (15 papers)
  2. Jon Schneider (50 papers)
  3. Kiran Vodrahalli (13 papers)
Citations (10)