LOQA: Learning with Opponent Q-Learning Awareness (2405.01035v1)

Published 2 May 2024 in cs.GT, cs.AI, and cs.LG

Abstract: In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper, we introduce Learning with Opponent Q-Learning Awareness (LOQA), a novel, decentralized reinforcement learning algorithm tailored to optimizing an agent's individual utility while fostering cooperation among adversaries in partially competitive environments. LOQA assumes the opponent samples actions proportionally to its action-value function Q. Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. LOQA achieves these outcomes with a significantly reduced computational footprint, making it a promising approach for practical multi-agent applications.
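The core modeling assumption lends itself to a short illustration. The sketch below (not the authors' implementation) realizes "the opponent samples actions proportionally to Q" as a softmax over Q-values, a common differentiable parameterization; the temperature, toy Q-values, and function names are assumptions added for illustration, and the paper's exact parameterization may differ.

```python
# Minimal sketch of the assumed opponent model in LOQA: action
# probabilities derived from the opponent's Q-values via a softmax.
# Temperature, Q-values, and names are illustrative placeholders.
import jax
import jax.numpy as jnp

def opponent_policy(q_values: jnp.ndarray, temperature: float = 1.0) -> jnp.ndarray:
    """Action probabilities induced by the opponent's Q-values."""
    return jax.nn.softmax(q_values / temperature)

def sample_opponent_action(key, q_values: jnp.ndarray,
                           temperature: float = 1.0) -> jnp.ndarray:
    # Sampling a categorical with logits = Q / temperature is equivalent
    # to sampling from softmax(Q / temperature).
    return jax.random.categorical(key, q_values / temperature)

key = jax.random.PRNGKey(0)
q = jnp.array([1.0, 2.0, 0.5])      # toy Q-values for three actions
print(opponent_policy(q))           # higher Q -> higher probability
print(sample_opponent_action(key, q))
```

Because this opponent model is differentiable with respect to the Q-values, an agent can reason about how its own behavior shifts the opponent's action-value estimates, and hence the opponent's policy; this is the property that opponent-shaping methods in this line of work exploit, here without requiring access to the opponent's parameters.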
