Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games (2207.06541v1)

Published 13 Jul 2022 in cs.GT, cs.LG, and cs.MA

Abstract: In competitive two-agent environments, deep reinforcement learning (RL) methods based on the *Double Oracle (DO)* algorithm, such as *Policy Space Response Oracles (PSRO)* and *Anytime PSRO (APSRO)*, iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce *Self-Play PSRO (SP-PSRO)*, a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well. As a result, SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.
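The loop the abstract describes can be made concrete with a minimal sketch. The example below is not the paper's implementation: it runs on a small matrix game rather than a deep RL environment, approximates the restricted-game Nash equilibrium with fictitious play, and substitutes exponential-weights self-play for the paper's off-policy RL training of the new stochastic policy. All function and variable names (`restricted_nash`, `sp_psro`, etc.) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def restricted_nash(payoff, iters=2000):
    """Approximate the Nash equilibrium of a zero-sum matrix game
    (row player maximizes) via fictitious play; returns mixed
    strategies for the row and column players."""
    m, n = payoff.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    row_counts[0] = col_counts[0] = 1.0
    for _ in range(iters):
        row_mix = row_counts / row_counts.sum()
        col_mix = col_counts / col_counts.sum()
        # Each player best-responds to the opponent's empirical mixture.
        row_counts[np.argmax(payoff @ col_mix)] += 1.0
        col_counts[np.argmin(row_mix @ payoff)] += 1.0
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

def sp_psro(game, iterations=5, sp_steps=50, lr=2.0):
    """Toy SP-PSRO loop on a zero-sum matrix game `game` (row player's
    payoffs). Population members are mixed strategies over the full
    action set, stored as probability vectors."""
    A = game.shape[0]
    # Start each population with one arbitrary pure strategy.
    pop_row = [np.eye(A)[0]]
    pop_col = [np.eye(A)[0]]
    for _ in range(iterations):
        # Payoff matrix of the restricted game between population members.
        M = np.array([[r @ game @ c for c in pop_col] for r in pop_row])
        meta_row, meta_col = restricted_nash(M)
        # Each player's meta-strategy, aggregated over underlying actions.
        row_mix = sum(w * r for w, r in zip(meta_row, pop_row))
        col_mix = sum(w * c for w, c in zip(meta_col, pop_col))
        # (1) Deterministic best responses, as in PSRO/APSRO.
        pop_row.append(np.eye(A)[np.argmax(game @ col_mix)])
        pop_col.append(np.eye(A)[np.argmin(row_mix @ game)])
        # (2) SP-PSRO's extra step: also learn an approximately optimal
        # *stochastic* policy (here via exponential-weights self-play,
        # standing in for the paper's off-policy RL) and add it too.
        new_row = np.full(A, 1.0 / A)
        new_col = np.full(A, 1.0 / A)
        for _ in range(sp_steps):
            new_row = new_row * np.exp(lr * (game @ new_col))
            new_row /= new_row.sum()
            new_col = new_col * np.exp(-lr * (new_row @ game))
            new_col /= new_col.sum()
        pop_row.append(new_row)
        pop_col.append(new_col)
    return pop_row, pop_col

if __name__ == "__main__":
    # Rock-paper-scissors: the unique equilibrium is fully mixed.
    rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
    pop_row, pop_col = sp_psro(rps)
    print(f"population sizes: {len(pop_row)} x {len(pop_col)}")
```

Each iteration grows both populations by two policies: the deterministic best response that PSRO/APSRO would add, plus SP-PSRO's extra approximately optimal stochastic policy. The latter is what lets the population cover mixed equilibria quickly, which is the intuition behind the fast convergence the abstract reports.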

Authors (6)
  1. Stephen McAleer (41 papers)
  2. JB Lanier (6 papers)
  3. Kevin Wang (41 papers)
  4. Pierre Baldi (89 papers)
  5. Roy Fox (39 papers)
  6. Tuomas Sandholm (119 papers)
Citations (17)
