Empirical Policy Optimization for $n$-Player Markov Games (2110.08979v1)

Published 18 Oct 2021 in cs.GT

Abstract: In single-agent Markov decision processes, an agent can optimize its policy based on its interaction with the environment. In multi-player Markov games (MGs), however, the interaction is non-stationary due to the behaviors of other players, so the agent has no fixed optimization objective. In this paper, we treat the evolution of player policies as a dynamical process and propose a novel learning scheme for Nash equilibrium. The core idea is to evolve one's policy according to not just its current in-game performance, but an aggregation of its performance over history. We show that for a variety of MGs, players in our learning scheme will provably converge to a point that is an approximation to Nash equilibrium. Combined with neural networks, we develop the \emph{empirical policy optimization} algorithm, which is implemented in a reinforcement-learning framework and runs in a distributed way, with each player optimizing its policy based on its own observations. We use two numerical examples to validate the convergence property on small-scale MGs with $n\ge 2$ players, and a pong example to show the potential of our algorithm on large games.
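The abstract's core idea, updating a policy against an aggregation of performance over history rather than only the latest opponent behavior, can be illustrated by a classical fictitious-play-style sketch. This is not the authors' algorithm (which uses neural networks and distributed reinforcement learning); it is a minimal two-player matrix-game analogue in which each player best-responds to the opponent's *empirical average* of past play, and the empirical policies approach a Nash equilibrium:

```python
import numpy as np

# Hedged sketch: fictitious play in matching pennies (zero-sum, 2 players).
# Each player tracks the opponent's historical action frequencies and
# best-responds to that aggregate -- "evolving one's policy according to
# an aggregation of performance over history" in its simplest form.

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # payoff matrix for player 1; player 2 gets -A

counts1 = np.ones(2)          # historical action counts of player 1
counts2 = np.ones(2)          # historical action counts of player 2

for t in range(10000):
    avg1 = counts1 / counts1.sum()       # empirical policy of player 1
    avg2 = counts2 / counts2.sum()       # empirical policy of player 2
    br1 = np.argmax(A @ avg2)            # best response to opponent history
    br2 = np.argmax(-(A.T @ avg1))       # player 2 maximizes -A^T avg1
    counts1[br1] += 1
    counts2[br2] += 1

# Empirical policies approach the Nash equilibrium (0.5, 0.5) of this game.
print(counts1 / counts1.sum(), counts2 / counts2.sum())
```

In this zero-sum case the empirical frequencies are known to converge to equilibrium, whereas naive best-response to only the opponent's latest action cycles forever; this is the motivation for history aggregation that the paper generalizes to $n$-player Markov games.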

Authors (4)
  1. Yuanheng Zhu (17 papers)
  2. Dongbin Zhao (62 papers)
  3. Mengchen Zhao (8 papers)
  4. Dong Li (429 papers)
Citations (10)
