Multiagent Soft Q-Learning (1804.09817v1)
Published 25 Apr 2018 in cs.AI
Abstract: Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
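
The core idea the abstract alludes to is the soft (maximum-entropy) Bellman backup: instead of following a local policy gradient in the joint-action space, the Q-target is computed from a soft value that integrates over joint actions, which is what helps escape the shallow optima associated with relative overgeneralization. The snippet below is a minimal illustrative sketch of that backup, not the authors' implementation; the toy `q_function`, the temperature `alpha`, the uniform action proposal, and all other names are assumptions introduced for illustration.

```python
# Minimal sketch of the soft Bellman backup behind soft Q-learning, written
# over a JOINT action space as in the multiagent setting. Everything here
# (toy Q-function, temperature, sample counts) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

alpha = 0.1            # temperature of the energy-based (softmax) policy
gamma = 0.99           # discount factor
n_action_samples = 64  # Monte Carlo samples used to estimate the soft value
action_dim = 2         # joint action: one continuous control per agent

def q_function(state, joint_actions):
    """Toy stand-in for a learned Q-network over the JOINT action.

    Rewards are highest when both agents pick actions near +0.5, so the
    optimum requires coordination in the joint-action space.
    """
    return -np.sum((joint_actions - 0.5) ** 2, axis=-1) + 0.1 * state

def soft_value(state):
    """Estimate V_soft(s) = alpha * log E_{a ~ Uniform}[exp(Q(s, a) / alpha)].

    Uses a uniform proposal over joint actions (omitting the additive
    constant from the proposal's volume) and a log-sum-exp for stability.
    """
    actions = rng.uniform(-1.0, 1.0, size=(n_action_samples, action_dim))
    q_vals = q_function(state, actions)
    q_max = q_vals.max()
    return alpha * (np.log(np.mean(np.exp((q_vals - q_max) / alpha))) + q_max / alpha)

def soft_bellman_target(reward, next_state, done):
    """Regression target for Q(s, a): r + gamma * V_soft(s')."""
    return reward + gamma * (0.0 if done else soft_value(next_state))

# Example transition (s, a, r, s'): the value the Q-network would be trained toward.
target = soft_bellman_target(reward=1.0, next_state=0.3, done=False)
print(f"soft Bellman target: {target:.4f}")
```

Because the target averages exponentiated Q-values over sampled joint actions rather than evaluating the gradient at the current joint policy, high-value coordinated regions still contribute to the backup even when the agents' current actions are miscoordinated, which is the intuition for why this style of update resists relative overgeneralization better than DDPG-style local search.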
- Ermo Wei (2 papers)
- Drew Wicke (2 papers)
- David Freelan (1 paper)
- Sean Luke (3 papers)