Energy-based Surprise Minimization for Multi-Agent Value Factorization (2009.09842v4)

Published 16 Sep 2020 in cs.LG, cs.MA, and stat.ML

Abstract: Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success in training decentralised policies in a centralised manner by making use of value factorization methods. However, addressing surprise across spurious states and approximation bias remain open problems for multi-agent settings. Towards this goal, we introduce the Energy-based MIXer (EMIX), an algorithm which minimizes surprise utilizing the energy across agents. Our contributions are threefold: (1) EMIX introduces a novel surprise minimization technique across multiple agents in multi-agent partially-observable settings. (2) EMIX highlights a practical use of energy functions in MARL, with theoretical guarantees and experimental validation of the energy operator. Lastly, (3) EMIX extends Maxmin Q-learning to address overestimation bias across agents in MARL. In a study of challenging StarCraft II micromanagement scenarios, EMIX demonstrates consistently stable performance for multi-agent surprise minimization. Moreover, our ablation study highlights the necessity of the energy-based scheme and the need to eliminate overestimation bias in MARL. Our implementation of EMIX can be found at karush17.github.io/emix-web/.

Authors (4)

Karush Suri (12 papers)
Xiao Qi Shi (5 papers)
Konstantinos Plataniotis (16 papers)
Yuri Lawryshyn (12 papers)
