MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure (2405.00902v1)
Abstract: Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to a Pareto-optimal Nash equilibrium, owing largely to a lack of efficient exploration. The problem is exacerbated in sparse-reward settings, where policy learning exhibits high variance. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" that subspace. The trained exploration policies can be integrated with any off-policy MARL algorithm at test time. We first showcase MESA's advantage in a multi-step matrix game. Experiments further show that, with the learned exploration policies, MESA achieves significantly better performance on sparse-reward tasks in several multi-agent particle and multi-agent MuJoCo environments, and generalizes to more challenging tasks at test time.
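To make the two-phase recipe in the abstract concrete, below is a minimal Python sketch of the idea under simplified assumptions: the k-means clustering, the distance-based exploration bonus, and every name in the code (`kmeans`, `exploration_bonus`, the fabricated rollout data) are illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of the MESA idea from the abstract: (1) identify the
# high-rewarding joint state-action subspace, (2) cover it with a set of
# diverse exploration policies, (3) mix those policies into an off-policy
# MARL learner at test time. All names and shaping choices are assumptions.
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: partition the high-rewarding subspace into k regions."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster goes empty.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

# --- Phase 1: identify the high-rewarding joint state-action subspace. ---
# In the real method these pairs would come from rollouts on training tasks;
# here we fabricate stand-in data purely for illustration.
rng = np.random.default_rng(1)
joint_sa_pairs = rng.normal(size=(500, 6))   # (state ++ joint action) vectors
rewards = rng.uniform(size=500)
high_reward = joint_sa_pairs[rewards > 0.8]  # keep only rewarding pairs

# --- Phase 2: "cover" the subspace with diverse exploration policies, ---
# one per cluster (each policy would be trained to reach its own region).
k = 4
centers, _ = kmeans(high_reward, k)

def exploration_bonus(sa, center):
    """Dense surrogate reward pulling agents toward one region of the
    high-rewarding subspace (an assumed shaping choice, not the paper's)."""
    return -np.linalg.norm(sa - center)

# --- Phase 3 (test time): integrate with any off-policy MARL algorithm, ---
# e.g. run some rollouts under a sampled exploration policy and store the
# transitions in the learner's shared replay buffer.
for episode in range(3):
    policy_id = rng.integers(k)              # pick one exploration policy
    sa = rng.normal(size=6)                  # placeholder rollout step
    print(f"episode {episode}: policy {policy_id}, "
          f"bonus {exploration_bonus(sa, centers[policy_id]):.3f}")
```

In this reading, diversity across exploration policies comes from assigning each policy its own cluster of the high-rewarding subspace, so together they cover regions a single exploration strategy would be unlikely to visit under sparse rewards.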
Authors: Zhicheng Zhang, Yancheng Liang, Yi Wu, Fei Fang