Papers
Topics
Authors
Recent
Search
2000 character limit reached

CURO: Curriculum Learning for Relative Overgeneralization

Published 6 Dec 2022 in cs.LG, cs.AI, and cs.MA | (2212.02733v3)

Abstract: Relative overgeneralization (RO) is a pathology that can arise in cooperative multi-agent tasks when the optimal joint action's utility falls below that of a sub-optimal joint action. RO can cause the agents to get stuck into local optima or fail to solve cooperative tasks requiring significant coordination between agents within a given timestep. In this work, we empirically find that, in multi-agent reinforcement learning (MARL), both value-based and policy gradient MARL algorithms can suffer from RO and fail to learn effective coordination policies. To better overcome RO, we propose a novel approach called curriculum learning for relative overgeneralization (CURO). To solve a target task that exhibits strong RO, in CURO, we first fine-tune the reward function of the target task to generate source tasks to train the agent. Then, to effectively transfer the knowledge acquired in one task to the next, we use a transfer learning method that combines value function transfer with buffer transfer, which enables more efficient exploration in the target task. CURO is general and can be applied to both value-based and policy gradient MARL methods. We demonstrate that, when applied to QMIX, HAPPO, and HATRPO, CURO can successfully overcome severe RO, achieve improved performance, and outperform baseline methods in a variety of challenging cooperative multi-agent tasks.

Citations (1)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

Collections

Sign up for free to add this paper to one or more collections.