
Massively Multiagent Minigames for Training Generalist Agents (2406.05071v1)

Published 7 Jun 2024 in cs.AI, cs.LG, and cs.MA

Abstract: We present Meta MMO, a collection of many-agent minigames for use as a reinforcement learning benchmark. Meta MMO is built on top of Neural MMO, a massively multiagent environment that has been the subject of two previous NeurIPS competitions. Our work expands Neural MMO with several computationally efficient minigames. We explore generalization across Meta MMO by learning to play several minigames with a single set of weights. We release the environment, baselines, and training code under the MIT license. We hope that Meta MMO will spur additional progress on Neural MMO and, more generally, will serve as a useful benchmark for many-agent generalization.

Summary

  • The paper introduces Meta MMO as a benchmark for evaluating many-agent generalization in reinforcement learning across diverse minigames and settings.
  • It demonstrates up to 3x faster training efficiency using a single RTX 4090 GPU with methods like PPO and curriculum learning.
  • The study shows that generalist agents can match or exceed specialist performance, paving the way for more versatile and resource-efficient multiagent RL research.

Massively Multiagent Minigames for Training Generalist Agents

The paper introduces Meta MMO, an extension of the Neural MMO environment designed to facilitate the training and evaluation of generalist reinforcement learning (RL) agents. Neural MMO is an established massively multiagent platform known for its diverse, open-ended challenges. Meta MMO extends it with computationally efficient minigames that each capture distinct aspects and goals of the broader environment, and together these minigames offer a robust benchmark for evaluating many-agent generalization.

Main Contributions and Methodology

Meta MMO contributes to the field in several key ways:

  1. Benchmark for Many-Agent Generalization: Meta MMO serves as a platform to evaluate many-agent RL across different scenarios. Its minigames include free-for-all and team settings, domain randomization, and adaptive difficulty, which are crucial for generalization studies.
  2. Optimized Training Efficiency: Training on Meta MMO is computationally efficient, achieving up to 3x faster training speeds than previous iterations. This is accomplished on standard desktop hardware equipped with a single RTX 4090 GPU.
  3. Generalist Agent Capabilities: The platform facilitates the training of generalist agents that use a single policy to handle multiple minigames. Using Proximal Policy Optimization (PPO) and curriculum learning methods, the research demonstrates effective training protocols that enable these agents to generalize across diverse tasks.
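The recipe in point 3, training one shared policy across many minigames, can be sketched as a training loop that samples a minigame per episode according to curriculum weights. This is an illustrative sketch, not the paper's actual code: `train_generalist`, `sample_minigame`, and the weight scheme are hypothetical names, and the PPO rollout/update is elided.

```python
import random

def sample_minigame(weights, rng):
    """Draw the next minigame for an episode, weighted by curriculum ratios."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def train_generalist(num_episodes, weights, seed=0):
    """Toy loop: one shared policy sees episodes drawn from many minigames.

    A real implementation would collect PPO rollouts and update the shared
    policy each episode; here we only tally which minigames were sampled.
    """
    rng = random.Random(seed)
    episode_counts = {name: 0 for name in weights}
    for _ in range(num_episodes):
        game = sample_minigame(weights, rng)
        episode_counts[game] += 1  # rollout + policy update would go here
    return episode_counts
```

Adjusting the weights is one simple way to implement the sampling ratios discussed in the experiments, biasing training toward harder or underperforming minigames.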

Meta MMO Environment and Implementation

Meta MMO's design allows fine-grained control over various gameplay elements, such as combat rules, NPC behavior, market dynamics, and map characteristics. This configurability is driven by several subsystems, each modularly controlled, enhancing the environment's flexibility and adaptability.
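The modular subsystem control described above amounts to a per-minigame configuration layered on a shared base. The sketch below uses hypothetical field names (not the actual Meta MMO API) to illustrate how a minigame can be defined as a small set of overrides:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EnvConfig:
    """Illustrative subsystem toggles; names are assumptions, not the real API."""
    combat_enabled: bool = True
    npcs_enabled: bool = True
    market_enabled: bool = True
    map_size: int = 128
    num_agents: int = 128

# A minigame is then just a set of overrides on the base config:
survival = EnvConfig(market_enabled=False)
# A "mini" variant shrinks the map and population for faster training:
mini_survival = replace(survival, map_size=32, num_agents=16)
```

This override pattern is what makes the "full" and "mini" configurations discussed later cheap to define and swap.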

Several minigames were implemented to showcase Meta MMO's capabilities:

  • Survival: Agents aim to survive the longest in an environment, engaging in foraging and combat.
  • Team Battle: Teams compete to be the last standing, replicating the 2022 NeurIPS challenge.
  • Multi-task Training/Evaluation: Agents are assessed on their ability to generalize to new tasks, opponents, and maps, replicating the 2023 NeurIPS challenge.
  • New Minigames: These include Protect the King, Race to the Center, King of the Hill, and Sandwich, each introducing unique gameplay mechanics and strategic requirements.
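To make one of the new minigames concrete, a win condition for Race to the Center could be a simple distance test against the map midpoint. This is a plausible sketch, assuming square maps and (row, col) agent positions; the paper does not specify this exact rule:

```python
def reached_center(pos, map_size, radius=1):
    """True if an agent at (row, col) is within `radius` tiles of the map
    center under Chebyshev distance -- a hypothetical win test for
    Race to the Center."""
    center = map_size // 2
    row, col = pos
    return max(abs(row - center), abs(col - center)) <= radius
```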

Experimental Validation

The paper's experiments aimed to validate the efficacy of training generalist agents against specialists for both "full" and "mini" configurations:

  • Full Configuration: Included comprehensive subsystems such as resource collection, combat, professions, and trade, aimed at complex multitask learning. After training, generalist policies matched or exceeded specialist policies, even after accounting for the ratio of training samples each minigame received.
  • Mini Configuration: Focused more narrowly on subsets of subsystems, enhancing training speeds and efficiency. This configuration allowed exploring optimization techniques and quick testing.

The training and evaluation procedures revealed that a generalist could perform comparably to specialists across multiple tasks, even when auxiliary tasks were added to training, without significant degradation in performance. This supports the paper's hypothesis that generalist policies can benefit from task transfer, even when trained on a relatively small set of tasks.
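One hedged way to express the generalist-versus-specialist comparison above is a per-task normalized score, where the generalist's score on each task is divided by the corresponding specialist's score (a value of 1.0 or higher means the generalist matches or exceeds the specialist). The function name and scoring scheme here are illustrative, not taken from the paper:

```python
def normalized_scores(generalist, specialist):
    """Per-task ratio of the single generalist policy's score to the
    dedicated specialist's score; >= 1.0 means match-or-exceed."""
    return {task: generalist[task] / specialist[task] for task in specialist}
```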

Implications for Future Research and Developments

The implications of Meta MMO are both practical and theoretical:

  • Practical: By optimizing training protocols and reducing computational overhead, Meta MMO makes it feasible for broader research applications, even with limited computational resources.
  • Theoretical: The findings suggest that many-agent RL can benefit significantly from task transfer learning. The structural design of Meta MMO, with its adaptive difficulties and domain randomization, paves the way for future studies into curriculum learning, coordination strategies, and more advanced policy adaptation mechanisms.

Conclusion

Meta MMO represents a significant step forward in the development of many-agent RL environments. By providing a rich, configurable, and computationally efficient benchmarking platform, it facilitates robust experimentation and drives forward the development of agents capable of generalizing across diverse and complex tasks. The contributions outlined in this paper are expected to propel future research in multi-agent systems, generalist agent competencies, and large-scale environment interactions.
