
PettingZoo API for Multi-Agent RL

Updated 30 March 2026
  • PettingZoo API is a Python library for MARL that implements the Agent-Environment-Cycle (AEC) model to ensure sequential agent interactions and eliminate common implementation errors.
  • It provides standardized methods such as reset(), step(), and agent_iter(), which facilitate modular experiment design and support heterogeneous agent spaces.
  • The API includes robust environment wrappers for parallel and raw interfacing, enabling reproducible research and seamless integration with prevalent RL frameworks.

PettingZoo is a Python-based library and API for multi-agent reinforcement learning (MARL) environments, designed to be a universal and elegant interface analogous in spirit to Gym but tailored specifically for MARL. At its core, PettingZoo is defined by the Agent-Environment-Cycle (AEC) game model, offering a formalism that encapsulates a broad class of MARL scenarios—including simultaneous moves, turn-taking, agent lifecycle changes, and nature interventions—while eliminating classes of implementation errors common in alternative frameworks. PettingZoo provides a standardized API, wrappers for parallel and raw environment interfaces, and a diverse catalog of over 60 environments spanning varied domains, all adhering strictly to the AEC paradigm (Terry et al., 2020).

1. Agent-Environment-Cycle Game Model

The AEC game model establishes the formal semantics underlying PettingZoo environments. Formally, an AEC game is a tuple:

$G = \bigl\langle \mathcal{A}, \mathcal{S}, s_0, \{\mathcal{A}_i\}_{i\in\mathcal{A}}, \delta, \{O_i\}_{i\in\mathcal{A}}, \{R_i\}_{i\in\mathcal{A}}, \mathit{next} \bigr\rangle$

where:

  • $\mathcal{A}$ denotes the finite set of agents, augmented by a distinguished "environment" actor.
  • $\mathcal{S}$ is the state space, with $s_0$ as the initial state.
  • $\mathcal{A}_i$ is the action set available to actor $i$.
  • $\delta$ is the deterministic state transition function.
  • $O_i$ maps the global state to agent $i$'s observation.
  • $R_i$ computes the reward to agent $i$ as a function of the current state and the most recent action.
  • $\mathit{next}$ determines the actor for the upcoming step.

This model operates by selecting, at each discrete step, a single actor ($a_t = \mathit{next}(s_t)$), supplying the actor's observation, requesting an action, updating the state, and broadcasting the resulting reward to all agents. This sequentialization is intentionally exposed in the API, providing the following key advantages:

  • Elimination of dangling no-ops and dummy actions required in simultaneous-move frameworks.
  • Prevention of ambiguous reward attribution and reward-mixing bugs.
  • Removal of race conditions in environmental tie-breaking scenarios.
  • Straightforward support for agent birth/death and nature/random moves.
  • Compatibility with variable agent participation sets and dynamic environments.

These properties render AEC games as expressive as partially observable stochastic games (POSGs) and extensive form games in theory, while providing a minimal and unambiguous model pragmatic for code implementation (Terry et al., 2020).
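The AEC formalism above can be made concrete with a toy game. The following is an illustrative sketch, not PettingZoo code: `CountGame` and its methods are hypothetical names, and the game (two players alternately increment a counter; whoever reaches 3 is rewarded) exists only to show how the tuple components $\mathit{next}$, $O_i$, $\delta$, and $R_i$ map onto a sequential step loop.

```python
# Minimal sketch of an AEC game G = <A, S, s0, {A_i}, delta, {O_i}, {R_i}, next>.
# CountGame is an illustrative toy, not part of the PettingZoo API.

class CountGame:
    def __init__(self):
        self.agents = ["player_0", "player_1"]   # A: finite agent set
        self.state = 0                           # s in S, with s0 = 0
        self.turn = 0                            # bookkeeping for next()

    def next(self):
        """next: S -> A, deterministic alternating turn order here."""
        return self.agents[self.turn % len(self.agents)]

    def observe(self, agent):
        """O_i: in this toy game every agent observes the full counter."""
        return self.state

    def step(self, action):
        """delta: apply the acting agent's increment, then compute R_i."""
        actor = self.next()
        self.state += action
        self.turn += 1
        rewards = {a: 0 for a in self.agents}    # rewards broadcast to all
        if self.state >= 3:
            rewards[actor] = 1                   # the agent reaching 3 wins
        return rewards


game = CountGame()
while game.state < 3:
    actor = game.next()          # exactly one actor is selected per step
    obs = game.observe(actor)
    rewards = game.step(1)       # each agent always increments by 1
print(rewards)                   # {'player_0': 1, 'player_1': 0}
```

Because exactly one actor moves per step, there is no ambiguity about which action produced which reward, which is the property the bullet points above rely on.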

2. Core API Methods and Structure

PettingZoo environments are required to implement a concise set of methods and attributes:

Method/Attribute            Purpose
reset()                     Initialize the environment for a new episode
step(action)                Apply an action for the current agent
agent_iter(max_iter=None)   Generator over agent names in turn order
last()                      Tuple of (obs, reward, done, info) for the current agent
observe(agent_name)         Observation for any agent at the current timestep
render(mode="human")        Visualization or RGB output
close()                     Clean up resources
agents                      List of active agents in this episode
possible_agents             All agents that could ever participate
rewards                     Current-step reward for each agent
dones                       Termination flag per agent
infos                       Debug metadata per agent
action_space(agent)         Valid Gym space for the agent's actions
observation_space(agent)    Gym space for the agent's observations

The agent_iter() method yields the next agent for the current step, and last() provides the corresponding interface tuple. For environments with turn-taking or variable agent presence, this approach guarantees ordering and correctness.

Illustrative example:

from pettingzoo.butterfly import pistonball_v0
env = pistonball_v0.env()
env.reset()
for agent in env.agent_iter():
    obs, reward, done, info = env.last()
    if not done:
        action = my_policy(obs, agent)
    else:
        action = None
    env.step(action)
    env.render()
env.close()
A key property enforced is that env.step(None) must still be called for terminated agents, ensuring that the agent loop remains invariant to episode terminations.

3. Spaces, Data Organization, and Agent Dynamics

Observation and action spaces must be Gym-compliant, either discrete or continuous. The API supports individual spaces per agent, ensuring flexibility for heterogeneous agent populations. Example invocations:

env.observation_space(agent)
env.action_space(agent)
for ag in env.possible_agents:
    print(ag, env.observation_space(ag), env.action_space(ag))
PettingZoo accommodates variable agent populations cleanly. Agents can be added or removed from the agents list at any step or reset, with no need for external reshaping or bookkeeping outside the environment. Where multiple agents share identical observation or action spaces, object identity is used by default (env.observation_space("red_1") is env.observation_space("red_2")).
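The shared-object-identity convention can be illustrated with a self-contained sketch. `DiscreteSpace` and `TeamEnv` below are hypothetical stand-ins for Gym spaces and a PettingZoo environment, used only to show the lookup pattern and the `is` check mentioned above.

```python
# Sketch of per-agent space lookup with shared object identity.
# DiscreteSpace and TeamEnv are illustrative stand-ins, not real
# PettingZoo or Gym classes.

class DiscreteSpace:
    def __init__(self, n):
        self.n = n

class TeamEnv:
    def __init__(self):
        shared = DiscreteSpace(5)            # one space object, many agents
        self._obs_spaces = {
            "red_1": shared,
            "red_2": shared,                 # the same object as red_1
            "blue_1": DiscreteSpace(9),      # a heterogeneous agent
        }
        self.possible_agents = list(self._obs_spaces)

    def observation_space(self, agent):
        return self._obs_spaces[agent]


env = TeamEnv()
# Agents with identical spaces share one object, so identity holds:
print(env.observation_space("red_1") is env.observation_space("red_2"))   # True
print(env.observation_space("red_1") is env.observation_space("blue_1"))  # False
```

Sharing one space object per group avoids redundant allocations and lets downstream code detect homogeneous agent groups with a cheap identity check rather than a structural comparison.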

For environments with large agent sets, the native AEC API can be wrapped into a parallel interface for simultaneous stepping, enabling reduced Python call overhead when interfacing with batch RL systems (Terry et al., 2020).

4. Environment Wrappers and Interface Transformation

PettingZoo environments expose minimal primitives. Higher-level interfaces—such as the AEC-to-parallel or AEC-to-raw transformations—are realized via utility wrappers. This modularity enables direct adaptation to integration targets such as RLlib or Gym vector environments.

Parallel API usage is provided as follows:

from pettingzoo.utils import aec_to_parallel
par_env = aec_to_parallel(aec_env)
par_env.reset()
actions = {ag: par_env.action_space(ag).sample() for ag in par_env.agents}
obs, rews, dones, infos = par_env.step(actions)
Similarly, a lower-level “raw” API for state-action-state interaction (AEC-to-raw) is available for advanced control. This suggests a design intent of minimal base requirements with maximal extensibility through composable utilities.
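The mechanics of such a conversion can be sketched without PettingZoo itself: a parallel wrapper accepts one dict of actions per step and drives the sequential AEC loop underneath, one agent at a time. `StubAEC` and `ParallelWrapper` below are illustrative toy classes, not the library's actual wrapper implementation.

```python
# Sketch of an AEC-to-parallel conversion: accept a dict of simultaneous
# actions, feed them through the sequential AEC loop in turn order, and
# return batched observations. StubAEC is an illustrative toy env.

class StubAEC:
    def __init__(self):
        self.agents = ["a0", "a1"]
        self._cursor = 0
        self._pending = {}

    def agent_selection(self):
        return self.agents[self._cursor]

    def step(self, action):
        agent = self.agent_selection()
        self._pending[agent] = action
        self._cursor = (self._cursor + 1) % len(self.agents)

    def observe(self, agent):
        # Each agent observes the sum of all actions applied so far.
        return sum(self._pending.values())


class ParallelWrapper:
    """Present a simultaneous-step API on top of a sequential AEC env."""

    def __init__(self, aec_env):
        self.env = aec_env
        self.agents = aec_env.agents

    def step(self, actions):
        # Drive the underlying AEC env once per agent, in its turn order.
        for _ in self.agents:
            agent = self.env.agent_selection()
            self.env.step(actions[agent])
        # Batch the resulting observations into a single dict.
        return {a: self.env.observe(a) for a in self.agents}


par = ParallelWrapper(StubAEC())
obs = par.step({"a0": 1, "a1": 2})
print(obs)   # {'a0': 3, 'a1': 3}
```

This is why the parallel interface reduces Python call overhead: the per-agent dispatch happens once inside the wrapper rather than in the training loop.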

5. Reproducibility, Debugging, and Best Practices

PettingZoo supports standard experimental best practices for RL research:

  • Deterministic experiments: Use env.seed(seed) prior to reset(), and seed all auxiliary frameworks (NumPy, PyTorch, TensorFlow, Python random).
  • Debugging: Directly inspect agent observations using env.observe(agent_name); monitor environment metadata via env.infos[agent_name].
  • Visualization: env.render(mode="human") displays state, while "rgb_array" enables frame capture.
  • Turn-order and race-condition testing: Custom wrappers can arbitrarily randomize next, stress-testing the environment for ordering artifacts.

An explicit multi-agent training loop that combines structured seeding, episodic reward logging, and stepwise interaction prescribes a workflow for reproducible and debuggable MARL experimentation.
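Such a loop can be sketched as follows. To keep the example self-contained, `RandomWalkEnv` is a hypothetical stub standing in for a real PettingZoo environment, and the no-op policy is a placeholder for a learned one; the seeding and logging structure is the point.

```python
# Hedged sketch of a reproducible MARL experiment loop: structured
# seeding plus per-episode reward logging. RandomWalkEnv is an
# illustrative stub, not a PettingZoo environment.

import random

class RandomWalkEnv:
    """Toy AEC-style env whose rewards are seeded random draws."""

    def __init__(self):
        self.possible_agents = ["p0", "p1"]
        self.rng = random.Random()

    def seed(self, seed):
        self.rng.seed(seed)

    def reset(self):
        self.agents = list(self.possible_agents)
        self._steps = 0

    def agent_iter(self, max_iter=6):
        for i in range(max_iter):
            yield self.agents[i % len(self.agents)]

    def last(self):
        obs = self._steps
        reward = self.rng.random()
        done = self._steps >= 5
        return obs, reward, done, {}

    def step(self, action):
        self._steps += 1


SEED = 42
random.seed(SEED)                      # seed auxiliary frameworks too

env = RandomWalkEnv()
episode_returns = []
for episode in range(3):
    env.seed(SEED + episode)           # deterministic per-episode seed
    env.reset()
    total = {a: 0.0 for a in env.possible_agents}
    for agent in env.agent_iter():
        obs, reward, done, info = env.last()
        total[agent] += reward         # episodic reward logging
        env.step(None if done else 0)  # placeholder no-op policy
    episode_returns.append(total)
print(len(episode_returns))            # 3 logged episodes
```

Re-running the script with the same SEED reproduces episode_returns exactly, which is the property the seeding discipline above is meant to guarantee.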

6. Environment Coverage and Scope

PettingZoo provides a default catalog of 63 AEC environments, grouped into six major families:

  • Classic board and card games: Chess, Go, Hanabi, Uno, Connect4, etc.
  • Atari-style arcade games: “Butterfly” series (Pistonball, Pong Duel, etc.).
  • Particle-world benchmarks: MPE (Multi-Agent Particle Environments), SSP, Pursuit.
  • Large many-agent systems: MAgent battle, gather, etc.
  • Continuous control robotics: SISL suite (multi-robot Swimmer, Ant, etc.).
  • Social dilemma grid worlds: Cleanup, Harvest, etc.

All environments implement the same AEC API, ensuring complete script portability across this ecosystem. Only the import path changes when switching between environments. This approach guarantees broad empirical coverage, a critical requirement for benchmarking MARL algorithms.

7. Significance and Adoption

PettingZoo addresses several conceptual and practical deficiencies of prior MARL environment APIs. The formal AEC model eliminates multiple classes of subtle bugs and ambiguities endemic to simultaneous-move or extensive-form implementations, while maintaining expressive equivalence with established MARL game models. Environments, interfaces, and experiment logic become directly interchangeable, significantly lowering friction for algorithm development, benchmarking, and reproducibility (Terry et al., 2020). Its design, API uniformity, and extensibility have established PettingZoo as the de-facto standard MARL environment API.

References

Terry, J. K., et al. (2020). PettingZoo: Gym for Multi-Agent Reinforcement Learning. arXiv:2009.14471.
