Google Research Football: A Novel Reinforcement Learning Environment (1907.11180v2)

Published 25 Jul 2019 in cs.LG and stat.ML

Abstract: Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. In addition, it provides support for multiplayer and multi-agent experiments. We propose three full-game scenarios of varying difficulty with the Football Benchmarks and report baseline results for three commonly used reinforcement algorithms (IMPALA, PPO, and Ape-X DQN). We also provide a diverse set of simpler scenarios with the Football Academy and showcase several promising research directions.

The Google Research Football Environment (GFootball) provides a reinforcement learning platform based on a heavily modified version of the publicly available video game Gameplay Football, offering a challenging, physics-based 3D simulation for training agents in the domain of football (soccer) (Kurach et al., 2019). Its primary goal is to facilitate research in RL, particularly in areas demanding complex control, long-term credit assignment, and multi-agent coordination, within a reproducible and accessible framework.

Environment Architecture and Features

GFootball is built upon a C++ engine derived from Gameplay Football, incorporating the Bullet Physics library for realistic physical interactions. Communication between the game engine and the RL agent is managed via a Python interface, leveraging pybind11 for efficient binding.

Key Architectural Components:

  1. Game Engine: A C++ core responsible for simulation state, physics, rendering (optional), and game logic execution. It supports deterministic simulation, crucial for debugging and reproducibility, although stochasticity can be introduced through agent policies or environment configurations.
  2. Python API: Provides the interface for RL agents. This allows agents implemented in popular frameworks like TensorFlow or PyTorch to interact with the environment using standard Gym-like methods (reset, step).
  3. Observations: The environment offers flexible observation spaces. The default representation includes:
    • Simple115: A 115-dimensional feature vector containing information such as ball position/velocity, player coordinates/velocities, teammate/opponent locations, active player identification, and game state (score, game mode). This fixed-size vector is suitable for feedforward networks.
    • Pixels: Raw pixel data from the game screen, enabling end-to-end learning approaches using convolutional neural networks (CNNs). Customizable resolution and color channels are supported.
    • SMM (Super Mini Map): A low-resolution representation rendering player positions, ball location, and field markings as distinct channels, offering a compromise between raw pixels and feature vectors.
  4. Actions: The action space is discrete, mapping to 19 distinct commands performed by the currently controlled player, such as idle, the eight movement directions (left, top-left, etc.), shooting, passing (short, long, or high), sliding, sprinting, and dribbling.
  5. Rewards: The default reward signal is sparse, providing +1 for scoring a goal and -1 for conceding one. Dense reward shaping functions can be readily implemented via wrappers to facilitate learning; a minimal wrapper sketch follows this list.
  6. Multi-Agent/Multiplayer: The environment natively supports multi-agent scenarios. Agents can control single players, multiple players on a team, or even entire teams. Configuration allows for N vs. M setups, including asymmetric teams. Control can be centralized (a single policy outputs actions for all controlled players) or decentralized (multiple independent policies interact within the environment).
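
As referenced in item 5, a dense shaping term can be layered on top of the sparse scoring reward with an ordinary Gym wrapper. The sketch below is a hypothetical example (the bonus coefficient and the BallProgressReward class are illustrative, not part of GFootball), assuming the 'raw' representation, which returns one observation dictionary per controlled player including the ball position; GFootball also ships a built-in 'checkpoints' shaping reward that can be requested directly through the rewards argument.

import gym

import gfootball.env as football_env

class BallProgressReward(gym.Wrapper):
    """Hypothetical shaping wrapper: small bonus for advancing the ball
    toward the opponent goal, added on top of the sparse scoring reward."""

    def __init__(self, env, coef=0.01):
        super().__init__(env)
        self._coef = coef
        self._last_ball_x = 0.0

    def reset(self, **kwargs):
        self._last_ball_x = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # With representation='raw', obs is a list with one dict per controlled
        # player; 'ball' holds [x, y, z], with x running from the agent's own
        # goal (about -1) to the opponent goal (about +1).
        ball_x = obs[0]['ball'][0]
        shaped = reward + self._coef * (ball_x - self._last_ball_x)
        self._last_ball_x = ball_x
        return obs, shaped, done, info

env = BallProgressReward(
    football_env.create_environment(
        env_name='11_vs_11_easy_stochastic',
        representation='raw',
        rewards='scoring'))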

The separation of the game engine and the Python API allows for efficient batching of simulations by running multiple engine instances in parallel processes, communicating with a central agent policy or learner.
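
A minimal sketch of this pattern is shown below, assuming synchronous batching over a handful of worker processes (the worker protocol, the number of environments, and the fixed action are illustrative choices, not part of the GFootball API):

import multiprocessing as mp

import gfootball.env as football_env

def worker(remote, env_name):
    """Each worker process owns one engine instance and steps it on request."""
    env = football_env.create_environment(
        env_name=env_name, representation='simple115', rewards='scoring')
    obs = env.reset()
    while True:
        cmd, action = remote.recv()
        if cmd == 'step':
            obs, reward, done, _ = env.step(action)
            if done:
                obs = env.reset()
            remote.send((obs, reward, done))
        elif cmd == 'close':
            env.close()
            remote.close()
            break

if __name__ == '__main__':
    num_envs = 4
    remotes, work_remotes = zip(*[mp.Pipe() for _ in range(num_envs)])
    procs = [mp.Process(target=worker, args=(wr, '11_vs_11_easy_stochastic'))
             for wr in work_remotes]
    for p in procs:
        p.start()

    # One synchronous "batched" step: the same action is sent to every
    # environment here (0 is the idle action in the default action set);
    # a learner would instead send per-environment policy outputs.
    for r in remotes:
        r.send(('step', 0))
    transitions = [r.recv() for r in remotes]

    for r in remotes:
        r.send(('close', None))
    for p in procs:
        p.join()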

Scenarios and Benchmarks

GFootball includes two main categories of tasks: Football Benchmarks for evaluating full-game performance and Football Academy for targeted skill learning.

Football Benchmarks:

These are designed to evaluate agent performance in the full 11 vs. 11 game setting over standard-length matches. Three difficulty levels are provided, obtained by varying the proficiency of the built-in rule-based opponent AI:

  1. football.easy: Opponent AI exhibits basic behavior.
  2. football.medium: Opponent AI employs more coordinated tactics.
  3. football.hard: Opponent AI presents a significant challenge, requiring sophisticated agent strategies.

The primary metric for these benchmarks is the score difference averaged over multiple evaluation episodes. The sparse reward structure (+1/-1 for goals) makes these scenarios particularly challenging, demanding effective exploration and long-term credit assignment.
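
Because the 'scoring' reward is exactly +1/-1 per goal, the undiscounted episode return equals the final score difference, so the benchmark metric can be computed by averaging episode returns. Below is a minimal evaluation sketch; the average_score_difference helper and the random placeholder policy are illustrative, and the paper's exact evaluation protocol may differ:

import numpy as np

import gfootball.env as football_env

def average_score_difference(policy, env_name, episodes=10):
    """Roll out full matches and average (goals scored - goals conceded)."""
    env = football_env.create_environment(
        env_name=env_name, representation='simple115', rewards='scoring')
    diffs = []
    for _ in range(episodes):
        obs, done, episode_return = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            episode_return += reward   # accumulates to the score difference
        diffs.append(episode_return)
    env.close()
    return float(np.mean(diffs))

# A trained policy would be substituted here; full matches are long, so keep
# the episode count small when experimenting interactively.
random_policy = lambda obs: np.random.randint(19)   # 19 discrete actions
print(average_score_difference(random_policy, '11_vs_11_easy_stochastic', episodes=2))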

Football Academy:

This suite comprises simpler, targeted scenarios designed to isolate specific football skills or tactical situations. Examples include:

  • academy_empty_goal_close: Learn to shoot into an undefended goal from close range.
  • academy_run_to_score: Navigate past a single defender to score.
  • academy_3_vs_1_with_keeper: Coordinate three attackers against one defender and a goalkeeper.
  • academy_counterattack_easy: Execute a counterattack scenario.

These scenarios often feature dense reward functions tailored to the specific task (e.g., rewarding ball possession, proximity to goal, successful passes) to accelerate learning. They serve as valuable tools for curriculum learning, skill transfer experiments, and debugging agent behaviors before tackling the full game.
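
A simple way to exploit these scenarios for curriculum learning is to train on them in order of increasing difficulty before moving to the full game. The sketch below only wires up the environments; the ordering and the train_on placeholder are assumptions made for illustration:

import gfootball.env as football_env

# Academy scenarios ordered (roughly) from easiest to hardest.
curriculum = [
    'academy_empty_goal_close',
    'academy_run_to_score',
    'academy_3_vs_1_with_keeper',
    'academy_counterattack_easy',
]

for scenario in curriculum:
    env = football_env.create_environment(
        env_name=scenario, representation='simple115')
    obs = env.reset()
    # train_on(env) would go here, e.g. a fixed budget of policy updates per
    # stage before the agent graduates to the next, harder scenario.
    env.close()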

Baseline Algorithms and Results

The paper establishes baseline performance on the Football Benchmarks using three prominent deep RL algorithms, trained using the Simple115 state representation and the sparse goal-based reward. The algorithms selected represent different approaches within the RL landscape:

  1. IMPALA (Importance Weighted Actor-Learner Architecture): An off-policy actor-critic method known for its scalability and data efficiency in distributed settings.
  2. PPO (Proximal Policy Optimization): An on-policy actor-critic algorithm widely used for its stability and reliable performance across various domains.
  3. Ape-X DQN (Distributed Prioritized Experience Replay): An off-policy value-based method extending DQN with distributed replay and prioritization, achieving strong results in Atari environments.
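
As a frame of reference for how such agents consume the environment, the sketch below maps a Simple115 observation to logits over the 19 discrete actions with a small feedforward network. The layer sizes are illustrative assumptions, not the architectures used for the paper's baselines:

import torch
from torch import nn
from torch.distributions import Categorical

policy = nn.Sequential(
    nn.Linear(115, 256), nn.ReLU(),   # Simple115 observation vector
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 19),               # logits over the 19 discrete actions
)

obs = torch.randn(1, 115)             # stand-in for one Simple115 observation
dist = Categorical(logits=policy(obs))
action = dist.sample()                # integer action passed to env.step(...)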

Training was conducted using significant computational resources (details often specified in associated code releases or subsequent works). Key reported results (average score difference against built-in AI after ~25 million environment steps per actor, aggregated over multiple seeds) indicated:

Scenario           IMPALA    PPO    Ape-X DQN
football.easy       ~0.7     ~0.6      ~0.2
football.medium     ~0.1     ~0.0     ~-0.5
football.hard      ~-0.6    ~-0.8     ~-0.9

Note: These values are approximate based on figures in the paper; precise scores depend on exact training hyperparameters and duration.

These baselines demonstrate that while standard RL algorithms can achieve some level of competence, particularly on the easier settings, mastering the full 11 vs. 11 game, especially against hard opponents with sparse rewards, remains a substantial challenge. Performance degrades significantly as opponent difficulty increases, highlighting the complexity introduced by coordinated team play and sophisticated opponent strategies. The results underscore the environment's capacity to drive research in areas like hierarchical RL, multi-agent learning, and intrinsic motivation.

Implementation Details and Usage

GFootball is open-sourced under the Apache 2.0 license, promoting accessibility and modification. Installation typically involves standard Python package management (pip install gfootball).

Core Usage Pattern:

import gfootball.env as football_env

env = football_env.create_environment(
    env_name='11_vs_11_easy_stochastic', # Stochastic version of easy benchmark
    representation='simple115',         # Observation type
    rewards='scoring',                  # Default sparse rewards
    render=False                        # Disable rendering for training
)

obs = env.reset()
done = False
while not done:
    # Agent selects an action (e.g., randomly or via policy)
    # Assuming a single agent controls the first player
    action = env.action_space.sample()

    # Step the environment
    obs, reward, done, info = env.step(action)

    # Use obs, reward for agent learning
    # ...

env.close()

Customization:

The create_environment function offers extensive customization:

  • env_name: Selects predefined benchmarks or academy scenarios. Custom scenarios can also be defined.
  • representation: Choose between 'simple115', 'pixels', 'pixels_gray', and 'extracted' (the SMM representation described above).
  • rewards: Specify 'scoring' (default) or custom reward schemes via wrapper functions or scenario configuration.
  • logdir: Directory for saving replays, crucial for analysis and debugging.
  • write_goal_dumps: Save trajectory dumps of episodes in which goals are scored.
  • write_full_episode_dumps: Save complete episode trajectories.
  • render: Enable graphical rendering (requires display).
  • number_of_left_players_agent_controls: Define how many players the agent controls on the left team (e.g., 1 for single-player control, 11 for full-team control).
  • number_of_right_players_agent_controls: Similarly for the right team.
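
A minimal multi-agent sketch, assuming a single centralized policy controlling three left-team players (the random actions stand in for policy outputs; with several controlled players the environment expects one discrete action per player and reports observations, and typically rewards, per player):

import gfootball.env as football_env

env = football_env.create_environment(
    env_name='11_vs_11_easy_stochastic',
    representation='simple115',
    number_of_left_players_agent_controls=3)

obs = env.reset()                         # one observation row per controlled player
done = False
while not done:
    actions = env.action_space.sample()   # one discrete action per controlled player
    obs, reward, done, info = env.step(actions)
env.close()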

Reproducibility:

The environment provides mechanisms for reproducibility:

  • Scenario Definitions: Fixed definitions for benchmarks and academy tasks.
  • Random Seeds: Setting random seeds for environment initialization and agent policies.
  • Deterministic Engine: The underlying C++ engine can operate deterministically, although interactions with stochastic policies or specific configurations might introduce variability. Stochastic versions of benchmarks (_stochastic suffix) are provided, incorporating randomness in initial states and potentially opponent behavior.

Research Directions

GFootball is positioned to catalyze research in several challenging RL areas:

  1. Long-Term Credit Assignment: The sparse reward structure of the full game necessitates methods that can link actions (e.g., a defensive tackle) to distant outcomes (scoring a goal). Hierarchical RL, intrinsic motivation, and reward shaping are relevant research directions.
  2. Multi-Agent Coordination: The 11 vs. 11 setting is a natural testbed for multi-agent RL (MARL). Challenges include communication protocols, emergent cooperation/competition, centralized vs. decentralized control, and opponent modeling.
  3. Robustness and Generalization: Training agents that perform well across different opponent strategies, team formations, or slight variations in game physics.
  4. Transfer Learning: Leveraging skills learned in the Football Academy scenarios to accelerate learning in the full game (curriculum learning) or transferring policies between different observation spaces (e.g., SMM to pixels).
  5. Representation Learning: Investigating optimal state representations, whether learned end-to-end from pixels or derived from feature engineering, for complex control tasks.
  6. Model-Based RL: Developing world models capable of predicting game dynamics to enable planning or improve sample efficiency.

Conclusion

The Google Research Football Environment presents a sophisticated and challenging simulation platform for reinforcement learning research. Its combination of realistic physics, complex multi-agent interactions, flexible configuration, and open-source availability makes it a valuable tool for investigating advanced RL concepts. The provided benchmarks and baseline results highlight the difficulties inherent in long-horizon, sparse-reward, multi-agent domains, thereby setting the stage for future research aimed at overcoming these significant challenges.

Authors (11)
  1. Karol Kurach
  2. Anton Raichuk
  3. Michał Zając
  4. Olivier Bachem
  5. Lasse Espeholt
  6. Carlos Riquelme
  7. Damien Vincent
  8. Marcin Michalski
  9. Olivier Bousquet
  10. Sylvain Gelly
  11. Piotr Stańczyk