Continuous Arcade Learning Environment (CALE)

Updated 5 March 2026

CALE is a continuous-action extension of ALE that adapts Atari 2600 games for evaluating both continuous and discrete reinforcement learning techniques.
It maps joystick and fire button inputs into a three-dimensional continuous space, enabling detailed studies of exploration dynamics and representation learning.
Empirical results in CALE highlight performance disparities between continuous-control algorithms and discrete methods, emphasizing the impact of action parameterization.

The Continuous Arcade Learning Environment (CALE) is an extension of the Arcade Learning Environment (ALE) that enables evaluation and benchmarking of continuous-control agents—such as Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO)—on the classic suite of Atari 2600 games. By modifying the action interface of the Stella emulator to support a continuous parameterization of joystick and fire button controls, CALE unifies continuous- and discrete-action methodologies within a single environment, supporting direct comparisons across reinforcement learning (RL) paradigms and prompting investigations into representation learning, exploration dynamics, and action parameterization in non-robotics domains (Farebrother et al., 2024).

1. Motivation and Rationale

ALE has played a central role in deep reinforcement learning by exposing agents to over 100 Atari 2600 games through a discrete-action Markov decision process (MDP). The agent chooses among 18 joystick events: nine joystick positions (eight directions plus center), each with or without fire. Most prior research with ALE has relied on discrete, value-based RL methods such as DQN and Rainbow, because the interface precluded straightforward application of policy-gradient methods designed for continuous control. Meanwhile, continuous-control RL research (e.g., PPO, SAC) has gravitated towards robotic locomotion domains (e.g., MuJoCo, DM-Control). These domains feature smooth transitions and dense rewards, in sharp contrast to the sparse rewards and sometimes erratic transitions of Atari games.
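For concreteness, the 18 discrete events are the nine joystick positions crossed with fire/no-fire. The sketch below builds ALE-style event names such as `UPRIGHTFIRE`. The helper `discrete_events` is hypothetical, and the order it produces is illustrative rather than ALE's internal action-index order.

```python
from itertools import product

VERTICAL = ["", "UP", "DOWN"]       # vertical joystick component
HORIZONTAL = ["", "RIGHT", "LEFT"]  # horizontal joystick component


def discrete_events() -> list[str]:
    """Enumerate the 18 joystick/fire combinations with ALE-style names.

    Note: this order is illustrative and is NOT ALE's action-index order.
    """
    events = []
    for v, h in product(VERTICAL, HORIZONTAL):  # 3 x 3 = 9 joystick positions
        for fire in (False, True):               # each with or without fire
            name = v + h + ("FIRE" if fire else "")
            events.append(name or "NOOP")        # centered, no fire -> NOOP
    return events


print(len(discrete_events()))  # 18
```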

CALE is designed to address these limitations by:

Enabling direct, within-suite benchmarking and comparison of continuous-control and value-based agents on identical visual tasks.
Facilitating the study of continuous control in an environment with sparse rewards and “jerky” transitions, which pose challenges absent from robotics simulations.
Promoting research on exploration, representation learning, action parameterization, network plasticity, and offline RL within this hybrid control setting.

Notably, CALE restores a continuous (r, θ) joystick interface, supplements it with a continuous fire-button dimension, and thus brings the digital simulation closer to the original hardware’s capabilities.

2. Formal Definition and MDP Specification

Within CALE, each Atari game is formalized as an MDP

$M = (S, A, T, R, \gamma)$

with the following specifics:

State Space ($S$): $s \in \mathbb{R}^{84 \times 84 \times 4}$, consisting of four stacked, downsampled grayscale frames, following standard ALE preprocessing.
Action Space ($A$): $A = [0, 1] \times [-\pi, \pi] \times [0, 1] \subset \mathbb{R}^3$. The triplet $(r, \theta, f)$ gives the polar radius and angle of the joystick position, plus a real-valued fire-button pressure.
Transition Dynamics ($T$): Deterministic or stochastic, as handled by the Stella emulator. CALE internally quantizes each continuous action: $(r, \theta, f)$ is mapped to a discrete event through a threshold parameter $\tau$, and the emulator is then advanced for $k = 4$ frames with sticky actions.
Reward Function ($R$): The scalar reward provided by the emulator at every step.
Discount Factor ($\gamma$): Typically set to $\gamma = 0.99$.
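A minimal sketch of the frame stacking behind the $84 \times 84 \times 4$ state follows. It assumes each incoming frame has already been converted to grayscale and resized to $84 \times 84$; that preprocessing is not shown. The `FrameStack` class is a hypothetical illustration, not ALE's or Gymnasium's own wrapper.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the k most recent 84x84 grayscale frames as one (84, 84, k) array.

    Assumes frames are already grayscale and downsampled (not shown here).
    """

    def __init__(self, k: int = 4):
        self.k = k
        self.frames = deque(maxlen=k)  # the oldest frame is dropped automatically

    def reset(self, frame: np.ndarray) -> np.ndarray:
        # At the start of an episode, fill the stack with copies of the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(frame)
        return self.observation()

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        return self.observation()

    def observation(self) -> np.ndarray:
        # Stack along the last axis, oldest frame first: shape (84, 84, k).
        return np.stack(list(self.frames), axis=-1)
```

A fixed-length deque keeps the update O(1) per step. Stacking on the last axis matches the $84 \times 84 \times 4$ layout given above.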

The policy and value functions are defined as:

Stochastic Policy: $\pi(a \mid s)$ is a probability density over $A$ given $s$.
State-value Function: $V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \mid s_0 = s\right]$, where $a_t \sim \pi(\cdot \mid s_t)$ and $s_{t+1} \sim T(\cdot \mid s_t, a_t)$.
Action-value Function: $Q^{\pi}(s, a) = \mathbb{E}_{s' \sim T(\cdot \mid s, a)}\left[R(s, a) + \gamma V^{\pi}(s')\right]$.
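The expectation in $V^{\pi}$ can be estimated by averaging discounted returns over sampled episodes. The hypothetical helper below computes the discounted return of one finite episode. It truncates the infinite sum at episode termination and uses the backward recursion $G_t = r_t + \gamma G_{t+1}$.

```python
def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """Discounted return sum_t gamma**t * r_t of one finite episode."""
    g = 0.0
    # Work backwards with G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 0.0, 1.0], gamma=0.5))  # 1.25
```

Averaging `discounted_return` over many rollouts that start in $s$ and follow $\pi$ gives a Monte Carlo estimate of $V^{\pi}(s)$.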

3. CALE Software Architecture and Action Mapping

Implementation-wise, CALE is a lightweight extension of ALE and the Stella emulator that presents an interface conforming to Gymnasium conventions. After installing via

```shell
pip install ale-py
pip install "gymnasium[atari]"
```

environments are instantiated as

```python
import ale_py
import gymnasium
gymnasium.register_envs(ale_py)  # registers the ALE/* environment ids (Gymnasium >= 1.0)
env = gymnasium.make("ALE/Pong-v5", continuous=True)
```

This yields an `env.action_space` of type `Box`, with lower bounds $(0, -\pi, 0)$ and upper bounds $(1, \pi, 1)$.
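Continuous-control agents such as SAC commonly sample an unbounded Gaussian vector and squash it into the action bounds. A minimal sketch follows. It assumes the bounds $(0, -\pi, 0)$ to $(1, \pi, 1)$ and applies a `tanh` followed by an affine rescale. `squash_to_box` is a hypothetical name, and the sketch omits the log-density correction that SAC applies for the `tanh` transform.

```python
import numpy as np

# Bounds of the (r, theta, f) action box.
LOW = np.array([0.0, -np.pi, 0.0])
HIGH = np.array([1.0, np.pi, 1.0])


def squash_to_box(u: np.ndarray) -> np.ndarray:
    """Map an unbounded vector u in R^3 into [LOW, HIGH] with tanh and an affine rescale."""
    t = np.tanh(u)                               # each component now lies in [-1, 1]
    return LOW + 0.5 * (t + 1.0) * (HIGH - LOW)  # rescale [-1, 1] to [LOW, HIGH]


rng = np.random.default_rng(0)
print(squash_to_box(rng.normal(size=3)))  # a valid (r, theta, f) triplet
```

This treats $\theta$ as an ordinary interval, so it does not model the fact that $-\pi$ and $\pi$ are the same direction. That limitation is one motivation for the angular parameterizations listed among the open problems.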

Action mapping employs a quantization threshold $\tau$ :

If $r < \tau$, the joystick is mapped to “CENTER”; otherwise, one of the eight angular directions is selected based on $\theta$.
If $f > \tau$, the action is “PRESS_FIRE”; otherwise, it is “RELEASE_FIRE”.

The mapping is sensitive to $\tau$; the original paper illustrates it for $\tau \in \{0.25, 0.5, 0.75, 0.9\}$, and the default is $\tau = 0.5$.
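The thresholding rule above can be sketched as follows. This is a minimal illustration of the description in this section, not ale-py's code. It assumes $\theta = 0$ points right and $\theta = \pi/2$ points up, and it selects the nearest of eight 45° sectors; the actual implementation may divide the plane differently. `quantize_action` is a hypothetical name.

```python
import math

# Eight 45-degree sectors, counter-clockwise from theta = 0.
# Assumption: theta = 0 points right and theta = pi/2 points up.
DIRECTIONS = ["RIGHT", "UPRIGHT", "UP", "UPLEFT", "LEFT", "DOWNLEFT", "DOWN", "DOWNRIGHT"]


def quantize_action(r: float, theta: float, f: float, tau: float = 0.5) -> str:
    """Map a continuous (r, theta, f) action to an ALE-style event name."""
    if r < tau:
        direction = ""  # joystick stays centered
    else:
        sector = round(theta / (math.pi / 4)) % 8  # nearest of the eight directions
        direction = DIRECTIONS[sector]
    fire = "FIRE" if f > tau else ""
    return (direction + fire) or "NOOP"  # centered and not firing -> NOOP


print(quantize_action(0.9, math.pi / 4, 0.8))  # UPRIGHTFIRE
```

Raising `tau` enlarges the dead zone. For example, with `tau=0.9` an action with $r = 0.8$ leaves the joystick centered.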

All other environment parameters (sticky actions, frame skip, frame stacking) are inherited directly from ALE, so experimental protocols remain comparable across the two interfaces.

4. Empirical Baselines and Key Findings

Baseline experiments in CALE assess Soft Actor-Critic (SAC, continuous actions), SAC with categorical outputs (SAC-D, discrete actions), and DQN and Data-Efficient Rainbow (DER) on the discrete ALE. Experiments cover two regimes: (a) 200 million frames across 60 games and (b) the Atari 100k benchmark, covering 26 games with 100k agent steps (400k frames at a frame skip of 4).

Major empirical results:

Threshold Sensitivity: Increasing $\tau$ lowers aggregate agent performance; $\tau=0.5$ is set as default.
Encoder Architectures: A convolutional encoder tailored to SAC ( $\phi_{\mathrm{SAC}}$ ) notably outperforms the DQN encoder ( $\phi_{\mathrm{DQN}}$ ) across both data regimes.
Exploration Regimes: SAC's entropy-regularized policy exploration decisively outperforms $\epsilon$-greedy randomization in the continuous action space, particularly in long-horizon experiments.
Performance Comparison: In aggregate, SAC with CALE's continuous actions underperforms DQN on the discrete ALE, and SAC-D performs worse than SAC. On individual games, however, SAC can surpass DQN (e.g., Asteroids) while lagging significantly on others (e.g., Breakout).
Action Distribution Analysis: SAC on CALE triggers a broader set of joystick events (up to all 18 canonical events), whereas discrete agents, especially DER, often use only a small subset per game.
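To contrast the two exploration regimes: $\epsilon$-greedy randomization can be applied to a continuous policy by occasionally replacing its action with a uniform sample from the action box. SAC, by contrast, adds an entropy bonus to its objective and explores through its own stochastic policy. The sketch below shows one plausible form of the $\epsilon$-greedy variant. The uniform-over-the-box choice and the `epsilon_greedy` name are assumptions, not necessarily the scheme used in the paper's experiments.

```python
import numpy as np

# Bounds of the (r, theta, f) action box.
LOW = np.array([0.0, -np.pi, 0.0])
HIGH = np.array([1.0, np.pi, 1.0])


def epsilon_greedy(policy_action: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> np.ndarray:
    """With probability epsilon, replace the policy's action by a uniform sample from the box."""
    if rng.random() < epsilon:
        return rng.uniform(LOW, HIGH)  # uniform over [0, 1] x [-pi, pi] x [0, 1]
    return policy_action


rng = np.random.default_rng(0)
print(epsilon_greedy(np.array([0.7, 0.0, 0.9]), epsilon=0.1, rng=rng))
```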

Key hyperparameters, shared across baselines, include $\gamma = 0.99$, an Adam $\epsilon$ between $10^{-5}$ and $10^{-4}$, and batch sizes of 32 to 64.

5. Open Research Problems Enabled by CALE

CALE provides a foundation for multiple research directions:

Continuous Exploration: Understanding why SAC's entropy-based exploration outperforms $\epsilon$-greedy action injection, and developing improved exploration priors or intrinsic motivation for continuous action spaces.
Representation Learning: Assessing the efficacy of unsupervised and self-supervised encoders (e.g., Proto-Value, SPR, BYOL) under continuous, “jerky” Atari transitions.
Offline RL: Investigating whether the metric structure of $A \subset \mathbb{R}^3$ helps mitigate distributional shift, and benchmarking algorithms such as CQL or BRAC.
Network Plasticity: Studying the dynamics of reset-and-replay schemes, such as Shrink & Perturb, when applied to continuous versus discrete actions.
Action Parameterization: Exploring alternatives to Gaussian distributions—e.g., von Mises for angular variables or learned spline mappings—for improved action expressiveness.
Transfer and Multitask Learning: Examining whether shared emulator dynamics facilitate transfer across discrete and continuous policy spaces.
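As one concrete form of the von Mises suggestion, a policy head could output a mean direction and concentration for $\theta$, plus bounded distributions for $r$ and $f$. In the sketch below, the von Mises angle follows the text. The Beta distributions for $r$ and $f$ are an added assumption, chosen because they are natively supported on $[0, 1]$. `sample_action` and its parameters are hypothetical.

```python
import numpy as np


def sample_action(mu_theta: float, kappa: float,
                  r_ab: tuple[float, float], f_ab: tuple[float, float],
                  rng: np.random.Generator) -> np.ndarray:
    """Sample (r, theta, f): von Mises angle, Beta-distributed radius and fire pressure.

    mu_theta, kappa: mean direction and concentration of the angle (hypothetical head outputs).
    r_ab, f_ab: (alpha, beta) shape parameters of Beta distributions on [0, 1].
    """
    theta = rng.vonmises(mu_theta, kappa)  # NumPy draws on [-pi, pi], matching theta's range
    r = rng.beta(*r_ab)
    f = rng.beta(*f_ab)
    return np.array([r, theta, f])


rng = np.random.default_rng(0)
print(sample_action(np.pi / 2, 4.0, (2.0, 2.0), (1.0, 3.0), rng))
```

Because the von Mises density wraps around the circle, a mean direction near $\pm\pi$ is handled without the boundary artifacts of a squashed Gaussian.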

6. Relation to the Standard Arcade Learning Environment

Although CALE and ALE share the same Stella emulator and Atari ROM suite, their action interface diverges substantially:

Feature	ALE (Discrete)	CALE (Continuous)
Action space	18 discrete events	Box([0,1] × [−π,π] × [0,1]) in $\mathbb{R}^3$
Minimal action set	Yes (game-specific)	No (all directions/firing are continuous)
Default agent families	Value-based (e.g., DQN)	Policy-gradient and actor-critic (e.g., PPO, SAC)
Suitable use cases	Discrete control with minimal action sets	Benchmarking continuous control, agent and plasticity studies, action-granularity investigations
Paddle game support	Original analog paddle not yet supported	Treated as joystick titles; paddle extension open

The lack of minimal action sets in CALE can hamper continuous-control agents in games such as Breakout, where as few as three actions suffice for effective play in the discrete case. CALE is best suited for evaluating continuous-control algorithms and for studying how action-space granularity affects representation learning and exploration. Support for the original Atari paddle controllers and a broader baseline evaluation (including TD3, DDPG, GAIL, and offline RL) remain directions for future work.

Summary

CALE adds a three-dimensional continuous action interface to ALE. It maps continuous (r, θ, fire) triplets onto the digital events recognized by the Stella emulator through a tunable threshold mechanism. It preserves all ALE semantics except the action interface, offering a unified benchmark for rigorous, apples-to-apples evaluation of continuous-control and value-based RL methods on identical Atari 2600 tasks (Farebrother et al., 2024). CALE thereby enables systematic investigation of how action-space continuity affects exploration, representation learning, and policy optimization in non-robotic, game-centric RL.

References

Farebrother et al. (2024). CALE: Continuous Arcade Learning Environment.
