CALE: Continuous Arcade Learning Environment (2410.23810v1)
Abstract: We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella), but adds support for continuous actions. This enables the benchmarking and evaluation of continuous-control agents (such as PPO [Schulman et al., 2017] and SAC [Haarnoja et al., 2018]) and value-based agents (such as DQN [Mnih et al., 2015] and Rainbow [Hessel et al., 2018]) on the same environment suite. We provide a series of open questions and research directions that CALE enables, as well as initial baseline results using Soft Actor-Critic. CALE is available as part of the ALE at https://github.com/Farama-Foundation/Arcade-Learning-Environment.
- Deep reinforcement learning at the edge of the statistical precipice. In Neural Information Processing Systems (NeurIPS), 2021.
- On warm-starting neural network training. In Neural Information Processing Systems (NeurIPS), 2020.
- Agent57: Outperforming the Atari human benchmark. In International Conference on Machine Learning (ICML), 2020.
- Investigating contingency awareness using Atari 2600 games. In AAAI Conference on Artificial Intelligence, 2012.
- The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research (JAIR), 47:253–279, 2013.
- A distributional perspective on reinforcement learning. In International Conference on Machine Learning (ICML), 2017.
- pc-gym: Reinforcement learning environments for process control, 2024. URL https://github.com/MaximilianB2/pc-gym.
- JAX: composable transformations of Python+NumPy programs, 2018.
- Dopamine: A Research Framework for Deep Reinforcement Learning. CoRR, abs/1812.06110, 2018.
- MICo: Improved representations via sampling-based state similarity for Markov decision processes. In Neural Information Processing Systems (NeurIPS), 2021.
- Petros Christodoulou. Soft actor-critic for discrete action settings. CoRR, abs/1910.07207, 2019.
- Automatic state abstraction from demonstration. In International Joint Conference on Artificial Intelligence (IJCAI), 2011.
- ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
- Sample-efficient reinforcement learning by breaking the replay ratio barrier. In International Conference on Learning Representations (ICLR), 2023.
- DACBench: A benchmark library for dynamic algorithm configuration. In International Joint Conference on Artificial Intelligence (IJCAI), 2021.
- IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In International Conference on Machine Learning (ICML), 2018.
- Generalization and regularization in DQN. CoRR, abs/1810.00123, 2018.
- Proto-value networks: Scaling representation learning with auxiliary tasks. In International Conference on Learning Representations (ICLR), 2023.
- Stop regressing: The unreasonable effectiveness of classification in deep reinforcement learning. In International Conference on Machine Learning (ICML), 2024.
- The state of sparse training in deep reinforcement learning. In International Conference on Machine Learning (ICML), 2022.
- RL Unplugged: A suite of benchmarks for offline reinforcement learning. In Neural Information Processing Systems (NeurIPS), 2020.
- Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning (ICML), 2018.
- Dream to control: Learning behaviors by latent imagination. In International Conference on Learning Representations (ICLR), 2020.
- Array programming with NumPy. Nature, 585(7825):357–362, 2020.
- Deep reinforcement learning with double Q-learning. In AAAI Conference on Artificial Intelligence, 2016.
- HyperNEAT-GGP: A HyperNEAT-based Atari general game player. In Conference on Genetic and Evolutionary Computation (GECCO), pages 217–224, 2012.
- Rainbow: Combining improvements in deep reinforcement learning. In AAAI Conference on Artificial Intelligence, 2018.
- Myriad: a real-world testbed to bridge trajectory optimization and deep learning. In Neural Information Processing Systems (NeurIPS), 2022.
- John D. Hunter. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.
- Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (ICLR), 2019.
- DROID: A large-scale in-the-wild robot manipulation dataset. CoRR, abs/2403.12945, 2024.
- Actor-critic algorithms. In Neural Information Processing Systems (NeurIPS), 1999.
- Deep learning. Nature, 521(7553):436–444, 2015.
- Offline reinforcement learning: Tutorial, review, and perspectives on open problems. CoRR, abs/2005.01643, 2020.
- Understanding plasticity in neural networks. In International Conference on Machine Learning (ICML), 2023.
- Do transformer world models give better policy gradients? In International Conference on Machine Learning (ICML), 2024.
- Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research (JAIR), 61:523–562, 2018.
- Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (ICML), 2016.
- Racing the Beam: The Atari Video Computer System. The MIT Press, 2009. ISBN 026201257X.
- Reinforcement learning testbed for power-consumption optimization. In Methods and Applications for Modeling and Simulation of Complex Systems, pages 45–59. Springer Singapore, 2018. ISBN 978-981-13-2853-4.
- Stella: A multi-platform Atari 2600 VCS emulator. https://github.com/stella-emu/stella, 1996.
- The primacy bias in deep reinforcement learning. In International Conference on Machine Learning (ICML), 2022.
- In deep reinforcement learning, a pruned network is a good network. In International Conference on Machine Learning (ICML), 2024a.
- Mixtures of experts unlock parameter scaling for deep RL. In International Conference on Machine Learning (ICML), 2024b.
- Travis E. Oliphant. Python for scientific computing. Computing in Science & Engineering, 9(3):10–20, 2007. doi: 10.1109/MCSE.2007.58.
- The difficulty of passive learning in deep reinforcement learning. In Neural Information Processing Systems (NeurIPS), 2021.
- The phenomenon of policy churn. In Neural Information Processing Systems (NeurIPS), 2022.
- Proximal policy optimization algorithms. CoRR, abs/1707.06347, 2017.
- Data-efficient reinforcement learning with self-predictive representations. In International Conference on Learning Representations (ICLR), 2020.
- Bigger, better, faster: Human-level Atari with human-level efficiency. In International Conference on Machine Learning (ICML), 2023.
- Soori Sivakumaran. Electronic Computer Projects for Commodore and Atari Personal Computers. COMPUTE! Publications, 1986. ISBN 0-87455-052-1.
- The dormant neuron phenomenon in deep reinforcement learning. In International Conference on Machine Learning (ICML), 2023.
- Richard S. Sutton. Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts Amherst, 1984.
- Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems (NeurIPS), 1999.
- On bonus based exploration methods in the arcade learning environment. In International Conference on Learning Representations (ICLR), 2020.
- Discretizing continuous action space for on-policy optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):5981–5988, Apr. 2020.
- DeepMind Control Suite. CoRR, abs/1801.00690, 2018.
- Multiplayer support for the arcade learning environment. CoRR, abs/2009.09341, 2020.
- MuJoCo: A physics engine for model-based control. In International Conference on Intelligent Robots and Systems (IROS), 2012.
- Gymnasium, 2023. URL https://zenodo.org/record/8127025.
- dm_control: Software and tasks for continuous control. Software Impacts, 6:100022, 2020.
- When to use parametric models in reinforcement learning? In Neural Information Processing Systems (NeurIPS), 2019.
- Python reference manual. Centrum voor Wiskunde en Informatica Amsterdam, 1995.
- Christopher Watkins. Learning from Delayed Rewards. PhD thesis, King’s College, 1989.
- Continual world: A robotic benchmark for continual reinforcement learning. In Neural Information Processing Systems (NeurIPS), 2021.
- Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. In International Conference on Learning Representations (ICLR), 2021a.
- Improving sample efficiency in model-free reinforcement learning from images. In AAAI Conference on Artificial Intelligence, 2021b.
- Don’t change the algorithm, change the data: Exploratory data for offline reinforcement learning. CoRR, abs/2201.13425, 2022.
- Learning invariant representations for reinforcement learning without reconstruction. In International Conference on Learning Representations (ICLR), 2021.
- SMPL: Simulated industrial manufacturing and process control learning environments. In Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022.
- Brian D. Ziebart. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, 2010.
- Maximum entropy inverse reinforcement learning. In AAAI Conference on Artificial Intelligence, 2008.
- Model based reinforcement learning for Atari. In International Conference on Learning Representations (ICLR), 2020.