
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari (1802.08842v1)

Published 24 Feb 2018 in cs.NE

Abstract: Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep RL problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state-of-the-art can be advanced further by integrating the many advances made in the field of ES in the last decades. We also demonstrate qualitatively that ES algorithms have very different performance characteristics than traditional RL algorithms: on some games, they learn to exploit the environment and perform much better while on others they can get stuck in suboptimal local minima. Combining their strengths with those of traditional RL algorithms is therefore likely to lead to new advances in the state of the art.

Citations (94)

Summary

  • The paper demonstrates that a simple canonical Evolution Strategy can achieve performance comparable to specialized methods in Atari games.
  • The research systematically benchmarks ES by evaluating the impact of population size and mutation strategies on game-playing outcomes.
  • The study highlights opportunities for integrating classical ES with modern RL techniques to enhance exploration and avoid suboptimal convergence.

Overview of "Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari"

This paper, authored by Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter, contributes to the field of reinforcement learning (RL) by examining the efficacy of canonical Evolution Strategies (ES) for game-playing AI, specifically in the context of Atari games. The researchers challenge the prevailing perspective that specialized ES variants, such as those presented by Salimans et al. in 2017, are requisite for high performance in tasks traditionally dominated by deep RL methods. Instead, the authors assert that a fundamental canonical ES, originating from the 1970s, can achieve competitive, sometimes superior results.

The research underscores ES as a highly parallelizable alternative to standard RL, thanks to its independence from how rewards are distributed within an episode, its insensitivity to discount-factor tuning, and its capacity to optimize non-differentiable functions. These strengths make ES an attractive option for policy optimization. The paper specifically benchmarks a canonical (μ, λ)-ES against the specialized natural evolution strategy of Salimans et al., achieving comparable results without reliance on advanced optimizers or mirrored sampling.
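To make the benchmarked algorithm concrete, below is a minimal sketch of a canonical (μ, λ)-ES loop of the kind the paper studies. This is not the authors' implementation: the `evaluate` callable (which would play one or more Atari episodes with a deep policy network and return the score), the population sizes, the mutation step size, and the log-decreasing recombination weights are illustrative assumptions.

```python
import numpy as np

def canonical_es(initial_theta, evaluate, generations=100,
                 lam=50, mu=25, sigma=0.05):
    """Minimal (mu, lambda)-ES sketch: mutate, evaluate, select, recombine.

    `evaluate` is assumed to map a parameter vector to an episode return
    (e.g. the score of one Atari episode); it is a placeholder here.
    """
    theta = np.asarray(initial_theta, dtype=np.float64)

    # Log-decreasing recombination weights for the top-mu offspring
    # (a common canonical-ES choice; the paper's exact scheme may differ).
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    weights /= weights.sum()

    for _ in range(generations):
        # Sample lambda offspring by Gaussian mutation of the current parent.
        noise = np.random.randn(lam, theta.size)
        offspring = theta + sigma * noise
        returns = np.array([evaluate(x) for x in offspring])

        # Select the mu best offspring (higher return is better) and move the
        # parent to their weighted mean.
        top = np.argsort(-returns)[:mu]
        theta = theta + sigma * (weights @ noise[top])
    return theta
```

The property visible here is that the update needs only episode returns: no backpropagation through time and no discount factor, and each offspring evaluation can run on a separate worker, which is what makes the approach so easy to parallelize.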

Core Contributions

  1. Performance Benchmarking: The major contribution is the empirical demonstration that a basic canonical ES can meet or exceed the performance of contemporary specialized ES methods on Atari game-playing tasks. Across multiple games and within limited computational budgets, the experiments show that canonical ES can discover novel strategies, including the exploitation of bugs in the games themselves. Notably, these results are obtained with selection and mutation machinery that dates back decades rather than with modern gradient-based optimizers.
  2. Qualitative Insights: The paper shows that ES exhibits learning characteristics distinct from RL: on some games it learns to exploit the environment and performs much better, while on others it stagnates in suboptimal local minima. This difference in learning behavior suggests potential benefits in hybridizing the strengths of ES and traditional RL algorithms for advancing AI capability in complex environments.
  3. Robustness to Hyperparameters: By systematically evaluating the impact of the parent population size, the research characterizes how this choice affects learning outcomes and helps refine parameter settings for diverse training scenarios (the sketch after this list illustrates how this parameter enters the update).
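As a companion to the hyperparameter discussion above, here is a small sketch, under the same assumed log-decreasing weighting as in the loop sketched earlier, of how the parent population size μ changes the recombination weights and hence how greedily the parent follows its best offspring.

```python
import numpy as np

def recombination_weights(mu):
    """Illustrative log-decreasing weights for a parent population of size mu."""
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    return w / w.sum()

# With mu = 1 the parent jumps to the single best offspring; larger mu
# averages over more offspring, smoothing out noisy episode returns.
for mu in (1, 10, 50):
    print(f"mu={mu}: leading weights {np.round(recombination_weights(mu)[:5], 3)}")
```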

Implications and Future Directions

The implications of this research are manifold. Practically, it offers a simple yet powerful mechanism for policy optimization in environments defined by high-dimensional inputs, such as raw image data from Atari games. Theoretically, it opens pathways for revisiting and integrating classical techniques with modern advances, suggesting untapped potential in ES algorithms when paired with current computational strategies and resources. This work points toward hybrid approaches in which ES algorithms complement deep learning-based frameworks, yielding solutions with improved robustness and exploration capabilities.

Future developments might consider:

  • Combining ES and RL Approaches: Developing hybrid models that leverage ES's exploration efficiency and RL's exploitation focus, potentially overcoming the limitations identified, such as being stuck in local optima.
  • Refinement and Extension: Applying advanced concepts from evolutionary computation to enhance the performance of canonical ES, potentially through better sample efficiency or adaptive mechanisms.
  • Broader Applicability: Extending these approaches to other RL environments and problem domains, recognizing ES's potential when game conditions deviate from deterministic idealizations.

Overall, this research validates the potential of canonical ES as a formidable competitor in optimization tasks traditionally tackled by deep RL, serving as a foundation for novel algorithmic innovations in AI.
