
Rolling Horizon Evolutionary Algorithms

Updated 4 February 2026
  • RHEA is a model-based real-time planning algorithm that evolves fixed-length action sequences using evolutionary search and forward-model evaluations.
  • It balances exploration and exploitation through parameters like population size and horizon length, which directly impact win rates and performance.
  • Advanced RHEA variants integrate learned policy/value priors, sophisticated seeding, and adaptations for multi-agent or adversarial setups to boost efficiency.

Rolling Horizon Evolutionary Algorithms (RHEA) are a class of model-based online planning algorithms designed for real-time sequential decision-making. At each decision point, RHEA evolves a fixed-length sequence of future actions (the “horizon”) using evolutionary search, applies a forward model to evaluate candidate sequences, and executes only the first action of the best-evolved trajectory. This paradigm is particularly prominent in General Video Game Playing, continuous-control tasks, and other domains where fast, forward-model-based planning is possible. RHEA’s efficacy is shaped by population size, horizon length, fitness evaluation heuristics, genetic operators, initialization schemes, and, in advanced forms, by learned policy and value priors. The method’s performance envelope and limitations have been explored through rigorous comparative experiments, hybridizations, and automated hyper-parameter optimization.

1. Core Principles and Algorithmic Structure

RHEA maintains a population of individuals, each encoding a sequence of actions of fixed length $L$ (the planning horizon). At each time step:

  • Each individual $x = (a_0, a_1, \ldots, a_{L-1})$, $a_i \in A(s_i)$, represents a plan simulated from the current state $s_0$ via a known forward model.
  • Fitness is computed by rolling out the action sequence, yielding, e.g., the cumulative reward, the raw score, or a last-state heuristic with win/loss bonuses: $f(x \mid s_0) = h(s_L) + R\,\mathbf{1}_{\{\mathrm{win}\}} - P\,\mathbf{1}_{\{\mathrm{loss}\}}$.
  • Evolution proceeds via standard selection (e.g., tournament), crossover (uniform or point-based), and mutation (per-locus or whole-individual), subject to a strict computational budget in forward-model calls or wall-clock time.
  • After a fixed number of generations or budget exhaustion, the first action $a_0^*$ of the highest-fitness individual is executed; the population is then rolled forward for the next step, either re-initialized from scratch or partially reused by shifting the previous individuals’ action sequences (Gaina et al., 2017, Gaina et al., 2017, Gaina et al., 2020).
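The loop above can be sketched as follows. This is a minimal illustration, not any paper's reference implementation: `step` (the forward model), `heuristic` (last-state evaluation), and the truncation-plus-mutation reproduction scheme are hypothetical stand-ins for game-specific components and the selection/crossover operators described above.

```python
import random

def rhea_action(state, actions, step, heuristic, L=10, P=20, gens=50, mut=0.3):
    """Vanilla RHEA sketch: evolve fixed-length action sequences and return
    the first action of the best plan. `step(s, a)` is the forward model;
    `heuristic(s)` scores the rollout's final state."""
    def rollout(plan):
        s = state
        for a in plan:
            s = step(s, a)
        return heuristic(s)

    # Random initial population of P plans of length L.
    pop = [[random.choice(actions) for _ in range(L)] for _ in range(P)]
    for _ in range(gens):
        # Elitist truncation selection: keep the top half.
        elite = sorted(pop, key=rollout, reverse=True)[: max(1, P // 2)]
        pop = list(elite)
        while len(pop) < P:
            child = list(random.choice(elite))
            for i in range(L):              # per-locus mutation
                if random.random() < mut:
                    child[i] = random.choice(actions)
            pop.append(child)
    return max(pop, key=rollout)[0]         # execute only the first action
```

On a toy integer line-world (`step = lambda s, a: s + a`, heuristic pulling toward a distant target), the evolved plan's first action points toward the target; in practice the generation loop would instead be bounded by a forward-model-call or wall-clock budget.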

RHEA differs fundamentally from tree-based planners (e.g., Monte Carlo Tree Search, MCTS) by searching over open-loop action sequences rather than branching closed-loop policies. This makes RHEA particularly sensitive to the horizon length $L$ and the action branching factor, since the search space grows exponentially with $L$.

2. Parameter Regimes, Empirical Findings, and Baseline Performance

Several empirical studies analyze how RHEA’s performance scales with core parameters:

  • Population size ($P$): Increasing $P$ generally leads to higher win rates, especially in stochastic games. In deterministic games, gains plateau at smaller $P$. Example: in "Chopper," the win rate scales from 29% ($P=1$) to 98% ($P=20$) (Gaina et al., 2017).
  • Horizon length ($L$): For $P=1$, longer horizons may degrade performance via search-space explosion. With $P \geq 5$, longer horizons provide consistent but modest improvements.
  • Budget ($T$): Under tight budgets (e.g., 480 forward-model calls), pure random search can rival or outpace vanilla RHEA. Doubling the budget allows more generations, after which improvements plateau.
  • Comparison with MCTS: RHEA can surpass MCTS in deterministic settings given sufficient budget. In stochastic games, RHEA and MCTS become comparable when $P$ is large (Gaina et al., 2017).

These results highlight key tradeoffs in RHEA design—balancing exploration (population size) and exploitation (sequence depth), as well as computational constraints.

3. Advanced Enhancements: Seeding, Shift Buffer, Fitness Schemes

Enhanced RHEA variants address initialization, population carry-over, and evaluation augmentations:

  • Population Seeding: Seeding with heuristically-constructed individuals (e.g., One-Step Look Ahead or MCTS-derived sequences) significantly improves early convergence and performance for small $(P, L)$, especially under strict computational budgets (Gaina et al., 2017, Galván et al., 2020). For instance, MCTS-seeded RHEA can outperform both vanilla RHEA and standalone MCTS in several General Video Game AI (GVGAI) games.
  • Statistical Tree-based Seeding: Statistical tree structures (Galván et al., 2020) store value estimates of action-prefixes (via UCB policies), enabling the generation and insertion of promising individuals each generation, thereby accelerating convergence and improving solution robustness in high-variance games.
  • Shift Buffer: Rolling the population forward by discarding the played action and appending a new random action reuses search history and is preferred in most game types (Gaina et al., 2020).
  • Fitness Assignment Variants: Beyond last-state heuristics, fitness can be assigned via discounted sums, improvement over the initial state, average, min/max along the rollout, or via Monte-Carlo rollouts from intermediate states (Gaina et al., 2020). Optimization of these schemes is strongly game-dependent, as highlighted by large-scale NTBEA-tuned RHEA (Gaina et al., 2020).
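Of these enhancements, the shift buffer is the simplest to state in code. The sketch below (names illustrative, not from any paper's codebase) shows the carry-over step: after the first action of each plan is executed, it is dropped and a fresh random action is appended, so the evolved tails seed search at the next decision point.

```python
import random

def shift_population(pop, actions):
    """Shift buffer: drop each plan's executed first action and append a new
    random action, preserving plan length and reusing search history."""
    return [plan[1:] + [random.choice(actions)] for plan in pop]
```

For example, with plans `[1, 2, 3]` and `[4, 5, 6]`, shifting yields `[2, 3, a]` and `[5, 6, a']` for fresh random actions `a`, `a'`, rather than discarding the evolved prefixes as full re-initialization would.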

4. Learning-Guided Evolution: Policy and Value Priors

A critical advancement is the integration of offline-learned neural policy and value networks to guide RHEA’s evolutionary process:

  • Prior-based RHEA (p-RHEA): Learns a policy prior $\pi_\theta(s)$ for population initialization and a value prior $V_\phi(s)$ to bootstrap beyond horizon $H$ by estimating future returns: $J(a_0 \ldots a_{H-1} \mid s_0) = \sum_{t=0}^{H-1} \gamma^t r(s_t, a_t) + \gamma^H V_\phi(s_H)$.
  • Iterative Planning–Learning Loop: p-RHEA alternates between (a) online planning using the learned priors (warm-starting search and evaluating sequences with the value bootstrap), and (b) offline learning, updating network weights with samples from planning rollouts. Both networks are 2-layer MLPs, trained with RMSProp.
  • Empirical Results: In MuJoCo continuous control, p-RHEA with $H=20$ and $N_G=5$ generations often matches or surpasses vanilla RHEA ($H=50$, $N_G=50$) and PPO2, with up to 2.5× shorter horizons and 10× fewer generations (Tong et al., 2019).
  • Mechanistic Insights: The policy prior sharply focuses search on action regions of high likelihood; the value prior permits longer-term planning with bounded local horizon search, crucial in environments with “deceptive” or sparse rewards.
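The value-bootstrapped objective $J$ above translates directly into a fitness function; the sketch below assumes hypothetical `step`, `reward`, and `value_fn` callables standing in for the forward model, reward function, and learned value network.

```python
def bootstrapped_return(s0, plan, step, reward, value_fn, gamma=0.99):
    """p-RHEA-style fitness: discounted rewards along the horizon, plus a
    discounted value estimate gamma^H * V(s_H) for returns beyond it."""
    s, total, discount = s0, 0.0, 1.0
    for a in plan:
        total += discount * reward(s, a)  # gamma^t * r(s_t, a_t)
        s = step(s, a)
        discount *= gamma
    return total + discount * value_fn(s)  # gamma^H * V(s_H)
```

The terminal term is what lets a short local horizon (e.g., $H=20$) act like a much longer one: beyond-horizon consequences enter the fitness through $V_\phi(s_H)$ rather than through additional forward-model calls.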

5. RHEA in Adversarial and Collaborative Multi-Agent Domains

RHEA’s behavior in multi-agent and adversarial settings is shaped by the need for opponent modeling and robust planning:

  • Opponent Modeling Sensitivity: In real-time strategy and fighting games, RHEA is highly sensitive to opponent model accuracy. A perfectly matched model can improve win-rate, but even modest inaccuracies cause performance collapses to below no-model baselines. In contrast, MCTS is robust to model misspecification and benefits from in-tree opponent simulation (Goodman et al., 2020, Tang et al., 2020).
  • Learned Opponent Models: Integrating online-learned opponent models (supervised or reinforcement learning, e.g., policy gradient) into RHEA rollouts rapidly improves head-to-head performance against diverse policies and in competitive benchmarks, and enables dynamic anticipation of adversarial strategies (Tang et al., 2020).
  • Collaborative Stochastic Environments: In complex cooperative board games (e.g., Pandemic), specialized macro-action encodings, stochastic rollout averaging, and hybrid optimistic/pessimistic state-value heuristics are required. Partial-destruction and expert-guided repair mutation operators, high mutation rates, and short planning horizons stabilize coordination under extreme uncertainty (Sfikas et al., 2021).

6. Hybridizations, Parameter Optimization, and State-of-the-Art Configurations

A proliferation of RHEA variants has led to hybrid algorithms that unify multiple literature enhancements, each exposed as a tunable parameter. Large-scale automated tuning (NTBEA) has demonstrated:

  • Game-dependent preferences for (i) shift buffer vs. population reinitialization, (ii) direct mutation vs. crossover, (iii) MC rollout depth and repetition for sparse-reward domains, (iv) fitness assignment, (v) frame skipping strategies, and (vi) dynamic horizon adaptation.
  • Automated parameter selection has achieved new state-of-the-art win rates in several GVGAI games, especially those with pronounced sparse/delayed rewards or requiring long-term temporal credit assignment. Examples: “Sea Quest” improved from 65% to 84%, “Missile Command” from 77.8% to 86%, “Camel Race” from 11% to 41% (Gaina et al., 2020).
  • The importance of meta-controllers or model-based online adaptation of RHEA parameters is highlighted by the heterogeneous optimal configurations across different game structures.
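To make the tuning problem concrete, a hybrid RHEA's enhancements can be exposed as a discrete search space over which a bandit-based tuner like NTBEA operates. The dictionary below is a hypothetical illustration in the spirit of Gaina et al. (2020); the parameter names and value grids are not taken from the paper.

```python
# Illustrative (not the paper's) discrete search space for a hybrid RHEA.
rhea_param_space = {
    "population_size":  [1, 5, 10, 20],
    "horizon_length":   [5, 10, 15, 20],
    "use_shift_buffer": [False, True],
    "operator":         ["mutation_only", "uniform_crossover", "1point_crossover"],
    "fitness":          ["last_state", "discounted_sum", "min", "max", "average"],
    "mc_rollout_depth": [0, 5, 10],
    "frame_skip":       [0, 2, 4],
}

def configurations(space):
    """Count joint configurations: the size of an exhaustive sweep, which
    motivates model-based tuners (NTBEA) over grid search."""
    n = 1
    for values in space.values():
        n *= len(values)
    return n
```

Even this modest grid yields thousands of joint configurations, and since each evaluation is a noisy win rate over full game episodes, NTBEA's N-tuple fitness model is used to share statistics across configurations rather than evaluating each one independently.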

7. Extensions to Model-Based Reinforcement Learning and Neuroevolution

RHEA has been extended to broader model-based reinforcement learning and policy evolution frameworks:

  • Latent Model Planning: RHEA is used as a planning policy over latent states extracted from learned recurrent state-space models (RSSMs) in visual navigation (Ovalle et al., 2021). Here, a large population of candidate action sequences is evaluated inside the learned model, shifted and mutated at each step. Shift-buffer seeding is maintained for temporal coherence.
  • Rolling Horizon NEAT (rhNEAT): Rather than evolving sequences, populations of neural policies are evolved by NEAT online (Perez-Liebana et al., 2020). This enables state-conditional policies, population carrying across ticks, and complex structural elaboration, at the expense of greater computational cost per forward model call and substantial additional hyperparameters.
  • Statistical Tree Hybridization: Building UCB-statistical trees alongside RHEA and using them for population seeding or guidance combines the exploitation-exploration balance of UCT with the global search of evolutionary algorithms, further increasing win rates and robustness in both deterministic and stochastic video games (Galván et al., 2020).

In sum, Rolling Horizon Evolutionary Algorithms form a foundational, highly extensible family of online planning algorithms, demonstrating performance competitive with or superior to tree search when appropriately tuned and enhanced with modern policy/value priors, statistical seeding, and robust multi-agent extensions. Empirical and theoretical work establishes the centrality of informed initialization, adaptive parameterization, and integration with learned model and policy components for high-dimensional, real-time decision-making in complex environments (Tong et al., 2019, Gaina et al., 2017, Gaina et al., 2017, Gaina et al., 2020, Galván et al., 2020, Ovalle et al., 2021, Perez-Liebana et al., 2020, Goodman et al., 2020, Tang et al., 2020, Sfikas et al., 2021).
