CMA-ME: Covariance Matrix Adaptation MAP-Elites

Updated 3 December 2025
  • CMA-ME is a hybrid algorithm that merges MAP-Elites' quality-diversity archive with CMA-ES's adaptive search, enabling rapid discovery of diverse, high-performing solutions.
  • It employs multiple independent CMA-ES emitters with distinct exploration heuristics that update a shared behavioral archive to balance global diversity and local optimization.
  • Empirical results demonstrate that CMA-ME outperforms standard MAP-Elites in coverage, QD-Score, and convergence speed across tasks such as robotics and game strategy optimization.

Covariance Matrix Adaptation MAP-Elites (CMA-ME) is a hybrid algorithm that combines the Quality-Diversity (QD) framework of MAP-Elites with the adaptive search capabilities of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). CMA-ME is designed to rapidly discover collections of diverse, high-performing solutions in high-dimensional continuous spaces; it achieves this by employing multiple independent CMA-ES instances, called emitters, coordinated within a global behavioral archive. These emitters incorporate distinct exploration heuristics, driving both the illumination and the optimization of behavior space. Empirical investigations demonstrate that CMA-ME consistently outperforms standard MAP-Elites across optimization tasks and complex domains such as robotics and game strategy search, offering improved diversity, convergence speed, and solution quality (Cully, 2020, Fontaine et al., 2019, Bruneton et al., 2019).

1. Algorithmic Foundations

CMA-ME operates by maintaining an archive $A$ of solutions $x \in \mathbb{R}^n$, each characterized by a behavioral descriptor $bd(x) \in \mathbb{R}^d$. The archive discretizes behavior space into a $d$-dimensional grid or tessellation, storing the best-fitness solution observed in each cell. At the core of CMA-ME is a population $E$ of emitters, with each emitter $e$ running a CMA-ES process parameterized by $(m_e, \sigma_e, C_e, p_{c,e}, p_{\sigma,e})$. Emitters sample candidate solutions $x \sim \mathcal{N}(m_e, \sigma_e^2 C_e)$ and update their distributions using ranked intrinsic objectives $O_e(x)$, derived from task-specific or QD heuristics.

The high-level iteration proceeds as follows:

  • Initialize archive $A$ and emitters.
  • For each generation: emitters sample $\lambda$ solutions, evaluate fitness and behavioral descriptors, insert candidates into the archive when they improve a cell, compute intrinsic objectives $O_e(x)$, and update CMA-ES parameters by ranking the sampled solutions (sketched in code below).
  • Emitters are restarted if stopping criteria are met; such restarts involve reinitialization around randomly selected archive elites.

This structure yields a Gaussian mixture model over behavior space, where each emitter adaptively explores and exploits local regions while contributing to global diversity (Cully, 2020, Fontaine et al., 2019).
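
To make the loop concrete, here is a minimal Python sketch under stated assumptions: `evaluate` returns a `(fitness, descriptor)` pair, the archive is a plain dict keyed by grid cell, and the emitter object (with hypothetical `sample`, `update`, `should_restart`, and `restart_from` methods) stands in for the full CMA-ES machinery described in Section 2:

```python
import random
import numpy as np

def cell_index(bd, lower, upper, resolution):
    """Map a behavior descriptor to a tuple of grid-cell indices."""
    frac = (np.asarray(bd) - lower) / (upper - lower)
    idx = np.clip((frac * resolution).astype(int), 0, resolution - 1)
    return tuple(idx)

def run_cma_me(emitters, evaluate, lower, upper, resolution, n_generations, lam):
    """High-level CMA-ME loop; archive maps cell -> (fitness, solution)."""
    archive = {}
    for _ in range(n_generations):
        for e in emitters:
            xs = [e.sample() for _ in range(lam)]     # x ~ N(m_e, sigma_e^2 C_e)
            evals = [evaluate(x) for x in xs]         # (fitness, descriptor) pairs
            improved = []
            for x, (f, bd) in zip(xs, evals):
                cell = cell_index(bd, lower, upper, resolution)
                if cell not in archive or f > archive[cell][0]:
                    archive[cell] = (f, x)            # fill a new cell or beat the elite
                    improved.append(True)
                else:
                    improved.append(False)
            e.update(xs, evals, improved)             # rank by O_e, apply CMA-ES updates
            if e.should_restart():                    # e.g. zero QD-success this generation
                _, elite = random.choice(list(archive.values()))
                e.restart_from(elite)                 # reinitialize around a random elite
    return archive
```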

2. Detailed CMA-ES Integration

Each emitter $e$ encapsulates a full CMA-ES process. Sampling of candidates is performed via the eigendecomposition $C_e = B D^2 B^\top$:

$$z_i \sim \mathcal{N}(0, I_n), \quad y_i = B D z_i, \quad x_i = m_e + \sigma_e y_i$$
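
A NumPy sketch of this sampling step (the function and its signature are ours, not from the papers):

```python
import numpy as np

def sample_candidates(m_e, sigma_e, C_e, lam, rng=None):
    """Draw lam candidates x_i ~ N(m_e, sigma_e^2 * C_e) via C_e = B D^2 B^T."""
    rng = np.random.default_rng() if rng is None else rng
    eigvals, B = np.linalg.eigh(C_e)              # columns of B are eigenvectors
    D = np.sqrt(np.maximum(eigvals, 0.0))         # guard against tiny negative eigenvalues
    z = rng.standard_normal((lam, m_e.shape[0]))  # rows z_i ~ N(0, I_n)
    y = (z * D) @ B.T                             # each row y_i = B D z_i
    return m_e + sigma_e * y                      # x_i = m_e + sigma_e * y_i
```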

Candidates are thus drawn as $x_i \sim \mathcal{N}(m_e, \sigma_e^2 C_e)$. After evaluating $O_e(x_i)$ for all offspring, the standard CMA-ES updates are applied:

  • Weighted mean update (ranked by $O_e$),
  • Evolution paths $p_\sigma$, $p_c$ for step-size and covariance adaptation,
  • Step-size update: $\sigma_{\text{new}} = \sigma_e \cdot \exp\left( (c_\sigma/d_\sigma)\left(\|p_\sigma\| / \mathbb{E}\|\mathcal{N}(0,I_n)\| - 1 \right) \right)$ (see the sketch after this list),
  • Covariance matrix update combining rank-one and rank-$\mu$ terms, with standard choices for the learning rates ($c_c$, $c_\sigma$, $c_1$, $c_\mu$, $d_\sigma$) as in CMA-ES (Cully, 2020, Fontaine et al., 2019).
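
As an illustration, the step-size rule maps directly to code; the closed-form approximation of $\mathbb{E}\|\mathcal{N}(0,I_n)\|$ below is the standard one from the CMA-ES literature:

```python
import numpy as np

def update_step_size(sigma_e, p_sigma, c_sigma, d_sigma):
    """Cumulative step-size adaptation: sigma grows when the evolution path
    p_sigma is longer than expected under random selection, else it shrinks."""
    n = p_sigma.shape[0]
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # ~ E||N(0, I_n)||
    return sigma_e * np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_n - 1))
```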

Emitters utilize intrinsic objectives:

  • Optimizing Emitter: $O_\text{opt}(x) = f(x)$.
  • Random-Direction Emitter: $O_\text{dir}(x) = u^\top [bd(x) - bd(m_e)]$, where $u$ is a random unit vector in descriptor space.
  • Improvement Emitter: $O_\text{imp}(x)$, favoring both cell-filling and fitness-improving candidates (sketched below).
  • Random/Elites Emitter: offspring generated from grid elites using isotropic and line mutations.

Emitters operate in parallel with independent objectives but synchronize their search by depositing their best solutions into the shared archive (Cully, 2020).
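
A sketch of these objectives in code, against the dict archive from Section 1. The two-tier ranking for the improvement emitter (new cells first, then positive fitness deltas, using sortable tuples) is one common formulation; the papers' exact tie-breaking may differ. The shared signature is ours, so the objectives are interchangeable in ranking code:

```python
import numpy as np

def O_opt(f, bd, bd_mean, u, archive, cell):
    """Optimizing emitter: rank offspring purely by fitness."""
    return f

def O_dir(f, bd, bd_mean, u, archive, cell):
    """Random-direction emitter: reward movement of the descriptor
    along a random unit vector u in descriptor space."""
    return u @ (np.asarray(bd) - np.asarray(bd_mean))

def O_imp(f, bd, bd_mean, u, archive, cell):
    """Improvement emitter: offspring that fill empty cells rank above
    offspring that merely improve occupied cells."""
    if cell not in archive:
        return (1, f)                     # tier 1: discovered a new cell
    return (0, f - archive[cell][0])      # tier 0: improvement over the elite
```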

3. Quality-Diversity Archive Dynamics

The MAP-Elites archive forms the central data structure in CMA-ME, supporting both diversity and quality metrics. The archive is structured as a grid (standard MAP-Elites; Fontaine et al., 2019) or, for symbolic regression, as multi-dimensional bins indexed by relevant features (expression length, free scalars, functional types; Bruneton et al., 2019). Each cell stores only the current elite; insertion and replacement are contingent on fitness improvement or descriptor novelty.

Emitters are periodically restarted. Upon stagnation (e.g., zero QD-success in a generation), the emitter selects a new starting elite from the archive, resets CMA-ES parameters, and may alter its exploration direction. This strategy mitigates premature convergence and maintains ongoing archive expansion (Cully, 2020, Fontaine et al., 2019).
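
A sketch of this restart rule, assuming the dict archive from Section 1 and hypothetical emitter attributes (`m`, `sigma`, `C`, `p_sigma`, `p_c`):

```python
import random
import numpy as np

def maybe_restart(emitter, archive, n_improved, sigma0):
    """If the emitter achieved zero QD-success this generation, re-seed its
    CMA-ES state around a randomly selected archive elite."""
    if n_improved == 0 and archive:
        _, elite_x = random.choice(list(archive.values()))
        emitter.m = np.array(elite_x, dtype=float)   # new search mean at the elite
        emitter.sigma = sigma0                       # reset mutation strength
        n = emitter.m.shape[0]
        emitter.C = np.eye(n)                        # reset covariance to isotropic
        emitter.p_sigma = np.zeros(n)                # clear both evolution paths
        emitter.p_c = np.zeros(n)
```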

4. Heterogeneous Emitters and Bandit Coordination

Multi-Emitter MAP-Elites (ME-MAP-Elites) generalizes CMA-ME by employing a heterogeneous set of emitters, each possessing distinct exploration characteristics. A UCB1 bandit algorithm dynamically selects which emitters are deployed during each generation. This approach exploits synergies between exploration- and exploitation-focused emitters, achieving accelerated archive filling, higher QD-Score, and robust convergence. If no synergy arises, ME-MAP-Elites matches the best individual emitter's performance. Empirical tests across six tasks (including Rastrigin, Sphere, a redundant arm, and hexapod locomotion) reveal statistically significant gains in coverage and QD-Score, with ME-MAP-Elites attaining average QD-Scores of $0.8$–$0.98$ after $20,000$ generations, outperforming both single-emitter CMA-ME and MAP-Elites ($p < 10^{-5}$) (Cully, 2020).
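
A minimal sketch of the UCB1 selection step; the reward definition (e.g., the fraction of an emitter's offspring that improved the archive) is our assumption:

```python
import math

def select_emitter(stats, t):
    """UCB1 over emitter types: stats maps emitter id -> (n_uses, mean_reward);
    t is the total number of selections made so far."""
    def ucb(eid):
        n, mean_reward = stats[eid]
        if n == 0:
            return math.inf                # ensure every emitter is tried once
        return mean_reward + math.sqrt(2 * math.log(t) / n)
    return max(stats, key=ucb)
```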

5. Empirical Results and Performance Analysis

CMA-ME and its extensions have been benchmarked on high-dimensional optimization problems and robotic control tasks:

  • CMA-ME variants increase both archive size and QD-Score orders of magnitude faster than vanilla MAP-Elites (e.g., CMA-ME_imp reaches its equilibrium QD-Score in $<500$ generations on Rastrigin-multi vs. $>10{,}000$ for MAP-Elites).
  • Optimizing emitters rapidly improve maximal fitness; random-direction emitters maximize coverage; improvement emitters provide a balanced trade-off (Cully, 2020).
  • ME-MAP-Elites routinely achieves the largest coverage and highest QD-Score on all tested benchmarks (Cully, 2020).
  • In symbolic regression, the CMA-ME pipeline attains direct hits on complex target functions using archives indexed by expression parsimony and structure, albeit with scalability constraints for CMA-ES-based free-scalar fitting as dimensionality increases ($m > 8$ or $N > 500$) (Bruneton et al., 2019).
  • In policy search for games (e.g., Hearthstone), CMA-ME (improvement emitter) fills $29\%$ of behavior-space cells vs. $22\%$ for MAP-Elites and $17\%$ for CMA-ES, with median QD-Score improvements by factors of $2\times$–$2.4\times$ and superior win rates (Fontaine et al., 2019).

Representative summary table:

| Algorithm | Coverage (% Cells) | QD-Score (normalized) | Max Fitness / Win Rate |
|---|---|---|---|
| MAP-Elites | 32 (Toy), 22 (Game) | 0.2–0.8 (task-dependent) | ≤0.5 (Game) |
| CMA-ME_imp | 77 (Toy), 29 (Game) | 0.6–0.95 | ≤0.66 (Game) |
| ME-MAP-Elites | ≥77 (Toy), ≥29 (Game) | 0.8–0.98 | ≥ best competitor |

The advantages of ME-MAP-Elites are maintained even as archive dimensionality and problem complexity grow (Cully, 2020, Fontaine et al., 2019, Bruneton et al., 2019).
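
For reference, both headline metrics are computed from the archive alone. QD-Score is commonly the sum of elite fitnesses over occupied cells, optionally normalized by the fitness range and total cell count; a sketch against the dict archive used in the earlier snippets:

```python
def coverage(archive, n_cells):
    """Fraction of archive cells that hold an elite."""
    return len(archive) / n_cells

def qd_score(archive, f_min=0.0, f_max=1.0, n_cells=None):
    """Sum of elite fitnesses; pass n_cells to normalize into [0, 1]."""
    total = sum(f - f_min for f, _ in archive.values())
    if n_cells is None:
        return total
    return total / ((f_max - f_min) * n_cells)
```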

6. Practical Considerations and Scalability

Hyperparameter choices in CMA-ME include the number of emitters, offspring samples per emitter ($\lambda$), initial mutation strengths, and grid resolution; default CMA-ES learning rates are employed. The per-generation complexity of emitter operations is dominated by covariance matrix updates, scaling as $O(|E| \cdot \lambda \cdot n^2)$, while archive insertions scale as $O(1)$ under grid-based access (Fontaine et al., 2019).
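
A configuration sketch makes these knobs explicit; the values below are illustrative placeholders, not the defaults used in the cited papers:

```python
from dataclasses import dataclass

@dataclass
class CMAMEConfig:
    n_emitters: int = 4          # |E|: number of parallel emitters
    lam: int = 32                # offspring sampled per emitter per generation
    sigma0: float = 0.5          # initial mutation strength for each emitter
    grid_resolution: int = 100   # cells per behavior-space dimension

def updates_per_generation(cfg: CMAMEConfig, n: int) -> int:
    """Rough cost model: covariance updates dominate, O(|E| * lam * n^2)."""
    return cfg.n_emitters * cfg.lam * n * n
```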

Principal bottlenecks arise in high-dimensional CMA-ES fitting tasks ($m > 8$) and with large training sets, particularly in symbolic regression, where superlinear cost may demand advanced CMA-ES variants or other black-box optimizers for scalability. The authors recommend parsimony metrics, expression simplification, CVT-MAP-Elites to control archive dimensionality, and noise-robust optimizers for noisy domains (Bruneton et al., 2019).

7. Applications and Theoretical Implications

CMA-ME is applicable to any domain requiring simultaneous discovery of diverse, high-performing solutions:

  • Behavior repertoires for robotics, including locomotion and damage recovery.
  • Game AI, including the illumination of strategy spaces in large-scale games.
  • Symbolic regression, procedural content generation, and general design optimization tasks.
  • Multi-objective approximation scenarios where archive-based diversity is critical.

A plausible implication is that the integration of self-adaptive search (CMA-ES) into archive-driven QD frameworks substantially broadens the scope and efficiency of evolutionary search, facilitating robust exploration and exploitation in domains characterized by continuous high-dimensional spaces and multimodal behavior objectives (Cully, 2020, Fontaine et al., 2019, Bruneton et al., 2019).
