CMA-ME: Covariance Matrix Adaptation MAP-Elites
- CMA-ME is a hybrid algorithm that merges MAP-Elites' quality-diversity archive with CMA-ES's adaptive search, enabling rapid discovery of diverse, high-performing solutions.
- It employs multiple independent CMA-ES emitters with distinct exploration heuristics that update a shared behavioral archive to balance global diversity and local optimization.
- Empirical results demonstrate that CMA-ME outperforms standard MAP-Elites in coverage, QD-Score, and convergence speed across tasks such as robotics and game strategy optimization.
Covariance Matrix Adaptation MAP-Elites (CMA-ME) is a hybrid algorithm that combines the Quality-Diversity (QD) framework of MAP-Elites with the adaptive search capabilities of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). CMA-ME is designed to rapidly discover collections of diverse, high-performing solutions in high-dimensional continuous spaces; it achieves this by employing multiple independent CMA-ES instances, called emitters, coordinated through a global behavioral archive. These emitters incorporate distinct exploration heuristics, driving both the illumination and optimization of behavior space. Empirical investigations demonstrate that CMA-ME consistently outperforms standard MAP-Elites across optimization tasks and complex domains such as robotics and game strategy search, offering improved diversity, convergence speed, and solution quality (Cully, 2020, Fontaine et al., 2019, Bruneton et al., 2019).
1. Algorithmic Foundations
CMA-ME operates by maintaining an archive of solutions $\theta \in \mathbb{R}^n$, each characterized by a behavioral descriptor $b(\theta) \in \mathbb{R}^d$. The archive discretizes behavior space into a $d$-dimensional grid or tessellation, storing the best-fitness solution observed in each cell. At the core of CMA-ME is a population of emitters, with each emitter running a CMA-ES process parameterized by its mean $m$, step size $\sigma$, and covariance matrix $C$. Emitters sample candidate solutions and update their distributions using ranked intrinsic objectives $\tilde{f}(\theta)$, derived from task-specific or QD heuristics.
The high-level iteration proceeds as follows:
- Initialize archive and emitters.
- For each generation: emitters sample solutions, evaluate fitness and behavioral descriptors, populate the archive if candidate solutions improve a cell, compute intrinsic objectives $\tilde{f}(\theta)$, and update CMA-ES parameters by ranking sampled solutions.
- Emitters are restarted if stopping criteria are met; such restarts involve reinitialization around randomly selected archive elites.
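The loop above can be sketched in a deliberately simplified form. In this sketch, isotropic fixed-width Gaussian emitters stand in for full CMA-ES distributions; the function name `cma_me_sketch`, the 0.1-wide toy tessellation, and all parameter values are illustrative assumptions, not the published implementation:

```python
import numpy as np

def cma_me_sketch(f, behavior, n_dim, n_gens=50, n_emitters=3, lam=8, seed=0):
    """Toy CMA-ME loop: isotropic Gaussian emitters (full covariance
    adaptation omitted) feeding one shared grid archive."""
    rng = np.random.default_rng(seed)
    archive = {}  # cell tuple -> (fitness, solution)
    means = [rng.standard_normal(n_dim) for _ in range(n_emitters)]
    sigma = 0.5   # fixed mutation strength (CMA-ES would adapt this)

    def cell_of(b):
        # Toy tessellation: 0.1-wide grid cells over descriptor space.
        return tuple(np.floor(np.asarray(b) * 10).astype(int))

    for _ in range(n_gens):
        for e in range(n_emitters):
            # 1. Sample offspring from this emitter's search distribution.
            thetas = means[e] + sigma * rng.standard_normal((lam, n_dim))
            improved = []
            for th in thetas:
                fit, key = f(th), cell_of(behavior(th))
                # 2. Insert into the shared archive if the cell is empty
                #    or the candidate beats the incumbent elite.
                if key not in archive or fit > archive[key][0]:
                    archive[key] = (fit, th)
                    improved.append(th)
            # 3. Move the emitter toward archive-improving offspring;
            #    restart from a random elite on stagnation.
            if improved:
                means[e] = np.mean(improved, axis=0)
            else:
                elite_key = list(archive)[rng.integers(len(archive))]
                means[e] = archive[elite_key][1].copy()
    return archive
```

Even in this reduced form, the three phases (sampling, archive insertion, distribution update with restart on stagnation) mirror the high-level iteration described above.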
This structure yields a Gaussian mixture model over behavior space, where each emitter adaptively explores and exploits local regions while contributing to global diversity (Cully, 2020, Fontaine et al., 2019).
2. Detailed CMA-ES Integration
Each emitter encapsulates a full CMA-ES process. Sampling of candidates is performed via the eigendecomposition $C = BD^2B^\top$;
candidates are thus drawn as $\theta_i = m + \sigma BD z_i$ with $z_i \sim \mathcal{N}(0, I)$. After evaluating $f(\theta_i)$ and $b(\theta_i)$ for all offspring, the standard CMA-ES updates are applied:
- Weighted mean update $m \leftarrow \sum_{i=1}^{\mu} w_i \theta_{i:\lambda}$ (ranked by $\tilde{f}$),
- Evolution paths $p_\sigma$ and $p_c$ for step-size and covariance adaptation,
- Step-size update: $\sigma \leftarrow \sigma \exp\!\left(\frac{c_\sigma}{d_\sigma}\left(\frac{\|p_\sigma\|}{\mathbb{E}\|\mathcal{N}(0,I)\|} - 1\right)\right)$,
- Covariance matrix update combining rank-one and rank-$\mu$ terms, with standard choices for learning rates ($c_\sigma$, $d_\sigma$, $c_c$, $c_1$, $c_\mu$) as in CMA-ES (Cully, 2020, Fontaine et al., 2019).
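The sampling and weighted-recombination steps above can be sketched numerically; this uses a synthetic covariance matrix and a toy sphere objective, and all variable names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
m = np.zeros(n)                    # distribution mean
sigma = 0.5                        # global step size

# Synthetic SPD covariance matrix standing in for an adapted C.
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)

# Eigendecomposition C = B D^2 B^T used for correlated sampling.
eigvals, B = np.linalg.eigh(C)
D = np.sqrt(eigvals)

lam = 12                           # offspring per generation
Z = rng.standard_normal((lam, n))  # z_i ~ N(0, I)
candidates = m + sigma * (Z * D) @ B.T   # theta_i = m + sigma * B D z_i

# Weighted recombination of the mu best, ranked here by a toy sphere
# objective (lower is better).
mu = lam // 2
f = np.sum(candidates**2, axis=1)
order = np.argsort(f)[:mu]
w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))  # standard log weights
w /= w.sum()
m_new = w @ candidates[order]
```

Row-wise, `(Z * D) @ B.T` computes $B D z_i$ for each sample, so the candidates follow $\mathcal{N}(m, \sigma^2 C)$ without factorizing $C$ per draw.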
Emitters utilize intrinsic objectives:
- Optimizing Emitter: $\tilde{f}(\theta) = f(\theta)$.
- Random-Direction Emitter: $\tilde{f}(\theta) = v^\top b(\theta)$, where $v$ is a random unit vector in descriptor space.
- Improvement Emitter: $\tilde{f}(\theta) = \Delta(\theta)$, the fitness gain over the incumbent elite of the target cell, with newly filled cells ranked first, favoring both cell-filling and fitness-improving actions.
- Random/Elites Emitter: offspring generated from grid elites, with isotropic and line mutation.
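The intrinsic objectives above can be expressed as small ranking functions; the function names and the dict-based archive interface here are illustrative assumptions, not an API from the cited works:

```python
import numpy as np

def random_unit_vector(dim, rng):
    # Draw v uniformly on the unit sphere in descriptor space.
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def optimizing_objective(fitness, descriptor):
    # Optimizing emitter: f~(theta) = f(theta).
    return fitness

def random_direction_objective(fitness, descriptor, v):
    # Random-direction emitter: f~(theta) = v . b(theta).
    return float(v @ np.asarray(descriptor))

def improvement_objective(fitness, descriptor, archive, cell_of):
    # Improvement emitter: empty cells rank above all occupied cells;
    # occupied cells rank by fitness gain over the incumbent elite.
    key = cell_of(descriptor)
    if key not in archive:
        return float("inf")
    return fitness - archive[key]
```

In practice the improvement emitter uses a two-tier ranking (new cells first, then by improvement); returning `inf` for empty cells is a compact stand-in for that tiering.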
Emitters operate in parallel with independent objectives but synchronize their search by depositing their best solutions into the shared archive (Cully, 2020).
3. Quality-Diversity Archive Dynamics
The MAP-Elites archive forms the central data structure in CMA-ME, supporting both diversity and quality metrics. The archive is structured as a grid (standard MAP-Elites (Fontaine et al., 2019)) or, for symbolic regression, as a multi-dimensional bin indexed by relevant features such as expression length, number of free scalars, and functional types (Bruneton et al., 2019). Each cell stores only the current elite; insertion and replacement are contingent on fitness improvement or descriptor novelty.
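A minimal sketch of such a grid archive with improvement-contingent insertion; the `GridArchive` class and its method names are hypothetical, not an API from the cited works:

```python
import numpy as np

class GridArchive:
    """Minimal MAP-Elites grid archive: at most one elite per cell."""

    def __init__(self, bins_per_dim, lower, upper):
        self.bins = np.asarray(bins_per_dim)
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.cells = {}  # cell index tuple -> (fitness, solution)

    def cell_index(self, descriptor):
        # O(1) mapping of a descriptor to its grid cell.
        frac = (np.asarray(descriptor) - self.lower) / (self.upper - self.lower)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def try_insert(self, solution, fitness, descriptor):
        # Replace only on fitness improvement; filling an empty cell
        # counts as descriptor novelty.
        key = self.cell_index(descriptor)
        incumbent = self.cells.get(key)
        if incumbent is None or fitness > incumbent[0]:
            self.cells[key] = (fitness, solution)
            return True
        return False
```

The boolean returned by `try_insert` is exactly the per-candidate "archive improvement" signal that emitters use for ranking and for detecting stagnation.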
Emitters are periodically restarted. Upon stagnation (e.g., zero QD-success in a generation), the emitter selects a new starting elite from the archive, resets CMA-ES parameters, and may alter its exploration direction. This strategy mitigates premature convergence and maintains ongoing archive expansion (Cully, 2020, Fontaine et al., 2019).
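The restart rule can be sketched as follows, assuming the archive is a dict of per-cell elites; the returned parameter tuple is a simplified stand-in for a full CMA-ES state reset:

```python
import numpy as np

def restart_emitter(archive, sigma0, rng):
    """Reseed a stagnated emitter from a uniformly random archive elite.

    `archive` maps cell -> (fitness, solution); the returned (mean, sigma, C)
    is a hypothetical parameter layout. Real CMA-ES restarts also clear the
    evolution paths p_sigma and p_c.
    """
    key = list(archive)[rng.integers(len(archive))]
    elite = np.asarray(archive[key][1], dtype=float)
    # Restart centred on the elite, with mutation strength and covariance
    # reset to their initial values.
    return elite.copy(), sigma0, np.eye(elite.size)
```

Restarting from archive elites rather than from random points keeps the search anchored to known good regions while still redirecting exploration.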
4. Heterogeneous Emitters and Bandit Coordination
Multi-Emitter MAP-Elites (ME-MAP-Elites) generalizes CMA-ME by employing a heterogeneous set of emitters, each possessing distinct exploration characteristics. A UCB1 bandit algorithm dynamically selects which emitters are deployed during each generation. This approach exploits synergies between exploration- and exploitation-focused emitters, achieving accelerated archive filling, higher QD-Score, and robust convergence. If no synergy arises, ME-MAP-Elites matches the best individual emitter's performance. Empirical tests across six tasks, including Rastrigin, Sphere, a redundant robotic arm, and hexapod locomotion, reveal statistically significant gains in coverage and QD-Score, with ME-MAP-Elites attaining average QD-Scores of $0.8$–$0.98$ after $20,000$ generations, outperforming both single-emitter CMA-ME and MAP-Elites ($p < 10^{-5}$) (Cully, 2020).
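The UCB1 selection step can be sketched as follows, under the common convention that an emitter type's reward is the number of archive improvements it produced when deployed; the function name and reward definition are illustrative:

```python
import math

def ucb1_select(counts, total_rewards, c=math.sqrt(2)):
    """UCB1 over emitter types.

    counts[i]        -- how many times emitter type i has been deployed
    total_rewards[i] -- cumulative archive improvements credited to type i
    Returns the index of the emitter type to deploy next.
    """
    for i, n_i in enumerate(counts):
        if n_i == 0:
            return i  # deploy each emitter type at least once

    t = sum(counts)

    def score(i):
        # Mean reward plus an exploration bonus that shrinks with use.
        return total_rewards[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i])

    return max(range(len(counts)), key=score)
```

The exploration bonus ensures under-used emitter types are periodically retried, so the bandit can track which emitters are productive as the archive matures.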
5. Empirical Results and Performance Analysis
CMA-ME and its extensions have been benchmarked on high-dimensional optimization problems and robotic control tasks:
- CMA-ME variants grow both archive size and QD-Score substantially faster than vanilla MAP-Elites (e.g., CMA-ME_imp reaches its equilibrium QD-Score in roughly 500 generations on Rastrigin-multi versus 10,000 for MAP-Elites).
- Optimizing emitters rapidly improve maximal fitness; random-direction emitters maximize coverage; improvement emitters provide a balanced trade-off (Cully, 2020).
- ME-MAP-Elites routinely achieves the largest coverage and highest QD-Score on all tested benchmarks (Cully, 2020).
- In symbolic regression, the CMA-ME pipeline attains direct hits on complex target functions using archives indexed by expression parsimony and structure, albeit with scalability constraints on the CMA-ES used for free-scalar fitting as dimensionality increases (Bruneton et al., 2019).
- In policy search for games (e.g., Hearthstone), CMA-ME with the improvement emitter fills a substantially larger fraction of behavior-space cells than either MAP-Elites or CMA-ES, with median QD-Score improvements by factors of $2$x–$2.4$x and superior win rates (Fontaine et al., 2019).
Representative summary table:
| Algorithm | Coverage (% Cells) | QD-Score (normalized) | Max Fitness / Win Rate |
|---|---|---|---|
| MAP-Elites | 32 (Toy), 22 (Game) | 0.2–0.8 (task-dependent) | ≤0.5 (Game) |
| CMA-ME_imp | 77 (Toy), 29 (Game) | 0.6–0.95 | ≤0.66 (Game) |
| ME-MAP-Elites | ≥77 (Toy), ≥29 (Game) | 0.8–0.98 | ≥ best competitor |
ME-MAP-Elites advantages are maintained even as archive dimensionality and problem complexity grow (Cully, 2020, Fontaine et al., 2019, Bruneton et al., 2019).
6. Practical Considerations and Scalability
Hyperparameter choices in CMA-ME include the number of emitters, offspring samples per emitter ($\lambda$), initial mutation strengths, and grid resolution; default CMA-ES learning rates are employed. The per-generation complexity of emitter operations is dominated by covariance matrix updates, scaling as $O(n^2)$ per update with periodic $O(n^3)$ eigendecompositions, while archive insertions scale as $O(1)$ under grid-based access (Fontaine et al., 2019).
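For reference, the standard CMA-ES default offspring count, together with an illustrative hyperparameter set; the config keys and values are assumptions for demonstration, not the settings reported in the cited papers:

```python
import math

def default_lambda(n):
    # Standard CMA-ES default offspring count for an n-dimensional space:
    # lambda = 4 + floor(3 ln n).
    return 4 + int(3 * math.log(n))

# Illustrative CMA-ME hyperparameter set (names and values are assumptions).
cma_me_config = {
    "num_emitters": 15,                          # parallel CMA-ES emitters
    "offspring_per_emitter": default_lambda(10), # lambda for a 10-D space
    "sigma0": 0.5,                               # initial mutation strength
    "grid_resolution": (100, 100),               # cells per descriptor dim
}
```

Scaling $\lambda$ with dimensionality via the standard rule keeps per-emitter sampling cost predictable as the search space grows.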
Principal bottlenecks arise in high-dimensional CMA-ES fitting tasks and with large training sets, particularly in symbolic regression, where superlinear cost may demand advanced CMA-ES variants or other black-box optimizers for scalability. The authors recommend parsimony metrics, expression simplification, CVT-MAP-Elites to control archive dimensionality, and noise-robust optimizers for noisy domains (Bruneton et al., 2019).
7. Applications and Theoretical Implications
CMA-ME is applicable to any domain requiring simultaneous discovery of diverse, high-performing solutions:
- Behavior repertoires for robotics, including locomotion and damage recovery.
- Game AI, including the illumination of strategy spaces in large-scale games.
- Symbolic regression, procedural content generation, and general design optimization tasks.
- Multi-objective approximation scenarios where archive-based diversity is critical.
A plausible implication is that the integration of self-adaptive search (CMA-ES) into archive-driven QD frameworks substantially broadens the scope and efficiency of evolutionary search, facilitating robust exploration and exploitation in domains characterized by continuous high-dimensional spaces and multimodal behavior objectives (Cully, 2020, Fontaine et al., 2019, Bruneton et al., 2019).