
LRA-CMA: Adaptive Learning for CMA-ES

Updated 11 December 2025
  • LRA-CMA is a variant of CMA-ES that dynamically adapts learning rates via a signal-to-noise ratio criterion to enhance optimization efficiency.
  • It employs online exponential moving averages of parameter updates to balance progress and stability in noisy and rugged search landscapes.
  • Empirical evaluations demonstrate that LRA-CMA outperforms fixed-rate CMA-ES, achieving robust performance on challenging noisy and multimodal problems.

LRA-CMA, short for Learning Rate Adaptation for Covariance Matrix Adaptation Evolution Strategy, is a variant of CMA-ES that introduces a principled mechanism for online adaptation of learning rates based on a signal-to-noise ratio (SNR) criterion. Unlike standard CMA-ES, which uses fixed learning rates for its mean and covariance updates, LRA-CMA adaptively controls the magnitude of parameter updates to maintain a constant SNR in Fisher metric units. This design enables robust, tuning-free optimization on noisy and multimodal black-box functions while retaining the favorable practical characteristics of default CMA-ES (Nomura et al., 2023, Nomura et al., 29 Jan 2024).

1. Foundations of CMA-ES and the Role of the Learning Rate

CMA-ES operates by evolving a Gaussian search distribution, parameterized by mean $m$ and covariance $\Sigma = \sigma^2 C$, to minimize a black-box objective $f:\mathbb{R}^d \to \mathbb{R}$. At each iteration, candidate points $x_i \sim \mathcal{N}(m, \Sigma)$ are sampled and evaluated, with the top-ranked samples influencing the updates of $m$ and $\Sigma$. In classical CMA-ES, these updates are controlled by fixed learning rates $(c_m, c_1, c_\mu)$. The learning rates directly affect the step sizes in natural gradient space, with large rates promoting rapid but potentially unstable adaptation, and small rates ensuring stability but risking inefficiency, especially in the presence of noise or rugged landscapes.
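The following minimal Python sketch shows where these fixed learning rates enter a single generation. It omits evolution paths, step-size adaptation, and the rank-one update, and the function and parameter names are illustrative rather than taken from any particular library.

```python
import numpy as np

def cma_generation(f, m, sigma, C, c_m=1.0, c_mu=0.1, lam=10):
    """One simplified CMA-ES-style generation (no evolution paths or
    step-size control), illustrating where fixed learning rates enter."""
    d = len(m)
    mu = lam // 2
    # Log-linear recombination weights for the top-mu samples.
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()

    # Sample candidates x_i ~ N(m, sigma^2 C) and rank them by f.
    A = np.linalg.cholesky(C)
    z = np.random.randn(lam, d)
    y = z @ A.T                                   # y_i ~ N(0, C)
    x = m + sigma * y
    order = np.argsort([f(xi) for xi in x])
    y_sel = y[order[:mu]]

    # Proposed updates, scaled by the fixed learning rates c_m and c_mu
    # (these are exactly the factors that LRA-CMA later adapts online).
    delta_m = sigma * (w @ y_sel)
    rank_mu = sum(wi * np.outer(yi, yi) for wi, yi in zip(w, y_sel)) - C
    return m + c_m * delta_m, C + c_mu * rank_mu
```

For example, `cma_generation(lambda x: float(np.sum(x**2)), np.ones(5), 0.5, np.eye(5))` performs one such generation on the Sphere function.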

ODE-based analysis shows that the ideal learning rate must be sufficiently small to allow the stochastic discrete updates of CMA-ES to follow the path of the continuous-time natural gradient flow. On functions such as Rastrigin, the ODE for the mean $m$ and variance $v = \sigma^2$ highlights that only for small enough learning rates do the trajectories reach the global optimum, avoiding divergence or stagnation (Nomura et al., 29 Jan 2024). This sensitivity to learning rate motivates the need for automatic adaptation.
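As a reminder of what that analysis formalizes, the generic ODE-method limit of a stochastic recursion can be stated compactly; this is the standard stochastic-approximation form, not the problem-specific ODE derived in the paper.

```latex
% Small-learning-rate (ODE-method) limit of the stochastic recursion
% \theta^{(t+1)} = \theta^{(t)} + \eta\,\Delta_\theta^{(t)}:
\[
  \frac{\mathrm{d}\theta}{\mathrm{d}s} = \mathbb{E}\!\left[\Delta_\theta \mid \theta\right],
  \qquad s \approx \eta\, t .
\]
% The discrete trajectory tracks this deterministic flow only when \eta is
% small enough for the stochastic fluctuations to average out, which is why
% overly large learning rates diverge or stagnate on rugged functions.
```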

2. Signal-to-Noise Ratio Criterion for Learning Rate Adaptation

LRA-CMA is built on the premise that each update to the parameters $\theta \in \{m, \Sigma\}$ should maintain a prescribed ratio between the expected directional progress (signal) and the inherent stochastic fluctuation (noise). The SNR in Fisher-natural coordinates is defined as

$$\mathrm{SNR}_\theta = \frac{\|\mathbb{E}[\tilde{\Delta}_\theta]\|_2^2}{\mathbb{E}\left[\|\tilde{\Delta}_\theta\|_2^2\right] - \|\mathbb{E}[\tilde{\Delta}_\theta]\|_2^2},$$

where $\tilde{\Delta}_\theta = F^{1/2} \Delta_\theta$ denotes the update measured in the local Fisher metric, with $F$ being the Fisher information matrix ($F_m = \Sigma^{-1}$, $F_\Sigma = \frac{1}{2} \Sigma^{-1} \otimes \Sigma^{-1}$).
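A direct transcription of these definitions into code might look as follows; this is a sketch that assumes $\Sigma$ is symmetric positive definite and uses an eigendecomposition for the symmetric inverse square root (the `fisher_transform` name is illustrative).

```python
import numpy as np

def fisher_transform(delta_m, delta_Sigma, Sigma):
    """Map proposed updates into local unit-Fisher coordinates.

    F_m = Sigma^{-1}                   ->  tilde_m     = Sigma^{-1/2} delta_m
    F_Sigma = (1/2) Sigma^{-1} (x) Sigma^{-1}
                                       ->  tilde_Sigma = vec(Sigma^{-1/2} delta_Sigma Sigma^{-1/2}) / sqrt(2)
    """
    # Symmetric inverse square root of Sigma via eigendecomposition.
    vals, vecs = np.linalg.eigh(Sigma)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

    tilde_m = inv_sqrt @ delta_m
    tilde_Sigma = (inv_sqrt @ delta_Sigma @ inv_sqrt).ravel() / np.sqrt(2.0)
    return tilde_m, tilde_Sigma
```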

To estimate the SNR online, LRA-CMA maintains exponential moving averages $S_1$ and $S_2$ of the updates and their squared norms,

$$S_1 \leftarrow (1-\beta) S_1 + \beta \tilde{\Delta}_\theta, \qquad S_2 \leftarrow (1-\beta) S_2 + \beta \|\tilde{\Delta}_\theta\|_2^2,$$

yielding the estimator

$$\widehat{\mathrm{SNR}} = \frac{\|S_1\|_2^2 - \frac{\beta}{2-\beta} S_2}{S_2 - \|S_1\|_2^2}$$

(Nomura et al., 2023, Nomura et al., 29 Jan 2024).
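The estimator can be maintained online with a few lines of state per parameter block. The sketch below follows the formulas above; the class name and the small denominator guard are additions for illustration and numerical safety, not part of the published algorithm.

```python
import numpy as np

class SNREstimator:
    """Online SNR estimate from exponential moving averages of a
    Fisher-transformed update vector."""

    def __init__(self, dim, beta):
        self.beta = beta
        self.s1 = np.zeros(dim)   # EMA of tilde_Delta
        self.s2 = 0.0             # EMA of ||tilde_Delta||^2

    def update(self, tilde_delta):
        b = self.beta
        self.s1 = (1.0 - b) * self.s1 + b * tilde_delta
        self.s2 = (1.0 - b) * self.s2 + b * float(tilde_delta @ tilde_delta)
        s1_sq = float(self.s1 @ self.s1)
        # hat(SNR) = (||S1||^2 - beta/(2-beta) * S2) / (S2 - ||S1||^2)
        return (s1_sq - b / (2.0 - b) * self.s2) / max(self.s2 - s1_sq, 1e-30)
```

One estimator is kept for the mean (dimension $d$) and one for the covariance (dimension $d^2$), with separate smoothing factors $\beta_m$ and $\beta_\Sigma$.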

3. Learning Rate Adaptation Mechanism

The core mechanism in LRA-CMA is a multiplicative adaptation rule that adjusts the effective local learning rates $\eta_m$, $\eta_\Sigma$ to enforce

$$\mathrm{SNR}_\theta \approx \alpha\,\eta_\theta$$

for a target $\alpha > 0$. At each generation,

$$\eta_\theta^{(t+1)} = \eta_\theta^{(t)} \exp\!\Big( \delta\, \Pi_{[-1,1]}\Big( \frac{\widehat{\mathrm{SNR}}^{(t)}}{\alpha\, \eta_\theta^{(t)}} - 1 \Big) \Big), \qquad \eta_\theta \in (0,1],$$

where $\delta = \min(\gamma \eta_\theta, \beta)$ and $\Pi_{[-1,1]}$ denotes clipping to $[-1,1]$. The learning rate is adapted down in noisy or rugged phases and up in smooth phases, balancing robustness and efficiency.
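In code, the multiplicative rule is essentially a one-liner plus clipping; the sketch below uses the default hyperparameters from Section 6 and is illustrative rather than a reference implementation.

```python
import numpy as np

def update_learning_rate(eta, snr_hat, alpha=1.4, beta=0.1, gamma=0.1):
    """Multiplicative learning-rate adaptation toward SNR ~= alpha * eta."""
    delta = min(gamma * eta, beta)                         # damping factor
    drive = np.clip(snr_hat / (alpha * eta) - 1.0, -1.0, 1.0)
    return float(min(eta * np.exp(delta * drive), 1.0))    # keep eta in (0, 1]
```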

Once learning rates are adapted, the proposed CMA-ES updates for $m$ and $\Sigma$ are rescaled by $\eta_m$ and $\eta_\Sigma$:

$$m^{(t+1)} = m^{(t)} + \eta_m^{(t+1)} \Delta_m^{(t)}, \qquad \Sigma^{(t+1)} = \Sigma^{(t)} + \eta_\Sigma^{(t+1)} \mathrm{vec}^{-1}(\Delta_\Sigma^{(t)}).$$

After the updates, the step size $\sigma$ is corrected to maintain the optimal scaling $\sigma \propto 1/\eta_m$:

$$\sigma \leftarrow \sigma \cdot \frac{\eta_m^{\text{old}}}{\eta_m^{\text{new}}}$$

(Nomura et al., 2023, Nomura et al., 29 Jan 2024).
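Applying the adapted rates and the step-size correction is then straightforward; the following sketch assumes $\Delta_\Sigma$ has already been reshaped to a $d \times d$ matrix.

```python
def apply_adapted_update(m, Sigma, sigma, delta_m, delta_Sigma,
                         eta_m_old, eta_m_new, eta_Sigma_new):
    """Scale the proposed updates by the adapted learning rates and
    correct the step size to keep sigma proportional to 1/eta_m."""
    m_new = m + eta_m_new * delta_m
    Sigma_new = Sigma + eta_Sigma_new * delta_Sigma
    sigma_new = sigma * (eta_m_old / eta_m_new)
    return m_new, Sigma_new, sigma_new
```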

4. Algorithmic Structure and Pseudocode

The LRA-CMA-ES algorithm is best viewed as a modular extension of vanilla CMA-ES, with the SNR-based adaptation intervening just prior to parameter updates. A per-generation loop proceeds as:

| Stage | Main Operation | Intervention Point |
|---|---|---|
| Sample and evaluate | Standard CMA-ES sampling and ranking | |
| Propose updates $(\Delta)$ | Standard vanilla update formulas | |
| Fisher metric transformation | Convert $\Delta$ to local (unit-Fisher) coordinates | LRA-CMA mechanism |
| Exponential averaging | Update $S_1$, $S_2$ for $m$, $\Sigma$ | LRA-CMA mechanism |
| SNR estimation | Compute $\widehat{\mathrm{SNR}}$ | LRA-CMA mechanism |
| Learning rate update | Update $\eta_m$, $\eta_\Sigma$ | LRA-CMA mechanism |
| Rescale and apply update | Apply scaled $\Delta$ to $m$, $\Sigma$ | LRA-CMA mechanism |
| Step size correction | Update $\sigma$ using the $\eta_m$ ratio | LRA-CMA mechanism |

This staged structure ensures that LRA-CMA is a drop-in replacement for learning rate control in existing CMA-ES implementations (Nomura et al., 29 Jan 2024).
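Putting the pieces together, one generation of the loop could be organized as below. This composes the illustrative sketches from the earlier sections (`cma_generation`, `fisher_transform`, `SNREstimator`, `update_learning_rate`), so it is a structural outline under those assumptions rather than a faithful reimplementation of the published algorithm.

```python
def lra_cma_generation(f, state):
    """state: dict with m, sigma, C, eta_m, eta_S, and the two SNR estimators."""
    m, sigma, C = state["m"], state["sigma"], state["C"]
    Sigma = sigma**2 * C

    # 1-2. Sample, rank, and form the vanilla update proposals (unit learning rates).
    m_prop, C_prop = cma_generation(f, m, sigma, C, c_m=1.0, c_mu=1.0)
    delta_m, delta_S = m_prop - m, sigma**2 * (C_prop - C)

    # 3-5. Fisher metric transformation, exponential averaging, SNR estimation.
    t_m, t_S = fisher_transform(delta_m, delta_S, Sigma)
    snr_m = state["snr_m"].update(t_m)
    snr_S = state["snr_S"].update(t_S)

    # 6. Learning rate update (beta_m = 0.1, beta_Sigma = 0.03 as in Section 6).
    eta_m_old = state["eta_m"]
    state["eta_m"] = update_learning_rate(eta_m_old, snr_m, beta=0.1)
    state["eta_S"] = update_learning_rate(state["eta_S"], snr_S, beta=0.03)

    # 7-8. Rescale and apply the updates, then correct the step size.
    state["m"] = m + state["eta_m"] * delta_m
    Sigma_new = Sigma + state["eta_S"] * delta_S
    state["sigma"] = sigma * (eta_m_old / state["eta_m"])
    state["C"] = Sigma_new / state["sigma"]**2
    return state
```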

5. Empirical Evaluation and Performance Characteristics

Experiments on standard benchmarks (Sphere, Ellipsoid, Rosenbrock, Rastrigin, Schaffer, and noisy variants) in $d = 10, 30, 40, 50$ dimensions demonstrate that LRA-CMA achieves both problem-adaptive robustness and speed. On smooth unimodal problems, LRA-CMA matches the performance of CMA-ES with optimally fixed learning rates (SP1 for Sphere is $3.5 \times 10^4$ with $100\%$ success, where SP1 denotes the average number of evaluations in successful runs divided by the success rate). On multimodal or noisy functions, LRA-CMA obtains high success rates (e.g., Rastrigin: $100\%$ success, SP1 $\approx 1.2 \times 10^5$) even when fixed-rate CMA-ES is either unstable (for large $\eta$) or inefficient (for small $\eta$). Success is defined as reaching $f(m) < 10^{-8}$ within a budget of $10^7$ to $10^8$ evaluations.

Notably, in strongly noisy scenarios (additive Gaussian noise with variance $\sigma_n^2 = 1$ or $10^6$), fixed-rate CMA-ES often stalls, while LRA-CMA maintains steady progress by adaptively shrinking its learning rates, thereby achieving $>90\%$ success down to $f(m) < 10^{-6}$. Population-size adaptation (PSA-CMA-ES) is competitive on noiseless multimodal functions but is outperformed by LRA-CMA under strong noise (Nomura et al., 2023, Nomura et al., 29 Jan 2024).

6. Practical Guidelines for Implementation

Recommended default hyperparameters for LRA-CMA-ES are $\alpha \approx 1.4$ (target SNR), $\beta_m \approx 0.1$, $\beta_\Sigma \approx 0.03$ (exponential averaging), and $\gamma \approx 0.1$ (damping), with the learning rate factors initialized to one. If the optimization landscape is particularly noisy or multimodal, smaller $\alpha$ and $\beta$ values are recommended for increased stability; for smooth unimodal problems, larger values yield faster adaptation.
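Collected as a plain configuration object (a convenience sketch, not an official API), these defaults read:

```python
# Default LRA-CMA hyperparameters as listed above.
LRA_CMA_DEFAULTS = {
    "alpha":      1.4,   # target SNR multiplier
    "beta_m":     0.1,   # EMA factor for the mean update
    "beta_Sigma": 0.03,  # EMA factor for the covariance update
    "gamma":      0.1,   # damping factor in the multiplicative rule
    "eta_m0":     1.0,   # initial learning-rate factor for the mean
    "eta_Sigma0": 1.0,   # initial learning-rate factor for the covariance
}
```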

LRA-CMA-ES permits retention of the default population size $\lambda = 4 + \lfloor 3\ln d \rfloor$, avoiding the need for expensive population size adaptation or restarts. It integrates directly into any CMA-ES variant, including diagonal and separable schemes, by replacing the parameter update steps with LRA-controlled updates. The only further constraints are to maintain $\eta_m, \eta_\Sigma \in (0,1]$ and to adjust $\sigma$ in response to changes in $\eta_m$ (Nomura et al., 2023, Nomura et al., 29 Jan 2024).
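As a quick check of the population-size formula, for example:

```python
import numpy as np

d = 40
lam = 4 + int(np.floor(3 * np.log(d)))   # lambda = 4 + floor(3 ln d) = 15 for d = 40
```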

7. Relation to Broader Optimization Methodology

LRA-CMA-ES can be formally interpreted as an instance of controlling the magnitude of natural gradient updates relative to their statistical estimation variance, functioning as an automatic mechanism for balancing progress versus robustness in stochastic search. Enforcing constant SNR ensures that neither drift (noise-dominated updates) nor slowness (excessively conservative updates) prevails over extended search horizons. This reflects a general pattern in ES design where stability and adaptivity are prioritized over tuning specific parameter schedules for each problem class. The SNR-based mechanism in LRA-CMA-ES offers a general principle extensible to other stochastic natural-gradient methods beyond ES.

