
LRA-CMA: Adaptive Learning for CMA-ES

Updated 11 December 2025
  • LRA-CMA is a variant of CMA-ES that dynamically adapts learning rates via a signal-to-noise ratio criterion to enhance optimization efficiency.
  • It employs online exponential moving averages of parameter updates to balance progress and stability in noisy and rugged search landscapes.
  • Empirical evaluations demonstrate that LRA-CMA outperforms fixed-rate CMA-ES, achieving robust performance on challenging noisy and multimodal problems.

LRA-CMA, short for Learning Rate Adaptation for Covariance Matrix Adaptation Evolution Strategy, is a variant of CMA-ES that introduces a principled mechanism for online adaptation of learning rates based on a signal-to-noise ratio (SNR) criterion. Unlike standard CMA-ES, which uses fixed learning rates for its mean and covariance updates, LRA-CMA adaptively controls the magnitude of parameter updates to maintain a constant SNR in Fisher metric units. This design enables robust, tuning-free optimization on noisy and multimodal black-box functions while retaining the favorable practical characteristics of default CMA-ES (Nomura et al., 2023, Nomura et al., 29 Jan 2024).

1. Foundations of CMA-ES and the Role of the Learning Rate

CMA-ES operates by evolving a Gaussian search distribution, parameterized by mean $m$ and covariance $\Sigma = \sigma^2 C$, to minimize a black-box objective $f:\mathbb{R}^d \to \mathbb{R}$. At each iteration, candidate points $x_i \sim \mathcal{N}(m, \Sigma)$ are sampled and evaluated, with the top-ranked samples influencing the updates of $m$ and $\Sigma$. In classical CMA-ES, these updates are controlled by fixed learning rates $(c_m, c_1, c_\mu)$. The learning rates directly affect the step sizes in natural gradient space, with large rates promoting rapid but potentially unstable adaptation, and small rates ensuring stability but risking inefficiency, especially in the presence of noise or rugged landscapes.
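The following minimal Python sketch shows where these fixed learning rates enter a single generation. It omits evolution paths, step-size adaptation, and the rank-one update, and the function and parameter names are illustrative rather than taken from any particular library.

```python
import numpy as np

def cma_generation(f, m, sigma, C, c_m=1.0, c_mu=0.1, lam=10):
    """One simplified CMA-ES-style generation (no evolution paths or
    step-size control), illustrating where fixed learning rates enter."""
    d = len(m)
    mu = lam // 2
    # Log-linear recombination weights for the top-mu samples.
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()

    # Sample candidates x_i ~ N(m, sigma^2 C) and rank them by f.
    A = np.linalg.cholesky(C)
    z = np.random.randn(lam, d)
    y = z @ A.T                                   # y_i ~ N(0, C)
    x = m + sigma * y
    order = np.argsort([f(xi) for xi in x])
    y_sel = y[order[:mu]]

    # Proposed updates, scaled by the fixed learning rates c_m and c_mu
    # (these are exactly the factors that LRA-CMA later adapts online).
    delta_m = sigma * (w @ y_sel)
    rank_mu = sum(wi * np.outer(yi, yi) for wi, yi in zip(w, y_sel)) - C
    return m + c_m * delta_m, C + c_mu * rank_mu
```

For example, `cma_generation(lambda x: float(np.sum(x**2)), np.ones(5), 0.5, np.eye(5))` performs one such generation on the Sphere function.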

ODE-based analysis shows that the ideal learning rate must be sufficiently small to allow the stochastic discrete updates of CMA-ES to follow the path of the continuous-time natural gradient flow. On functions such as Rastrigin, the ODE for the mean $m$ and variance $v = \sigma^2$ highlights that only for small enough learning rates do the trajectories reach the global optimum, avoiding divergence or stagnation (Nomura et al., 29 Jan 2024). This sensitivity to learning rate motivates the need for automatic adaptation.
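As a reminder of what that analysis formalizes, the generic ODE-method limit of a stochastic recursion can be stated compactly; this is the standard stochastic-approximation form, not the problem-specific ODE derived in the paper.

```latex
% Small-learning-rate (ODE-method) limit of the stochastic recursion
% \theta^{(t+1)} = \theta^{(t)} + \eta\,\Delta_\theta^{(t)}:
\[
  \frac{\mathrm{d}\theta}{\mathrm{d}s} = \mathbb{E}\!\left[\Delta_\theta \mid \theta\right],
  \qquad s \approx \eta\, t .
\]
% The discrete trajectory tracks this deterministic flow only when \eta is
% small enough for the stochastic fluctuations to average out, which is why
% overly large learning rates diverge or stagnate on rugged functions.
```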

2. Signal-to-Noise Ratio Criterion for Learning Rate Adaptation

LRA-CMA is built on the premise that each update to the parameters $\theta \in \{m, \Sigma\}$ should maintain a prescribed ratio between the expected directional progress (signal) and the inherent stochastic fluctuation (noise). The SNR in Fisher-natural coordinates is defined as

$$\mathrm{SNR}_\theta = \frac{\|\mathbb{E}[\tilde{\Delta}_\theta]\|_2^2}{\mathbb{E}\left[\|\tilde{\Delta}_\theta\|_2^2\right] - \|\mathbb{E}[\tilde{\Delta}_\theta]\|_2^2},$$

where $\tilde{\Delta}_\theta = F^{1/2} \Delta_\theta$ denotes the update measured in the local Fisher metric, with $F$ being the Fisher information matrix ($F_m = \Sigma^{-1}$, $F_\Sigma = \frac{1}{2} \Sigma^{-1} \otimes \Sigma^{-1}$).
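A direct transcription of these definitions into code might look as follows; this is a sketch that assumes $\Sigma$ is symmetric positive definite and uses an eigendecomposition for the symmetric inverse square root (the `fisher_transform` name is illustrative).

```python
import numpy as np

def fisher_transform(delta_m, delta_Sigma, Sigma):
    """Map proposed updates into local unit-Fisher coordinates.

    F_m = Sigma^{-1}                   ->  tilde_m     = Sigma^{-1/2} delta_m
    F_Sigma = (1/2) Sigma^{-1} (x) Sigma^{-1}
                                       ->  tilde_Sigma = vec(Sigma^{-1/2} delta_Sigma Sigma^{-1/2}) / sqrt(2)
    """
    # Symmetric inverse square root of Sigma via eigendecomposition.
    vals, vecs = np.linalg.eigh(Sigma)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

    tilde_m = inv_sqrt @ delta_m
    tilde_Sigma = (inv_sqrt @ delta_Sigma @ inv_sqrt).ravel() / np.sqrt(2.0)
    return tilde_m, tilde_Sigma
```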

To estimate the SNR online, LRA-CMA maintains exponential moving averages $S_1$ and $S_2$ of the updates and their squared norms,

$$S_1 \leftarrow (1-\beta) S_1 + \beta \tilde{\Delta}_\theta, \qquad S_2 \leftarrow (1-\beta) S_2 + \beta \|\tilde{\Delta}_\theta\|_2^2,$$

yielding the estimator

$$\widehat{\mathrm{SNR}} = \frac{\|S_1\|_2^2 - \frac{\beta}{2-\beta} S_2}{S_2 - \|S_1\|_2^2}$$

(Nomura et al., 2023, Nomura et al., 29 Jan 2024).
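The estimator can be maintained online with a few lines of state per parameter block. The sketch below follows the formulas above; the class name and the small denominator guard are additions for illustration and numerical safety, not part of the published algorithm.

```python
import numpy as np

class SNREstimator:
    """Online SNR estimate from exponential moving averages of a
    Fisher-transformed update vector."""

    def __init__(self, dim, beta):
        self.beta = beta
        self.s1 = np.zeros(dim)   # EMA of tilde_Delta
        self.s2 = 0.0             # EMA of ||tilde_Delta||^2

    def update(self, tilde_delta):
        b = self.beta
        self.s1 = (1.0 - b) * self.s1 + b * tilde_delta
        self.s2 = (1.0 - b) * self.s2 + b * float(tilde_delta @ tilde_delta)
        s1_sq = float(self.s1 @ self.s1)
        # hat(SNR) = (||S1||^2 - beta/(2-beta) * S2) / (S2 - ||S1||^2)
        return (s1_sq - b / (2.0 - b) * self.s2) / max(self.s2 - s1_sq, 1e-30)
```

One estimator is kept for the mean (dimension $d$) and one for the covariance (dimension $d^2$), with separate smoothing factors $\beta_m$ and $\beta_\Sigma$.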

3. Learning Rate Adaptation Mechanism

The core mechanism in LRA-CMA is a multiplicative adaptation rule that adjusts the effective local learning rates $\eta_m$, $\eta_\Sigma$ to enforce

$$\mathrm{SNR}_\theta \approx \alpha\,\eta_\theta$$

for a target $\alpha > 0$. At each generation,

$$\eta_\theta^{(t+1)} = \eta_\theta^{(t)} \exp\!\Big( \delta\, \Pi_{[-1,1]}\Big( \frac{\widehat{\mathrm{SNR}}^{(t)}}{\alpha\, \eta_\theta^{(t)}} - 1 \Big) \Big), \qquad \eta_\theta \in (0,1],$$

where $\delta = \min(\gamma \eta_\theta, \beta)$ and $\Pi_{[-1,1]}$ denotes clipping to $[-1,1]$. The learning rate is adapted down in noisy or rugged phases and up in smooth phases, balancing robustness and efficiency.
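In code, the multiplicative rule is essentially a one-liner plus clipping; the sketch below uses the default hyperparameters from Section 6 and is illustrative rather than a reference implementation.

```python
import numpy as np

def update_learning_rate(eta, snr_hat, alpha=1.4, beta=0.1, gamma=0.1):
    """Multiplicative learning-rate adaptation toward SNR ~= alpha * eta."""
    delta = min(gamma * eta, beta)                         # damping factor
    drive = np.clip(snr_hat / (alpha * eta) - 1.0, -1.0, 1.0)
    return float(min(eta * np.exp(delta * drive), 1.0))    # keep eta in (0, 1]
```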

Once learning rates are adapted, the proposed CMA-ES updates for $m$ and $\Sigma$ are rescaled by $\eta_m$ and $\eta_\Sigma$:

$$m^{(t+1)} = m^{(t)} + \eta_m^{(t+1)} \Delta_m^{(t)}, \qquad \Sigma^{(t+1)} = \Sigma^{(t)} + \eta_\Sigma^{(t+1)} \mathrm{vec}^{-1}(\Delta_\Sigma^{(t)}).$$

After the updates, the step size $\sigma$ is corrected to maintain the optimal scaling $\sigma \propto 1/\eta_m$:

$$\sigma \leftarrow \sigma \cdot \frac{\eta_m^{\text{old}}}{\eta_m^{\text{new}}}$$

(Nomura et al., 2023, Nomura et al., 29 Jan 2024).
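Applying the adapted rates and the step-size correction is then straightforward; the following sketch assumes $\Delta_\Sigma$ has already been reshaped to a $d \times d$ matrix.

```python
def apply_adapted_update(m, Sigma, sigma, delta_m, delta_Sigma,
                         eta_m_old, eta_m_new, eta_Sigma_new):
    """Scale the proposed updates by the adapted learning rates and
    correct the step size to keep sigma proportional to 1/eta_m."""
    m_new = m + eta_m_new * delta_m
    Sigma_new = Sigma + eta_Sigma_new * delta_Sigma
    sigma_new = sigma * (eta_m_old / eta_m_new)
    return m_new, Sigma_new, sigma_new
```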

4. Algorithmic Structure and Pseudocode

The LRA-CMA-ES algorithm is best viewed as a modular extension of vanilla CMA-ES, with the SNR-based adaptation intervening just prior to parameter updates. A per-generation loop proceeds as:

| Stage | Main Operation | Intervention Point |
|---|---|---|
| Sample and evaluate | Standard CMA-ES sampling and ranking | |
| Propose updates $(\Delta)$ | Standard vanilla update formulas | |
| Fisher metric transformation | Convert $\Delta$ to local (unit-Fisher) coordinates | LRA-CMA mechanism |
| Exponential averaging | Update $S_1$, $S_2$ for $m$, $\Sigma$ | LRA-CMA mechanism |
| SNR estimation | Compute $\widehat{\mathrm{SNR}}$ | LRA-CMA mechanism |
| Learning rate update | Update $\eta_m$, $\eta_\Sigma$ | LRA-CMA mechanism |
| Rescale and apply update | Apply scaled $\Delta$ to $m$, $\Sigma$ | LRA-CMA mechanism |
| Step size correction | Update $\sigma$ using the $\eta_m$ ratio | LRA-CMA mechanism |

This staged structure ensures that LRA-CMA is a drop-in replacement for learning rate control in existing CMA-ES implementations (Nomura et al., 29 Jan 2024).
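Putting the pieces together, one generation of the loop could be organized as below. This composes the illustrative sketches from the earlier sections (`cma_generation`, `fisher_transform`, `SNREstimator`, `update_learning_rate`), so it is a structural outline under those assumptions rather than a faithful reimplementation of the published algorithm.

```python
def lra_cma_generation(f, state):
    """state: dict with m, sigma, C, eta_m, eta_S, and the two SNR estimators."""
    m, sigma, C = state["m"], state["sigma"], state["C"]
    Sigma = sigma**2 * C

    # 1-2. Sample, rank, and form the vanilla update proposals (unit learning rates).
    m_prop, C_prop = cma_generation(f, m, sigma, C, c_m=1.0, c_mu=1.0)
    delta_m, delta_S = m_prop - m, sigma**2 * (C_prop - C)

    # 3-5. Fisher metric transformation, exponential averaging, SNR estimation.
    t_m, t_S = fisher_transform(delta_m, delta_S, Sigma)
    snr_m = state["snr_m"].update(t_m)
    snr_S = state["snr_S"].update(t_S)

    # 6. Learning rate update (beta_m = 0.1, beta_Sigma = 0.03 as in Section 6).
    eta_m_old = state["eta_m"]
    state["eta_m"] = update_learning_rate(eta_m_old, snr_m, beta=0.1)
    state["eta_S"] = update_learning_rate(state["eta_S"], snr_S, beta=0.03)

    # 7-8. Rescale and apply the updates, then correct the step size.
    state["m"] = m + state["eta_m"] * delta_m
    Sigma_new = Sigma + state["eta_S"] * delta_S
    state["sigma"] = sigma * (eta_m_old / state["eta_m"])
    state["C"] = Sigma_new / state["sigma"]**2
    return state
```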

5. Empirical Evaluation and Performance Characteristics

Experiments on standard benchmarks (Sphere, Ellipsoid, Rosenbrock, Rastrigin, Schaffer, and noisy variants) in $d = 10, 30, 40, 50$ dimensions demonstrate that LRA-CMA achieves both problem-adaptive robustness and speed. On smooth unimodal problems, LRA-CMA matches the performance of CMA-ES with optimally fixed learning rates (SP1 for Sphere is $3.5 \times 10^4$ with $100\%$ success, where SP1 denotes the average number of evaluations in successful runs divided by the success rate). On multimodal or noisy functions, LRA-CMA obtains high success rates (e.g., Rastrigin: $100\%$ success, SP1 $\approx 1.2 \times 10^5$) even when fixed-rate CMA-ES is either unstable (for large $\eta$) or inefficient (for small $\eta$). Success is defined as reaching $f(m) < 10^{-8}$ within a budget of $10^7$ to $10^8$ evaluations.

Notably, in strongly noisy scenarios (additive Gaussian noise with variance $\sigma_n^2 = 1$ or $10^6$), fixed-rate CMA-ES often stalls, while LRA-CMA maintains steady progress by adaptively shrinking its learning rates, thereby achieving $>90\%$ success down to $f(m) < 10^{-6}$. Population-size adaptation (PSA-CMA-ES) is competitive on noiseless multimodal functions but is outperformed by LRA-CMA under strong noise (Nomura et al., 2023, Nomura et al., 29 Jan 2024).

6. Practical Guidelines for Implementation

Recommended default hyperparameters for LRA-CMA-ES are $\alpha \approx 1.4$ (target SNR), $\beta_m \approx 0.1$, $\beta_\Sigma \approx 0.03$ (exponential averaging), and $\gamma \approx 0.1$ (damping), with the learning rate factors initialized to one. If the optimization landscape is particularly noisy or multimodal, smaller $\alpha$ and $\beta$ values are recommended for increased stability; for smooth unimodal problems, larger values yield faster adaptation.
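Collected as a plain configuration object (a convenience sketch, not an official API), these defaults read:

```python
# Default LRA-CMA hyperparameters as listed above.
LRA_CMA_DEFAULTS = {
    "alpha":      1.4,   # target SNR multiplier
    "beta_m":     0.1,   # EMA factor for the mean update
    "beta_Sigma": 0.03,  # EMA factor for the covariance update
    "gamma":      0.1,   # damping factor in the multiplicative rule
    "eta_m0":     1.0,   # initial learning-rate factor for the mean
    "eta_Sigma0": 1.0,   # initial learning-rate factor for the covariance
}
```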

LRA-CMA-ES permits retention of the default population size $\lambda = 4 + \lfloor 3\ln d \rfloor$, avoiding the need for expensive population size adaptation or restarts. It integrates directly into any CMA-ES variant, including diagonal and separable schemes, by replacing the parameter update steps with LRA-controlled updates. The only further constraints are to maintain $\eta_m, \eta_\Sigma \in (0,1]$ and to adjust $\sigma$ in response to changes in $\eta_m$ (Nomura et al., 2023, Nomura et al., 29 Jan 2024).
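As a quick check of the population-size formula, for example:

```python
import numpy as np

d = 40
lam = 4 + int(np.floor(3 * np.log(d)))   # lambda = 4 + floor(3 ln d) = 15 for d = 40
```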

7. Relation to Broader Optimization Methodology

LRA-CMA-ES can be formally interpreted as an instance of controlling the magnitude of natural gradient updates relative to their statistical estimation variance, functioning as an automatic mechanism for balancing progress versus robustness in stochastic search. Enforcing constant SNR ensures that neither drift (noise-dominated updates) nor slowness (excessively conservative updates) prevails over extended search horizons. This reflects a general pattern in ES design where stability and adaptivity are prioritized over tuning specific parameter schedules for each problem class. The SNR-based mechanism in LRA-CMA-ES offers a general principle extensible to other stochastic natural-gradient methods beyond ES.

