CSA-ES: Cumulative Step-Size Adaptation

Updated 9 April 2026

CSA-ES is a self-adaptive mechanism in Evolution Strategies that leverages an evolution path to adjust the global mutation step-size for continuous optimization.
It employs cumulation and damping parameters in its formulation to update the evolution path and step-size, balancing exploration and exploitation.
Empirical and theoretical analyses show that CSA-ES scales robustly with problem dimensions and population sizes, ensuring stable adaptation in diverse scenarios.

Cumulative Step-Size Adaptation (CSA-ES) is a foundational mechanism in Evolution Strategies (ES), particularly in the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and its multi-recombinative $(\mu/\mu_I, \lambda)$-ES variants, for controlling the global mutation strength (step-size) during the iterative optimization of black-box functions. Designed for continuous, possibly nonconvex and ill-conditioned objective functions, CSA-ES uses an evolution path to adapt the global step-size, balancing exploration and exploitation via theoretically principled updates. The method provides robust self-adaptation, scales to high-dimensional problems, and accommodates both unconstrained and constrained domains.

1. Principle and Mathematical Formulation

Cumulative Step-Size Adaptation operates by maintaining a "path" — an exponentially smoothed sequence of recent steps taken by the mean of the search distribution — in a whitened coordinate system defined by the algorithm's evolving covariance matrix. The algorithm adapts the step-size $\sigma$ in response to the length of this evolution path, increasing $\sigma$ when the path is systematically longer than expected by random chance, and decreasing it when the path is shorter.

Let $n$ be the search space dimension, $m^{(g)}$ the mean, $C^{(g)}$ the covariance, $\sigma^{(g)}$ the global step-size in generation $g$, and $p_\sigma^{(g)}$ the evolution path. The core update equations are as follows:

Evolution path update:

$p_\sigma^{(g+1)} = (1 - c_\sigma) p_\sigma^{(g)} + \sqrt{c_\sigma(2 - c_\sigma)\,\mu_{\rm eff}}\, C^{(g)\,-1/2} \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}}$
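
The normalization factor in this update can be checked with a short calculation. The sketch below assumes purely random selection (an idealization used only for this check), recombination weights $w_i$ that sum to one (a symbol introduced here for illustration), $\mu_{\rm eff} = 1/\sum_i w_i^2$, and a path $p_\sigma^{(g)} \sim \mathcal{N}(0, I)$ that is independent of the new step. Under these assumptions the path stays standard normal, so its length can be compared with that of a standard normal vector:

```latex
\begin{align*}
y^{(g)} &:= \sqrt{\mu_{\rm eff}}\; C^{(g)\,-1/2}\, \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \sim \mathcal{N}(0, I)
  && \text{(random selection)} \\
\operatorname{Cov}\!\big(p_\sigma^{(g+1)}\big) &= (1 - c_\sigma)^2\, I + c_\sigma (2 - c_\sigma)\, I \\
&= \big(1 - 2 c_\sigma + c_\sigma^2 + 2 c_\sigma - c_\sigma^2\big)\, I = I
\end{align*}
```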

Step-size update:

$\sigma^{(g+1)} = \sigma^{(g)} \exp\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\|p_\sigma^{(g+1)}\|}{E\|\mathcal{N}(0, I)\|} - 1 \right) \right)$

where $c_\sigma$ is the cumulation parameter, $d_\sigma$ is the damping parameter, and $E\|\mathcal{N}(0, I)\|$ is the expectation of the norm of a standard $n$-dimensional normal vector. For the multi-recombinative $(\mu/\mu_I, \lambda)$ setting, $\mu_{\rm eff}$ represents the variance-effective selection mass (Hansen, 2016, Omeradzic et al., 2024, Tessari et al., 2022, Chotard et al., 2012).
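
The two updates can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the isotropic case $C^{(g)} = I$ (so the whitening step disappears) and the standard step-size rule $\sigma \leftarrow \sigma \exp\big(\tfrac{c_\sigma}{d_\sigma}(\|p_\sigma\| / E\|\mathcal{N}(0, I)\| - 1)\big)$. The function names `expected_norm`, `update_path`, and `update_sigma` are hypothetical and chosen for this example only.

```python
import math


def expected_norm(n: int) -> float:
    """E||N(0, I_n)|| = sqrt(2) * Gamma((n + 1) / 2) / Gamma(n / 2)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0))


def update_path(p, z_mean, c_sigma, mu_eff):
    """Evolution path update for C = I (assumption), with z_mean = (m_new - m_old) / sigma."""
    keep = 1.0 - c_sigma
    gain = math.sqrt(c_sigma * (2.0 - c_sigma) * mu_eff)
    return [keep * pi + gain * zi for pi, zi in zip(p, z_mean)]


def update_sigma(sigma, p, c_sigma, d_sigma):
    """Multiply sigma by exp((c_sigma / d_sigma) * (||p|| / E||N(0, I)|| - 1))."""
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    return sigma * math.exp((c_sigma / d_sigma) * (norm_p / expected_norm(len(p)) - 1.0))
```

A path longer than `expected_norm(n)` increases the step-size, a shorter one decreases it, and a path of exactly the expected length leaves it unchanged.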

2. Stochastic Process Model and Stability

CSA-ES admits a detailed Markov chain analysis in both unconstrained and constrained settings. In the unconstrained linear case, the underlying state pair forms a Markov chain, and, for $c_\sigma = 1$ (no cumulation), the step-size increments are i.i.d. (Chotard et al., 2012). With $c_\sigma < 1$, the chain is positive Harris recurrent under general conditions, admitting a unique stationary distribution for the evolution path. On constrained domains, such as linear or conical constraints, the associated state process remains Markovian, and asymptotic divergence or convergence rates of $\sigma^{(g)}$ can be rigorously analyzed using Foster–Lyapunov drift conditions and the ergodicity of the chain (Chotard et al., 2015, Spettel et al., 2019).

The typical behavior is geometric divergence of the step-size (i.e., $\frac{1}{g} \ln \frac{\sigma^{(g)}}{\sigma^{(0)}} \to r$ for some constant $r > 0$) on plateaus or strictly increasing functions, provided the population size $\lambda$ is large enough, or already for smaller $\lambda$ if cumulation ($c_\sigma < 1$) is used, as established by precise limit theorems and empirical verification (Chotard et al., 2012, Chotard et al., 2015).
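
For the case without cumulation, the origin of a geometric rate can be made explicit. The derivation below is a sketch under stated assumptions: $c_\sigma = 1$, i.i.d. log-step-size increments with finite expectation, and the standard step-size rule $\sigma^{(g+1)} = \sigma^{(g)} \exp\big(\tfrac{c_\sigma}{d_\sigma}(\|p_\sigma^{(g+1)}\| / E\|\mathcal{N}(0, I)\| - 1)\big)$. The symbol $r$ for the limit is introduced here for illustration. The law of large numbers then gives:

```latex
\begin{align*}
c_\sigma = 1:\quad p_\sigma^{(g+1)} &= \sqrt{\mu_{\rm eff}}\; C^{(g)\,-1/2}\, \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} \\
\ln \frac{\sigma^{(g+1)}}{\sigma^{(g)}} &= \frac{1}{d_\sigma} \left( \frac{\|p_\sigma^{(g+1)}\|}{E\|\mathcal{N}(0, I)\|} - 1 \right) \\
\frac{1}{g} \ln \frac{\sigma^{(g)}}{\sigma^{(0)}} &= \frac{1}{g} \sum_{k=0}^{g-1} \ln \frac{\sigma^{(k+1)}}{\sigma^{(k)}}
\;\xrightarrow[g \to \infty]{}\; \frac{1}{d_\sigma} \left( \frac{E\|p_\sigma\|}{E\|\mathcal{N}(0, I)\|} - 1 \right) =: r
\end{align*}
```

The rate $r$ is positive exactly when the selected step is, on average, longer than a standard normal vector, which is the situation selection creates on a function with a consistent improving direction.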

3. Parameter Selection and Scaling Laws

Key CSA parameters include the cumulation constant $c_\sigma$ and the damping $d_\sigma$. Standard choices are:

Variant	$c_\sigma$	$d_\sigma$	Adaptation Regime
sqrt- $n$ 1	$n$ 2	$n$ 3	Fast, dimension-invariant
lin- $n$ 4	$n$ 5	$n$ 6	Slow for large $n$ 7
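CMA-ES default	$c_\sigma = \frac{\mu_{\rm eff} + 2}{n + \mu_{\rm eff} + 5}$	$d_\sigma = 1 + 2 \max\left(0, \sqrt{\frac{\mu_{\rm eff} - 1}{n + 1}} - 1\right) + c_\sigma$	Population-adaptive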

Empirical and asymptotic analyses show that the sqrt-$n$ scaling maintains a roughly constant adaptation strength, measured in terms of the normalized mutation strength, as the dimension $n$ or the population size increases, whereas both the lin-$n$ and the population-adaptive default result in slow adaptation for large $n$ or large populations (Omeradzic et al., 2024, Omeradzic et al., 2024). For instance, on the $n$-dimensional sphere, the steady-state normalized mutation strength reached by CSA depends on the parameterisation; see the large-population results in (Omeradzic et al., 2024, Omeradzic et al., 2024).
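
To see how the cumulation constant moves with dimension and population size, the following sketch evaluates the population-adaptive default using the formulas from the CMA-ES tutorial (Hansen, 2016). The sqrt-$n$ and lin-$n$ entries are an assumption: they read the variant names literally as $c_\sigma = 1/\sqrt{n}$ and $c_\sigma = 1/n$, and no damping values are assumed for them. The function names are hypothetical.

```python
import math


def cma_default_csa_params(n: int, mu_eff: float):
    """Default c_sigma and d_sigma as given in the CMA-ES tutorial (Hansen, 2016)."""
    c_sigma = (mu_eff + 2.0) / (n + mu_eff + 5.0)
    d_sigma = 1.0 + 2.0 * max(0.0, math.sqrt((mu_eff - 1.0) / (n + 1.0)) - 1.0) + c_sigma
    return c_sigma, d_sigma


def cumulation_candidates(n: int, mu_eff: float):
    """Cumulation constants per regime; sqrt-n and lin-n read the names literally (assumption)."""
    c_default, _ = cma_default_csa_params(n, mu_eff)
    return {"sqrt-n": 1.0 / math.sqrt(n), "lin-n": 1.0 / n, "cma-default": c_default}


if __name__ == "__main__":
    for n in (10, 100, 1000):
        for mu_eff in (3.0, 100.0):
            print(n, mu_eff, cumulation_candidates(n, mu_eff))
```

For fixed $\mu_{\rm eff}$ the default $c_\sigma$ decays like $1/n$, while for $\mu_{\rm eff}$ much larger than $n$ it approaches 1 and the damping grows with $\sqrt{\mu_{\rm eff}/n}$.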

4. Algorithmic Structure and Practical Implementations

The multi-recombinative $(\mu/\mu_I, \lambda)$-CSA-ES algorithm involves the following core sequence (Omeradzic et al., 2024, Omeradzic et al., 2024):

Sampling: Generate $\lambda$ offspring $x_1, \dots, x_\lambda$.
Selection & Recombination: Evaluate $f(x_i)$, rank, and recombine the best $\mu$ into $m^{(g+1)}$.
Evolution Path Update: Update $p_\sigma^{(g+1)}$.
Step-Size Adaptation: Update $\sigma^{(g+1)}$ according to the evolution path length.
Covariance Update (if CMA-ES): Update $C^{(g+1)}$ or its factorizations.
Iterate.
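
The sequence above can be sketched as a compact loop. This is an illustrative sketch, not the cited authors' implementation: it assumes isotropic mutations ($C = I$, so the covariance update is skipped), equal recombination weights (so $\mu_{\rm eff} = \mu$), and the CMA-ES tutorial defaults for $c_\sigma$ and $d_\sigma$. The names `csa_es` and `sphere` and the default values for `lam`, `mu`, and `generations` are hypothetical.

```python
import math
import random


def csa_es(f, x0, sigma0, lam=12, mu=3, generations=200, seed=0):
    """Isotropic (mu/mu_I, lambda)-CSA-ES sketch with C = I and equal weights."""
    rng = random.Random(seed)
    n = len(x0)
    mu_eff = float(mu)  # equal recombination weights give mu_eff = mu
    # Assumed constants: CMA-ES tutorial defaults (Hansen, 2016).
    c_sigma = (mu_eff + 2.0) / (n + mu_eff + 5.0)
    d_sigma = 1.0 + 2.0 * max(0.0, math.sqrt((mu_eff - 1.0) / (n + 1.0)) - 1.0) + c_sigma
    chi_n = math.sqrt(2.0) * math.exp(math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0))

    m, sigma, p = list(x0), sigma0, [0.0] * n
    for _ in range(generations):
        # Sampling: lambda offspring x_i = m + sigma * z_i with z_i ~ N(0, I).
        zs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(lam)]
        xs = [[mj + sigma * zj for mj, zj in zip(m, z)] for z in zs]
        # Selection and recombination: average the steps of the mu best offspring.
        order = sorted(range(lam), key=lambda i: f(xs[i]))[:mu]
        z_mean = [sum(zs[i][j] for i in order) / mu for j in range(n)]
        m = [mj + sigma * zj for mj, zj in zip(m, z_mean)]
        # Evolution path update; z_mean equals (m_new - m_old) / sigma.
        gain = math.sqrt(c_sigma * (2.0 - c_sigma) * mu_eff)
        p = [(1.0 - c_sigma) * pj + gain * zj for pj, zj in zip(p, z_mean)]
        # Step-size adaptation from the path length.
        norm_p = math.sqrt(sum(pj * pj for pj in p))
        sigma *= math.exp((c_sigma / d_sigma) * (norm_p / chi_n - 1.0))
    return m, sigma


def sphere(x):
    return sum(xi * xi for xi in x)


if __name__ == "__main__":
    m, sigma = csa_es(sphere, [3.0] * 5, 1.0)
    print(sphere(m), sigma)
```

Minimizing the sphere function from a distant start, the step-size first adapts to the distance from the optimum and then shrinks together with it, which is the behavior the path-length rule is designed to produce.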

For parallel or distributed contexts, such as distributed LM-CMA in large-scale optimization, CSA is maintained unmodified within each inner instance (island), while global meta-level step-size adaptation may use recombination or diversity injection, rather than cumulative path logic (Duan et al., 2023).

5. Theoretical Results: Markov Analysis, Steady-State, and Progress

Rigorous Markov chain analysis on both unconstrained and constrained problems has produced a detailed understanding of long-term CSA-ES dynamics:

On affine-linear or linear constrained functions, the chain admits an explicit geometric rate of change for $\sigma^{(g)}$ determined by population size, cumulation, damping, and the selection mechanism (Chotard et al., 2012, Chotard et al., 2015, Spettel et al., 2019).
In conically constrained problems, the mean-value iterative system predicts steady-state normalized mutation strength and expected progress, under large-$n$ approximations and separated feasible/infeasible offspring analysis (Spettel et al., 2019).
For all parameter regimes admitting positive geometric rates, CSA-ES exhibits reliable divergence of the step-size on plateaus, essential for escaping suboptimal basins.

Key variance calculations further relate the stochastic fluctuations of log-step-size increments to the cumulation and damping parameters, providing practical guidelines for choosing them so that the noise is negligible in the regime of interest (Chotard et al., 2012).

6. Interactions with Meta-Learning, Constraint Handling, and Population Control

CSA-ES constitutes the gold standard for self-adaptive global step-size control in ES and is empirically found to be robust across a diverse suite of benchmark functions, including high-dimensional, multimodal, and rugged plateau problems (Tessari et al., 2022, Duan et al., 2023). Reinforcement learning-based adaptation policies can in some instances match or marginally outperform CSA, but require extensive training data, careful feature normalization, and significant computational resources.

CSA's compatibility with adaptive Population Control Strategies (PCS) is tightly linked to its parameterization. For example, population control mechanisms such as APOP, pcCSA, and PSA interact favorably with fast (sqrt-$n$ scaled) CSA-ES variants, which ensure crisp performance monitoring and effective adaptation, while slower CSA variants may cause PCS routines to stall or destabilize (Omeradzic et al., 2024).

In parallel ES meta-frameworks, inner ESs may preserve CSA untouched, with only outer-level step-size recombination or diversity injection modifying the meta-population's adaptation trajectory (Duan et al., 2023).

7. Empirical Performance and Recommendations

Extensive numerical experiments confirm the theoretical predictions for all major CSA variants, across the sphere, random, Rastrigin, and other test problems. The most robust performance across diverse contexts is obtained with the "sqrt-$n$" CSA parameterization, which balances adaptation speed and stability regardless of the population size or dimension (Omeradzic et al., 2024, Omeradzic et al., 2024). Standard recommendations for general-purpose black-box optimization with CSA-ES are summarized as:

Application Context	$c_\sigma$ Choice	Damping	Comment
General-purpose, large $\sigma^{(g)}$ 7	$\sigma^{(g)}$ 8	$\sigma^{(g)}$ 9	Fast and robust adaptation
Population-adaptive	$c_\sigma = \frac{\mu_{\rm eff} + 2}{n + \mu_{\rm eff} + 5}$	CMA-ES default	Slower for large populations, more stable

When used with contemporary population control or parallelization strategies, the CSA-ES framework retains its effectiveness so long as the adaptation time-scale is matched to the monitoring and communication intervals used by these meta-algorithms (Duan et al., 2023, Omeradzic et al., 2024).

Cumulative Step-Size Adaptation thus remains a theoretically principled, empirically validated, and scaling-robust self-adaptation protocol for mutation strength throughout modern Evolution Strategies.