
Natural Evolutionary Strategies (NES)

Updated 9 April 2026
  • Natural Evolutionary Strategies (NES) are a family of algorithms that optimize the expected fitness by updating search distributions using natural gradient methods.
  • NES variants, including full-covariance, separable, and discrete forms, adapt parameters to robustly handle ill-conditioned, high-dimensional, and mixed-domain optimization problems.
  • The framework is applied successfully in continuous optimization, reinforcement learning, and variational inference, offering enhanced sample efficiency and convergence.

Natural Evolutionary Strategies (NES) are a principled family of black-box optimization algorithms that adapt the parameters of a search distribution via natural gradient ascent in distribution space. The framework unifies stochastic search under a single statistical view, leveraging information geometry to achieve parameterization invariance, improved convergence on ill-conditioned or non-separable fitness landscapes, and stable adaptation mechanisms for both continuous and discrete domains. The core paradigm is to cast search as optimization of the expected objective under a parameterized distribution and to iteratively update the distribution's mean and covariance via efficient estimators of the natural gradient. NES instantiations span full-covariance and factorized Gaussians, heavy-tailed families, and mixed-integer and discrete domains, and yield state-of-the-art results across black-box function optimization, reinforcement learning, variational inference, and combinatorial program synthesis.

1. Core Principles and Objective Formulation

NES formalizes black-box optimization as maximizing the expected fitness under a parameterized search distribution. For an objective $f:\mathbb{R}^d\to\mathbb{R}$ and search distribution $p(x|\theta)$, the objective is

$$J(\theta) = \mathbb{E}_{x \sim p(\cdot|\theta)}[f(x)].$$

The key innovation is to estimate $\nabla_\theta J(\theta)$ via the score-function identity $\nabla_\theta J(\theta) = \mathbb{E}_{x\sim p(\cdot|\theta)}[f(x)\nabla_\theta \log p(x|\theta)]$, which requires only fitness evaluations, not gradients of $f$. The core NES update replaces the Euclidean gradient with the natural gradient

$$\tilde{\nabla}_\theta J = F(\theta)^{-1}\nabla_\theta J,$$

where $F(\theta)$ is the Fisher information matrix of $p(\cdot|\theta)$ (Wierstra et al., 2011). This yields updates that are invariant to reparameterization and robust to the curvature of the statistical manifold.
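
To ground the formulas above, the following is a minimal sketch of a NES loop for an isotropic Gaussian search distribution with a fixed scale. All names and hyperparameters here are illustrative assumptions, not taken from any cited paper; for this family the Fisher matrix for the mean is $I/\sigma^2$, so the natural-gradient step simplifies to $\mathbb{E}[f(x)(x-m)]$.

```python
import numpy as np

def nes_gaussian_mean(f, m0, sigma=0.1, pop=50, lr=0.05, iters=200, seed=0):
    """Minimal NES sketch: natural-gradient ascent on the mean of an
    isotropic Gaussian search distribution N(m, sigma^2 I).

    The score is (x - m)/sigma^2 and the Fisher matrix for m is I/sigma^2,
    so the natural gradient reduces to E[f(x) (x - m)]."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, dtype=float)
    for _ in range(iters):
        eps = rng.standard_normal((pop, m.size))          # N(0, I) perturbations
        xs = m + sigma * eps                              # candidate solutions
        fs = np.array([f(x) for x in xs])
        fs = (fs - fs.mean()) / (fs.std() + 1e-8)         # crude baseline/scaling in place of rank-based shaping
        m = m + lr * (fs[:, None] * (xs - m)).mean(axis=0)  # natural-gradient step on the mean
    return m

# Example: maximize -||x||^2 (sphere); the mean should approach the zero vector.
if __name__ == "__main__":
    print(nes_gaussian_mean(lambda x: -np.sum(x**2), m0=np.ones(10)).round(3))
```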

2. Algorithmic Frameworks and Variants

Several key NES variants are widely utilized:

  • xNES: Full-covariance multivariate Gaussian with exponential parameterization of the covariance $\Sigma = \sigma^2 B B^\top$, natural gradients for the mean $m$, the scale $\sigma$, and the shape matrix $B$; $O(d^3)$ cost per iteration (Wierstra et al., 2011).
  • SNES: Separable NES with a diagonal covariance $\Sigma = \operatorname{diag}(\sigma_1^2,\dots,\sigma_d^2)$; $O(d)$ cost per generation, suitable for high-dimensional axis-aligned problems.
  • Rank-One NES / FM-NES / CR-FM-NES: Approximations leveraging restricted or rank-one covariance updates. FM-NES and CR-FM-NES integrate a shape matrix or diagonal-plus-rank-one decomposition for fast covariance adaptation while achieving $O(d)$ computational cost and improving performance on ridge-structured or non-separable landscapes (Nomura et al., 2022, Sun et al., 2011).
  • Discrete NES: Specialized formulations for Bernoulli and categorical search spaces, deriving natural-gradient updates that contract distributions towards high-fitness regions without explicit Fisher matrix computation. For a factorized Bernoulli distribution with parameter vector $\theta$, the natural-gradient update takes the form

$$\theta \leftarrow \theta + \eta\,\mathbb{E}_{x\sim p(\cdot|\theta)}\big[f(x)(x-\theta)\big],$$

and similarly for the categorical case (Amin, 2024); a runnable sketch of this update follows the list.

  • Mixed-Integer NES: DX-NES-ICI adapts the NES framework to mixed continuous-integer domains, handling plateaus induced by integer discretization via a combination of natural-gradient updates, encoding/decoding procedures, and plateau-leap mechanisms (Ikeda et al., 2023).
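
To make the discrete update concrete, here is a minimal sketch for the factorized Bernoulli case, e.g., on a OneMax-style bit-counting objective. Function names, hyperparameters, and the clipping safeguard are illustrative assumptions rather than details of (Amin, 2024).

```python
import numpy as np

def discrete_nes_bernoulli(f, d, pop=100, lr=0.1, iters=300, seed=0):
    """Sketch of a discrete NES step for a factorized Bernoulli search
    distribution. The Fisher information 1/(theta (1 - theta)) cancels the
    same factor in the score, so the natural gradient is E[f(x) (x - theta)]."""
    rng = np.random.default_rng(seed)
    theta = np.full(d, 0.5)                                # start from the uniform distribution
    for _ in range(iters):
        xs = (rng.random((pop, d)) < theta).astype(float)  # sample bit strings
        fs = np.array([f(x) for x in xs])
        fs = (fs - fs.mean()) / (fs.std() + 1e-8)          # baseline/scaling of raw fitness
        theta += lr * (fs[:, None] * (xs - theta)).mean(axis=0)
        theta = np.clip(theta, 1e-3, 1 - 1e-3)             # keep probabilities in the open interval
    return theta

# Example: OneMax (count of ones); theta should drift towards all ones.
if __name__ == "__main__":
    print(discrete_nes_bernoulli(lambda x: x.sum(), d=20).round(2))
```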

Fitness shaping, antithetic sampling, and learning-rate adaptation further improve sample efficiency and convergence behavior (Wierstra et al., 2011, Nomura et al., 2021).
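
As one example of fitness shaping, the rank-based utility function reported in (Wierstra et al., 2011) replaces raw fitness values with fixed weights that depend only on rank; the helper names below are illustrative.

```python
import numpy as np

def nes_utilities(lam):
    """Rank-based utilities from the NES literature: u_k is proportional to
    max(0, log(lam/2 + 1) - log k) for ranks k = 1..lam, normalized to sum
    to one and then shifted so the weights sum to zero."""
    ranks = np.arange(1, lam + 1)
    raw = np.maximum(0.0, np.log(lam / 2 + 1) - np.log(ranks))
    return raw / raw.sum() - 1.0 / lam

def shaped_weights(fitnesses):
    """Assign each sample the utility of its fitness rank (best rank first),
    making the update invariant to monotone transformations of f."""
    lam = len(fitnesses)
    order = np.argsort(fitnesses)[::-1]   # indices of samples, best first (maximization)
    weights = np.empty(lam)
    weights[order] = nes_utilities(lam)   # best sample gets the largest utility
    return weights
```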

3. Theoretical Foundations and Connections

NES updates have a deep connection with the Riemannian geometry of statistical manifolds. For parametric families such as the Gaussian, the natural gradient ensures that updates follow the geodesics of the Fisher metric, efficiently tracking search directions while maintaining parameterization invariance (Wierstra et al., 2011, Otwinowski et al., 2019). For normal families, the mean update reduces to preconditioned gradient ascent, and the covariance update dynamically adapts the search distribution's shape to local fitness curvature. In population genetics, the analogous replicator dynamics correspond exactly to a natural-gradient flow, and, under Gaussian approximations, selection-induced flows yield regularized Newton steps (Otwinowski et al., 2019).
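
For concreteness, the Gaussian special case can be derived in two lines; these are standard information-geometry identities rather than results specific to any one cited paper.

```latex
% For p(x|\theta) = \mathcal N(x \mid m, \Sigma) with \Sigma held fixed:
%   score:  \nabla_m \log p(x|m) = \Sigma^{-1}(x - m)
%   Fisher: F_m = \mathbb E\big[\nabla_m \log p \,(\nabla_m \log p)^\top\big] = \Sigma^{-1}
% Hence the natural gradient preconditions the search gradient by \Sigma:
\tilde{\nabla}_m J
  = F_m^{-1} \nabla_m J
  = \Sigma \, \mathbb E_{x \sim p}\big[ f(x)\, \Sigma^{-1}(x - m) \big]
  = \mathbb E_{x \sim p}\big[ f(x)\,(x - m) \big].
```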

Mirror NES (MiNES) extends this foundation, showing convergence of the adapted covariance towards the inverse Hessian of the objective $f$, and establishing sublinear convergence rates for both the covariance and the mean in smooth, strongly convex settings (Ye et al., 2019). Recent work has further recast NES as convex trust-region optimization (CoNES), where the natural gradient is the infinitesimal solution of a KL-constrained maximization and, for finite updates, CoNES optimizes a convex proxy with explicit parameterization invariance (Veer et al., 2020).
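
The trust-region reading admits a compact statement; the display below is the standard second-order argument, paraphrased rather than quoted from (Veer et al., 2020).

```latex
% Natural gradient as the infinitesimal limit of KL-constrained ascent:
\delta^\star(\epsilon) \;=\; \arg\max_{\delta}\; \nabla_\theta J(\theta)^\top \delta
  \quad \text{s.t.} \quad
  \mathrm{KL}\big( p_{\theta+\delta} \,\Vert\, p_{\theta} \big) \le \epsilon .
% With the expansion KL(p_{\theta+\delta} \Vert p_\theta) \approx \tfrac12 \delta^\top F(\theta)\, \delta,
% the KKT conditions give
\delta^\star(\epsilon) \;\propto\; F(\theta)^{-1} \nabla_\theta J(\theta),
% i.e. the natural-gradient direction is recovered as \epsilon \to 0.
```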

The Fisher information matrix is analytically tractable for multivariate Gaussians; efficient block-diagonal inversion (eNES) and low-rank approximations (R1-NES, CR-FM-NES) enable scaling to high dimensions (Sun et al., 2012, Sun et al., 2011, Nomura et al., 2022).
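
For reference, the Gaussian Fisher blocks have the following standard closed form when the entries of $\Sigma$ are treated as unconstrained (the symmetry constraint adds duplication-matrix bookkeeping that is omitted here as a simplifying assumption):

```latex
% Fisher information of \mathcal N(m, \Sigma) in the parameters (m, \mathrm{vec}\,\Sigma):
F_m = \Sigma^{-1}, \qquad
F_{\mathrm{vec}\,\Sigma} = \tfrac{1}{2}\,\big(\Sigma^{-1} \otimes \Sigma^{-1}\big),
% with vanishing cross-terms between m and \Sigma, so F is block-diagonal;
% this block structure is what eNES exploits to avoid inverting the full matrix.
```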

4. Advanced Adaptation Mechanisms

Natural Evolutionary Strategies have seen algorithmic advances addressing adaptation in challenging domains:

  • Rank-One and Rapid Covariance Adaptation: FM-NES combines shape-only rank-one updates (inspired by the CMA-ES evolution path) with the core NES rule, using eigenvalue-based ridge detection to trigger rapid alignment of the search covariance along valleys. This yields significant speedups (up to 1.8× versus DX-NES-IC) on ridge-structured and ill-conditioned problems (Nomura et al., 2021).
  • Learning Rate Adaptation: NES with adaptive step size dynamically tunes the learning rate based on the KL divergence between successive search distributions, estimating the signal-to-noise ratio of the natural gradient and increasing update rates in easy regimes while remaining conservative in high-noise or multimodal settings (Nomura et al., 2021); see the sketch after this list.
  • Sample Efficiency via Probabilistic Numerics: ProbNES replaces Monte Carlo gradient estimates with Bayesian quadrature, using Gaussian process surrogates for $f$ and variance-reduction-driven batch selection. This leads to exponential variance reduction and drastically improves sample efficiency on both synthetic and real-world tasks, outperforming standard NES and Bayesian optimization (Osselin et al., 9 Jul 2025).
  • Hybrid and Co-Evolutionary Extensions: NCES integrates rescaled gradients for distributed, multi-agent, and multi-objective optimization, correcting for estimation bias in co-adaptation. Elitist learning-rate adaptation via candidate roll-outs yields robust convergence in distributed dynamic control, e.g., cooperative guidance (Lan et al., 2022).
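
The KL-based step-size idea can be illustrated with the closed-form Gaussian KL divergence; the multiplicative rule below is a deliberately simplified stand-in, since the actual mechanism in (Nomura et al., 2021) estimates a signal-to-noise ratio of the natural gradient.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1)) between two Gaussians."""
    d = mu0.size
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def adapt_learning_rate(lr, kl, kl_target=0.05, gain=1.1):
    """Toy rule: grow the step size while successive search distributions
    move less than a target KL, shrink it otherwise."""
    return lr * gain if kl < kl_target else lr / gain
```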

These mechanisms ensure NES variants are competitive in high-dimensional, hybrid, or constraint-rich settings and can be tailored to exploit specific structure (e.g., ridges, mixed domains, cooperative objectives).

5. Applications, Empirical Performance, and Limitations

NES algorithms have been applied successfully across diverse domains:

  • Continuous Black-Box Optimization: NES variants (xNES, FM-NES, CR-FM-NES, eNES) achieve competitive or best-known evaluation efficiency on benchmark functions including Sphere, Rosenbrock, Ellipsoid, Cigar, Rastrigin, and others (Wierstra et al., 2011, Nomura et al., 2021, Nomura et al., 2022). FM-NES and CR-FM-NES outperform CMA-ES and xNES on ridge-structured and high-dimensional problems due to accelerated adaptation mechanisms.
  • Combinatorial/Discrete and Mixed-Integer Optimization: Discrete NES and DX-NES-ICI provide simple, principled natural-gradient updates for discrete and hybrid domains, matching or surpassing variational optimization and CMA-ES with Margin in both sample efficiency and robustness, particularly as the contribution of continuous variables increases (Ikeda et al., 2023, Amin, 2024).
  • Machine Learning and Reinforcement Learning: NES is used in policy optimization for environments including MuJoCo, achieving 30–50% reductions in data requirements relative to CMA-ES or plain evolution strategies when combined with convex or Bayesian-quadrature-based step selection (CoNES, ProbNES) (Veer et al., 2020, Osselin et al., 9 Jul 2025).
  • Stochastic Variational Inference: NES serves as a black-box gradient estimator for the ELBO in VAEs and other latent-variable models, functioning in non-reparameterizable or non-differentiable settings and matching or outperforming standard methods (Amin, 2023, Berliner et al., 2022).
  • Quantum Circuit Optimization: NES is effective for parametrized quantum circuits in regions of vanishing analytic gradient (barren plateaus), with tailor-made xNES/sNES variants providing accuracy comparable to state-of-the-art gradient methods at markedly reduced circuit evaluation counts (Anand et al., 2020).
  • Dynamic and Distributed Optimization: Natural co-evolutionary strategies (NCES) with adaptive learning rates are successfully deployed in cooperative control and multi-agent optimization under nonstationarity, achieving fine time and trajectory consensus in missile guidance benchmarks (Lan et al., 2022).

Limitations include increased per-iteration cost in full-covariance variants ($O(d^3)$), practical tuning requirements for step sizes and ranking schemes, and heuristic choices for certain mechanisms (e.g., ridge detection in FM-NES). Extension to very high dimensionality often necessitates restricted or low-rank covariance representations (Nomura et al., 2022). Formal convergence guarantees do not exist for all NES variants; analyses are available in strongly convex or quadratic settings and for specific mirror-NES frameworks (Ye et al., 2019).

6. Empirical Comparisons and Practical Considerations

Robust empirical evidence supports NES's competitiveness. On 40-D unconstrained and implicitly constrained benchmarks (Sphere, Rosenbrock, Cigar, Ellipsoid and their constrained variants), FM-NES outperforms xNES, CMA-ES, and resampling variants, especially on ridge-structured problems, yielding up to 1.8× speedup relative to DX-NES-IC. On high-dimensional separable and nonseparable functions ($d$ up to 1000), CR-FM-NES and R1-NES achieve $O(d)$ scaling and outperform diagonal CMA-ES competitors (Nomura et al., 2022, Sun et al., 2011). In discrete and mixed-integer testbeds, discrete NES and DX-NES-ICI provide superior success rates and require fewer evaluations when objective-function contributions are dominated by continuous variables (Amin, 2024, Ikeda et al., 2023). ProbNES achieves 10–100× lower evaluation counts than CMA-ES and BO in sample-constrained hyperparameter optimization (Osselin et al., 9 Jul 2025).

Sample efficiency can be further enhanced via antithetic sampling, rank-based or distance-weighted weighting of samples, block-wise fitness baselines, importance mixing, and adaptive learning rate schedules (Sun et al., 2012, Wierstra et al., 2011).
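
Of these, importance mixing is the least self-explanatory; the sketch below shows the two-phase accept/reject scheme as we understand it from (Sun et al., 2012), with all function names and the exact acceptance expressions to be treated as assumptions.

```python
import numpy as np

def importance_mixing(xs_old, log_p_old, log_p_new, sample_new, pop,
                      alpha=0.01, rng=None):
    """Reuse samples drawn from the previous search distribution so the
    refreshed population follows the new density while only the 'fresh'
    points require new fitness evaluations.

    log_p_old / log_p_new: callables returning the log density at a point;
    sample_new: callable drawing one sample from the new distribution."""
    rng = rng or np.random.default_rng()
    kept, fresh = [], []
    # Phase 1: keep an old sample with probability min(1, (1-alpha) p_new/p_old).
    for x in xs_old:
        ratio = np.exp(log_p_new(x) - log_p_old(x))
        if rng.random() < min(1.0, (1.0 - alpha) * ratio) and len(kept) < pop:
            kept.append(x)
    # Phase 2: top up with fresh draws, accepted with probability max(alpha, 1 - p_old/p_new).
    while len(kept) + len(fresh) < pop:
        x = sample_new()
        ratio = np.exp(log_p_old(x) - log_p_new(x))
        if rng.random() < max(alpha, 1.0 - ratio):
            fresh.append(x)
    return kept, fresh
```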

Overall, NES offers a scalable, theoretically grounded, and adaptively parameterizable framework for black-box optimization, unifying stochastic search, information geometry, and covariance adaptation across a spectrum of structured, discrete, and high-dimensional problems. Continued research is extending the framework's sample efficiency (e.g., via probabilistic numerics), computational tractability, and applicability to domains requiring hybrid, discrete, or distributed decision making, and NES variants remain state-of-the-art in query-efficient optimization on both standard and emerging benchmarks.
