Gradient-Free Continuous Optimization

Updated 17 April 2026

Gradient-free continuous optimization is a framework that minimizes functions using only value evaluations, bypassing the need for explicit derivatives.
It employs diverse techniques including finite difference schemes, randomized smoothing, and model-based searches to address noise, constraints, and non-smoothness.
These methods come with convergence guarantees, complexity bounds, and robustness results, which makes them well suited to high-dimensional and black-box optimization applications.

Gradient-free continuous optimization refers to algorithmic frameworks that minimize (or maximize) continuous functions using only function value evaluations, avoiding all use of explicit first- or higher-order derivatives. These approaches address scenarios where gradient information is unavailable or too costly to compute, such as black-box models, simulation-based optimization, or high-noise experimental settings. The field encompasses a range of methodologies—finite difference schemes, randomized smoothing, subspace approximations, model-based search, and discrete greedy strategies—each developed under different assumptions about function regularity, constraint structure, and the noise model.

1. Problem Classes and Foundational Assumptions

Gradient-free continuous optimization addresses the unconstrained or constrained minimization of functions $f:\mathbb{R}^n\to\mathbb{R}$ (or $f:\mathcal{D}\subset\mathbb{R}^n\to\mathbb{R}$), based solely on a zeroth-order oracle $x\mapsto f(x)$ whose output may be noisy. Methods are characterized and selected by the following assumptions:

Smoothness: Many state-of-the-art approaches assume $f$ is continuously differentiable ($C^1$) with a globally or locally Lipschitz continuous gradient $\nabla f$, i.e., $\|\nabla f(x) - \nabla f(y)\| \le L\|x-y\|$ (Khanh et al., 2023).
Non-smooth regimes: For merely Lipschitz or even discontinuous $f$, randomized smoothing or model-based methods are employed (Lin et al., 2023, Andrieu et al., 2024).
Convex and Non-convex Settings: Both convex and non-convex objectives are addressed, with convergence guarantees ranging from global optimality to stationarity or local optimality, depending on assumptions and algorithm structure (Gasnikov et al., 2022, Khanh et al., 2023).
Constraints: Problems may be unconstrained, bound-constrained ($x\in\mathcal{D}$), or defined over polytopes or general convex sets, as in projection-free Frank–Wolfe variants (Sahu et al., 2018) or in submodular maximization (Zhang et al., 2018).
Noise Models: Noiseless queries, additive or multiplicative noise, or stochastic oracles (expected value objectives) are supported by corresponding robustness analyses (Gasnikov et al., 2022, Lin et al., 2023).

2. Algorithmic Methodologies

Gradient-free methodologies exhibit rich algorithmic diversity, determined by the balance of dimensionality, regularity, noise, and computational budget.

2.1 Finite Difference Schemes

Classical finite difference methods construct forward and central difference estimators for the gradient:

$g_\text{fd}(x;h) = \frac{1}{h} \sum_{i=1}^n [f(x + h e_i) - f(x)] e_i, \quad g_\text{cd}(x;h) = \frac{1}{2h} \sum_{i=1}^n [f(x + h e_i) - f(x - h e_i)] e_i$

These approaches require $n+1$ (forward) or $2n$ (central) function calls per gradient approximation and exhibit an $O(h)$ bias when gradients are $L$-Lipschitz. Adaptive schemes select $h$ based on local gradient magnitude to balance truncation and numerical errors (Khanh et al., 2023).
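The two estimators above can be sketched in a few lines of NumPy. This is a minimal illustration, not the adaptive scheme of Khanh et al.: the function names, the fixed step `h`, and the quadratic test objective are all assumptions made for the example.

```python
import numpy as np

def forward_difference(f, x, h):
    """Forward-difference gradient estimate; uses n + 1 evaluations of f."""
    x = np.asarray(x, dtype=float)
    fx = f(x)                      # shared base evaluation
    g = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h                # h * e_i
        g[i] = (f(x + step) - fx) / h
    return g

def central_difference(f, x, h):
    """Central-difference gradient estimate; uses 2n evaluations of f."""
    x = np.asarray(x, dtype=float)
    g = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return g

# Hypothetical test objective with known gradient x + 1.
def quadratic(x):
    return 0.5 * np.dot(x, x) + np.sum(x)

x0 = np.array([1.0, -2.0, 0.5])
h = 1e-3
g_fd = forward_difference(quadratic, x0, h)
g_cd = central_difference(quadratic, x0, h)
```

For this particular quadratic the forward estimate is off by exactly $h/2$ in every coordinate (up to rounding), which illustrates the $O(h)$ bias, while the central estimate is exact because the second-order terms cancel.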

2.2 Randomized and Smoothing-Based Estimators

Randomized estimators leverage either isotropic random directions or random coordinate directions:

One- and two-point estimators (sphere smoothing): For $e$ drawn uniformly from the unit sphere in $\mathbb{R}^n$, the estimator

$g(x, e) = \frac{n}{2\tau} [f(x + \tau e) - f(x - \tau e)] e$

is unbiased for $\nabla f_\tau(x)$, where $f_\tau(x) = \mathbb{E}_u[f(x + \tau u)]$, with $u$ uniform on the unit ball, is a smoothed surrogate (Gasnikov et al., 2022).

Stochastic Subspace Descent (SSD): Each iteration descends in a random subspace of dimension $\ell$, computing directional derivatives along a randomized basis $P_k \in \mathbb{R}^{n\times\ell}$ and updating $x_{k+1} = x_k - \alpha P_k P_k^\top \nabla f(x_k)$, or its zeroth-order equivalent (Kozak et al., 2020).
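A zeroth-order variant of the subspace descent idea can be sketched as follows. The sketch assumes a scaled orthonormal basis with $P^\top P = (n/\ell) I$, obtained from a QR factorization of a Gaussian matrix, and replaces the directional derivatives $P^\top \nabla f(x)$ with forward differences along the columns of $P$. The function names, step size, and test objective are illustrative choices, not taken from Kozak et al.

```python
import numpy as np

def random_subspace(n, ell, rng):
    """Return an n x ell matrix P with P.T @ P = (n / ell) * I (assumed scaling)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, ell)))  # orthonormal columns
    return np.sqrt(n / ell) * q

def ssd_zeroth_order(f, x0, ell, alpha, h, iters, rng):
    """Subspace descent using only function values (ell + 1 evaluations per step)."""
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    for _ in range(iters):
        p = random_subspace(n, ell, rng)
        fx = f(x)
        # Forward differences along each column approximate P^T grad f(x).
        d = np.array([(f(x + h * p[:, j]) - fx) / h for j in range(ell)])
        x = x - alpha * (p @ d)
    return x

# Hypothetical test objective: a simple quadratic with minimizer at the origin.
def sphere(x):
    return 0.5 * np.dot(x, x)

rng = np.random.default_rng(0)
n, ell = 20, 4
x_start = np.ones(n)
x_end = ssd_zeroth_order(sphere, x_start, ell=ell, alpha=ell / n,
                         h=1e-6, iters=200, rng=rng)
```

Each step costs $\ell + 1$ evaluations instead of the $n + 1$ needed for a full forward-difference gradient, which is the trade-off the subspace dimension controls.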

2.3 Greedy and Discrete Approximation Methods

For monotone DR-submodular maximization over a convex polytope, the LDGM algorithm discretizes the domain via lattice points and performs greedy selection of directions maximizing marginal gain. These methods optimize over low-dimensional combinatorial structures using only function evaluations (Zhang et al., 2018).
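The flavor of lattice-based greedy selection can be shown with a deliberately simplified sketch. It is not the LDGM algorithm itself: it assumes a budget-and-box polytope $\{0 \le x \le u,\ \sum_i x_i \le B\}$, a uniform lattice spacing `delta`, and a separable concave test objective, all of which are choices made for this example.

```python
import numpy as np

def lattice_greedy(f, n, budget, upper, delta):
    """Greedy ascent on the lattice delta * Z^n using only function values."""
    counts = np.zeros(n, dtype=int)          # x = delta * counts
    max_steps = int(round(budget / delta))   # budget constraint sum(x) <= budget
    max_count = int(round(upper / delta))    # box constraint x_i <= upper
    for _ in range(max_steps):
        fx = f(delta * counts)
        best_gain, best_i = 0.0, None
        for i in range(n):
            if counts[i] >= max_count:
                continue
            trial = counts.copy()
            trial[i] += 1
            gain = f(delta * trial) - fx     # marginal gain of one lattice step
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:                   # no feasible improving direction
            break
        counts[best_i] += 1
    return delta * counts

# Hypothetical monotone DR-submodular objective (separable and concave).
weights = np.array([2.0, 1.5, 0.5])

def coverage(x):
    return float(np.sum(weights * np.log1p(x)))

x_greedy = lattice_greedy(coverage, n=3, budget=1.0, upper=1.0, delta=0.1)
```

Integer step counts are used instead of accumulating floating-point increments so that the budget and box constraints are enforced exactly on the lattice.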

2.4 Gradient-Free Frank–Wolfe Variants

Projection-free optimization with zeroth-order access utilizes randomized (one- or multi-directional) directional derivatives, combined with Frank–Wolfe linear minimization over the feasible set. Momentum-averaged estimates improve empirical and theoretical performance (Sahu et al., 2018).
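A minimal sketch of a zeroth-order Frank–Wolfe loop with a momentum-averaged gradient estimate is given below. It assumes an $\ell_1$-ball feasible set (whose linear minimization oracle is a signed vertex), Gaussian random directions, and simple hypothetical schedules for the averaging weight `rho` and the step `gamma`; these are illustrative choices and should not be read as the exact parameters analyzed by Sahu et al.

```python
import numpy as np

def lmo_l1_ball(d, radius):
    """Linear minimization oracle: argmin over ||v||_1 <= radius of <d, v>."""
    i = int(np.argmax(np.abs(d)))
    v = np.zeros_like(d)
    v[i] = -radius * np.sign(d[i])
    return v

def zo_frank_wolfe(f, x0, radius, iters, m, c, rng):
    """Projection-free descent using only function values."""
    x = np.asarray(x0, dtype=float).copy()   # x0 must be feasible
    d = np.zeros_like(x)                      # momentum-averaged gradient estimate
    for t in range(iters):
        fx = f(x)
        g = np.zeros_like(x)
        for _ in range(m):                    # m random directional differences
            z = rng.standard_normal(x.size)
            g += (f(x + c * z) - fx) / c * z
        g /= m
        rho = 4.0 / (t + 8) ** (2.0 / 3.0)    # assumed averaging schedule (rho = 1 at t = 0)
        d = (1.0 - rho) * d + rho * g
        v = lmo_l1_ball(d, radius)
        gamma = 2.0 / (t + 2)                 # assumed Frank-Wolfe step schedule
        x = (1.0 - gamma) * x + gamma * v     # convex combination stays feasible
    return x

# Hypothetical objective whose unconstrained minimizer lies outside the ball.
target = np.array([2.0, 0.0, 0.0])

def distance_sq(x):
    return 0.5 * np.dot(x - target, x - target)

rng = np.random.default_rng(0)
x_fw = zo_frank_wolfe(distance_sq, np.zeros(3), radius=1.0,
                      iters=2000, m=5, c=1e-4, rng=rng)
```

Because every iterate is a convex combination of feasible points, no projection is ever needed; the averaging in `d` is what keeps the noisy single-direction estimates from sending the oracle to the wrong vertex in later iterations.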

2.5 Feedback and Decentralized Model-Free Optimization

Two-point random gradient estimators, adapted to model-free feedback (real-time plant measurements), estimate the gradient of a composite objective via perturbation and function difference, achieving the optimal complexity order reported in (Mehrnoosh et al., 15 Sep 2025) for reaching $\epsilon$-stationarity under smoothness. Decentralized methods, leveraging randomized smoothing, gradient tracking, and network communication, achieve complexity bounds matching centralized strategies under non-smooth, non-convex conditions (Lin et al., 2023).
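The two-point random gradient estimator underlying these methods can be illustrated in a static, noise-free setting (no plant dynamics and no network). The sketch assumes directions drawn uniformly from the unit sphere and the scaling $n/(2\tau)$; the function names, the step size, and the quadratic objective are hypothetical.

```python
import numpy as np

def two_point_estimate(f, x, tau, rng):
    """(n / (2 tau)) * [f(x + tau u) - f(x - tau u)] * u with u uniform on the sphere."""
    u = rng.standard_normal(x.size)
    u /= np.linalg.norm(u)        # a normalized Gaussian vector is uniform on the sphere
    return (x.size / (2.0 * tau)) * (f(x + tau * u) - f(x - tau * u)) * u

def zo_descent(f, x0, step, tau, iters, rng):
    """Plain descent driven by the two-point estimate (two evaluations per step)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x = x - step * two_point_estimate(f, x, tau, rng)
    return x

# Hypothetical smooth objective with gradient x.
def sphere_objective(x):
    return 0.5 * np.dot(x, x)

rng = np.random.default_rng(0)
x_start = np.array([1.0, -1.0, 2.0, 0.0])

# Averaging many estimates approximates the gradient at x_start.
mean_estimate = np.mean(
    [two_point_estimate(sphere_objective, x_start, 1e-3, rng) for _ in range(20000)],
    axis=0,
)

x_final = zo_descent(sphere_objective, x_start, step=1.0 / x_start.size,
                     tau=1e-3, iters=200, rng=rng)
```

A single estimate is very noisy, since it only carries information along one direction, but it is cheap: each step uses two function evaluations regardless of the dimension.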

2.6 Bayesian, Model-Based, and Integration-Based Approaches

Recent proposals iteratively fit parametric probability densities (from an exponential family) to the objective via sequential moment-matching (a Bayesian update followed by reprojection onto the family). These schemes are implementable via Monte Carlo (MC) or sequential Monte Carlo (SMC) and are provably equivalent to time-inhomogeneous gradient descent on a sequence of smoothed objectives (Andrieu et al., 2024).
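The update-then-reproject idea can be sketched with a Gaussian family and plain Monte Carlo. This is a simplified illustration under stated assumptions, not the algorithm of Andrieu et al.: the Bayesian update is taken to be a reweighting by $\exp(-\lambda f)$, moments are estimated by self-normalized importance sampling, and the parameter `lam`, the sample size, and the small covariance jitter are hypothetical choices.

```python
import numpy as np

def moment_matching_search(f, mean0, cov0, lam, n_samples, iters, rng):
    """Iteratively reweight Gaussian samples by exp(-lam * f) and refit mean and covariance."""
    mean = np.asarray(mean0, dtype=float).copy()
    cov = np.asarray(cov0, dtype=float).copy()
    for _ in range(iters):
        samples = rng.multivariate_normal(mean, cov, size=n_samples)
        values = np.array([f(s) for s in samples])
        log_w = -lam * values
        w = np.exp(log_w - log_w.max())       # stabilized unnormalized weights
        w /= w.sum()
        mean = w @ samples                     # matched first moment
        centered = samples - mean
        cov = centered.T @ (w[:, None] * centered)   # matched second moment
        cov = 0.5 * (cov + cov.T) + 1e-10 * np.eye(mean.size)  # symmetrize, add jitter
    return mean, cov

# Hypothetical objective with minimizer at `target`.
target = np.array([1.0, -2.0])

def bowl(x):
    return 0.5 * np.dot(x - target, x - target)

rng = np.random.default_rng(0)
cov_start = 4.0 * np.eye(2)
mean_fit, cov_fit = moment_matching_search(bowl, np.zeros(2), cov_start,
                                           lam=1.0, n_samples=2000, iters=30, rng=rng)
```

On this quadratic each exact update would add `lam` to the precision of the Gaussian and pull its mean toward the minimizer, so the fitted density concentrates around `target` as iterations proceed.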

3. Theoretical Guarantees and Complexity

A rigorous theoretical foundation underpins gradient-free continuous optimization, with the following central guarantees:

Stationarity and Global Convergence: For $C^1$ functions with (locally or globally) Lipschitz gradients, accumulation points of adaptive finite difference methods correspond to stationary points, with global or local convergence backed by KL property analysis (Khanh et al., 2023).
Approximation Rates: LDGM achieves a provable approximation guarantee for monotone DR-submodular maximization, with an evaluation count that matches first-order continuous greedy rates (Zhang et al., 2018).
Complexity Bounds: For smooth objectives, two-point random direction methods reach $\epsilon$-stationarity with a number of function calls that matches lower bounds for zeroth-order optimization (Khanh et al., 2023, Gasnikov et al., 2022). In decentralized settings, explicit complexity bounds have been established for DGFM and for DGFM$^+$ (Lin et al., 2023). For feedback optimization, the two-point method comes with an explicit bound on the number of steps (Mehrnoosh et al., 15 Sep 2025).
Noise Robustness: Adaptive schemes (e.g., step-size selection, variance reduction) and smoothing approaches enhance robustness to oracle noise and non-smoothness (Khanh et al., 2023, Gasnikov et al., 2022). Greedy lattice methods tolerate additive noise in the function evaluations and outperform zeroth-order Frank–Wolfe in high-noise regimes (Zhang et al., 2018).

4. Empirical Performance and Applications

Gradient-free methods have been empirically benchmarked across a range of applications:

Smooth/non-smooth minimization: Adaptive finite differences (DFC/DFB) outperform Nelder–Mead, Implicit Filtering, and random probing on convex, nonconvex, and noisy synthetic benchmarks (Khanh et al., 2023).
High-dimensional problems: SSD achieves strong performance on dimension-invariant worst-case examples, Bayesian hyperparameter search (Gaussian processes), and PDE shape optimization, with the subspace dimension $\ell$ trading off variance against per-iteration cost (Kozak et al., 2020).
Constrained and stochastic settings: ZO Frank–Wolfe matches first-order methods in constrained Lasso and Cox regression tasks with one- or few-directional queries per iteration (Sahu et al., 2018).
Model-free control: Two-point feedback optimization (FO) outperforms one-point alternatives in noisy nonlinear feedback systems, matching static optimization rates (Mehrnoosh et al., 15 Sep 2025).
Decentralized and distributed optimization: DGFM and DGFM$^+$ demonstrate competitive loss reduction and adversarial attack strength in SVM and vision-network tasks versus centralized and baseline zeroth-order methods (Lin et al., 2023).
Submodular maximization: LDGM matches gradient-based algorithms in noise-free settings and is significantly more robust than them in noisy or stochastic settings (Zhang et al., 2018).
Bayesian/integration-based strategies: Sequential MC variants exhibit rapid and robust empirical risk minimization on challenging noisy objective landscapes (Andrieu et al., 2024).

5. Practical Considerations and Performance Trade-Offs

Selection of gradient-free strategies is governed by problem structure, dimensionality, and resource budget. Key guidelines, following (Gasnikov et al., 2022, Khanh et al., 2023, Kozak et al., 2020), include:

Criterion	Representative Method	Suitability/Comment
Low $n$, $C^1$ objective, low noise	Finite differences, DFC/DFB	Simple, optimal for small problems
High $n$, structureless	SSD, randomized two-point, smoothing	Sublinear scaling with $n$, dimension-tolerant
Non-smooth or merely Lipschitz	Smoothing + accelerated zeroth order	Noise-tolerant, unbiased for smoothed objectives
Polytope constraint, monotone	LDGM, Frank–Wolfe ZO variants	Efficient, provable guarantees (submodular, convex)
Feedback/model-free	Two-point random direction, DGFM	Information-limited, real-time operation
High stochasticity	Variance-reduced DGFM$^+$, MC/SMC-based	Empirical variance minimization, robustness to noise

Default parameter choices and adaptivity (step size, subspace dimension, smoothing bandwidth) are crucial for efficiency. For instance, DFB is specified with recommended default parameter values, while SSD typically uses a small subspace dimension $\ell$ for early acceleration (Khanh et al., 2023, Kozak et al., 2020). Model-based integration methods use annealing of the smoothing parameter and particle resampling for diversity (Andrieu et al., 2024).

6. Open Challenges and Ongoing Developments

Despite substantial progress, gradient-free optimization exhibits inherent limitations relative to first-order methods (oracle complexity scaling, noise sensitivity, limited variance reduction). Ongoing developments target:

Variance reduction for high-dimensional, noisy scenarios (e.g., DGFM$^+$ (Lin et al., 2023)).
Compositional and feedback optimization in adaptive or networked dynamical systems (Mehrnoosh et al., 15 Sep 2025).
Integration of Bayesian, MC/SMC-based strategies with efficient parallel computing (Andrieu et al., 2024).
Hybridization with gradient surrogates—distillation, model-based guidance, and combination with learning-based approaches.

Provable optimality, dimension-independence, and adaptivity to general function structures remain central research directions. In summary, gradient-free continuous optimization provides a critical, theoretically grounded toolkit for a broad class of black-box, high-noise, and structurally complex problems in modern computational mathematics and engineering, with active convergence and complexity research continuing to expand both scope and robustness (Khanh et al., 2023, Gasnikov et al., 2022, Kozak et al., 2020, Lin et al., 2023, Andrieu et al., 2024, Mehrnoosh et al., 15 Sep 2025, Zhang et al., 2018, Sahu et al., 2018).