Zero-Order Optimization

Updated 20 March 2026

Zero-order optimization is a derivative-free approach that estimates gradients using finite-difference or function-smoothing techniques.
It is widely applied in machine learning, adversarial attacks, and robotics where traditional gradient information is inaccessible.
Recent advances, including block and regression-based methods, have improved query efficiency and robust convergence rates.

Zero-order optimization (ZOO), also referred to as derivative-free or black-box optimization, encompasses a class of algorithms designed for solving optimization problems where only function evaluations are available and explicit gradient or higher-order information is inaccessible. These methods are foundational in domains where the underlying objective is non-differentiable, inherently noisy, or defined implicitly through simulations, or where privacy or security constraints preclude direct access to derivatives. ZOO techniques are now central to modern fields including machine learning, black-box adversarial attacks, hyperparameter and policy optimization, robust control, robotics, and privacy-oriented model inversion.

1. Fundamental Principles and Gradient Estimation

Zero-order optimization relies on the ability to probe the objective function $f:\mathbb{R}^d \to \mathbb{R}$ at chosen query points but prohibits the use of analytic or algorithmic derivatives. The core methodological principle is to emulate first-order (and occasionally higher-order) updates using suitably constructed finite-difference or function-smoothing estimators.

Canonical Gradient Estimators

Two-point random directions: For a query point $x$, direction $u$, and smoothing parameter $\lambda$, the estimator is

$g_{\lambda}(x, u) = \frac{f(x+\lambda u) - f(x-\lambda u)}{2\lambda} u$

For Gaussian directions $u \sim \mathcal{N}(0, I_d)$, this estimator is unbiased for the gradient of the Gaussian-smoothed function $f_\lambda(x) = \mathbb{E}_{u}[f(x+\lambda u)]$ (Zhang et al., 5 Jun 2025, Liu et al., 2020, Duchi et al., 2013).
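
The two-point estimator can be sketched in a few lines of Python. This is a minimal illustration rather than any cited paper's implementation: the helper names, the toy objective `sum_of_squares`, and the choice of Gaussian directions are assumptions made for the example.

```python
import random


def two_point_estimate(f, x, lam, rng):
    """One sample of g = (f(x + lam*u) - f(x - lam*u)) / (2*lam) * u with u ~ N(0, I)."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    f_plus = f([xi + lam * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - lam * ui for xi, ui in zip(x, u)])
    scale = (f_plus - f_minus) / (2.0 * lam)
    return [scale * ui for ui in u]


def averaged_estimate(f, x, lam, num_samples, seed=0):
    """Average independent two-point samples; costs 2 * num_samples queries."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = two_point_estimate(f, x, lam, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


def sum_of_squares(x):
    """Toy objective (an assumption for illustration) with known gradient 2*x."""
    return sum(xi * xi for xi in x)
```

A single sample is a noisy, rank-one estimate pointing along `u`; only the average over many directions approaches the gradient, which is why the number of samples (and hence queries) matters so much in practice.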

Coordinate-wise finite differences: Evaluation along standard basis directions; for the $i$-th component:

$\frac{f(x + h e_i) - f(x)}{h}$

This variant requires $O(d)$ queries per step (Jordana et al., 27 Jun 2025, Liu et al., 2020).
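
To make the $O(d)$ query cost concrete, the following sketch computes a forward-difference gradient and counts oracle calls. The `CountingOracle` wrapper and the default step `h` are hypothetical choices for illustration only.

```python
class CountingOracle:
    """Wraps a black-box function and counts how often it is queried."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)


def coordinate_fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient: one base query plus one query per coordinate."""
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        x_step = list(x)
        x_step[i] += h  # perturb only coordinate i
        grad.append((f(x_step) - f0) / h)
    return grad
```

For a $d$-dimensional input this uses $d + 1$ evaluations per gradient, in contrast to the two evaluations of a single random-direction sample.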

Single-point estimators: Using only a single function evaluation per iteration, such as

$\frac{d}{\delta} f(x + \delta u) u, \quad u \sim \text{Unif}(S^{d-1})$

These methods have high variance but are beneficial when queries are extremely restricted (Chen et al., 6 Jul 2025, Mhanna et al., 2024).
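
A minimal sketch of the single-point estimator follows, assuming directions are drawn uniformly from the unit sphere by normalizing a Gaussian vector. Function names and the averaging helper are illustrative assumptions, not taken from the cited works.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw u uniformly from the unit sphere S^{d-1} by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 0.0:
            return [vi / norm for vi in v]


def single_point_estimate(f, x, delta, rng):
    """One sample of (d / delta) * f(x + delta*u) * u, using a single query."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])
    return [(d / delta) * value * ui for ui in u]


def averaged_single_point(f, x, delta, num_samples, seed=0):
    """Average many single-point samples to expose the underlying gradient signal."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = single_point_estimate(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]
```

Because each sample is scaled by the raw function value rather than a difference of two values, any constant offset in $f$ is multiplied by $d/\delta$ and inflates the variance, which is one way to see why these estimators are noisier than two-point schemes.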

Complex-step derivative: For analytic $f$, the derivative is estimated from an imaginary perturbation, which improves bias and variance properties, e.g.,

$\frac{\partial f}{\partial x_j}(x) = \frac{\Im f(x + i h e_j)}{h} + O(h^2)$

Because no difference of nearly equal function values is formed, the estimate remains stable even for very small $h$ (Jongeneel, 2021).
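
The contrast with ordinary finite differences can be sketched for a scalar function. The toy function and the default step `h=1e-20` are assumptions for illustration; the approach only applies when $f$ can actually be evaluated at complex arguments.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-20):
    """Estimate f'(x) as Im f(x + i*h) / h; no subtraction of nearby values occurs."""
    return f(complex(x, h)).imag / h


def forward_difference(f, x, h):
    """Classical forward difference, shown for comparison."""
    return (f(x + h) - f(x)).real / h


def toy(z):
    """Toy analytic function exp(z) * sin(z); cmath accepts real and complex inputs."""
    return cmath.exp(z) * cmath.sin(z)
```

With a step as small as `1e-20`, the forward difference collapses to zero because `x + h` rounds back to `x`, while the complex-step estimate stays close to the analytic derivative $e^x(\sin x + \cos x)$.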

The design and choice of estimator govern trade-offs between query efficiency, estimator bias, and variance, shaping convergence rates and practical applicability.

2. Convergence Theory and Complexity Bounds

ZOO methodology fundamentally alters the attainable convergence rates compared to first-order methods, due to the absence of direct gradients and the scaling of estimator variance with the problem dimension.

Convex Settings

Smooth Convex Optimization: With two-point estimators, the best attainable (in expectation) optimization error after $T$ iterations is

$O\left(\frac{R L \sqrt{d}}{\sqrt{T}}\right)$

where $R$ is the feasible domain diameter and $L$ bounds the gradient norm, i.e., it is the Lipschitz constant of $f$ (Duchi et al., 2013, Liu et al., 2020). Using $m$ function evaluations per iteration (mini-batching) improves the $\sqrt{d}$ factor to $\sqrt{d/m}$ (Duchi et al., 2013).

Optimality: Matching lower bounds show that the $\sqrt{d}$ penalty for two-point methods is unimprovable (modulo constants), and single-point estimators incur a still worse penalty of order $d$ (Duchi et al., 2013).

Strong Convexity: For $\alpha$-strongly convex functions, two-point methods achieve error

$O\left(\frac{d}{\alpha \sqrt{T}}\right)$

for minimization (Akhavan et al., 2020, Akhavan et al., 2021).
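
As a worked reading of the convex bounds above, one can set each error bound equal to a target accuracy $\epsilon$ and solve for $T$. Constants hidden in the $O(\cdot)$ notation are dropped, so these are order-of-magnitude statements rather than exact counts:

$\frac{R L \sqrt{d}}{\sqrt{T}} \le \epsilon \iff T \ge \frac{R^2 L^2 d}{\epsilon^2}$

$\frac{d}{\alpha \sqrt{T}} \le \epsilon \iff T \ge \frac{d^2}{\alpha^2 \epsilon^2}$

With $m$ evaluations per iteration, replacing $\sqrt{d}$ by $\sqrt{d/m}$ in the first bound gives $T \ge R^2 L^2 d/(m \epsilon^2)$ iterations, so the total number of function evaluations, which is proportional to $mT$, remains of order $d/\epsilon^2$. Mini-batching in this reading trades iterations for parallel queries rather than reducing the total query budget.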

Nonconvex and Stochastic Optimization

Stationarity: For $L$-smooth nonconvex functions, two-point ZOO achieves

$\frac{1}{T}\sum_{t=1}^T \mathbb{E}\|\nabla f(x_t)\|^2 = O\left(\frac{\sqrt{d}}{\sqrt{T}}\right)$

requiring $O(d/\epsilon^4)$ total queries to reach $\mathbb{E}\|\nabla f\|^2 \leq \epsilon^2$ (Liu et al., 2020).

Single-point ZOO converges more slowly, with typical guarantees of $O(d^2/\epsilon^4)$ query complexity or $O(1/K^{1/3})$ rates of convergence to stationarity in the iteration count $K$, depending on the setting (centralized or distributed) and on estimator variance constants (Chen et al., 6 Jul 2025, Mhanna et al., 2024).

Noisy/Adversarial Environments: Recent results show robust performance under adversarial noise, with error scaling appropriately in $d$, noise level, and network connectivity (for distributed schemes) (Akhavan et al., 2021, Neto et al., 2024).
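
The two-point stationarity result refers to a stochastic-gradient-style loop in which the estimator replaces the true gradient. A minimal sketch is given below; the step size, smoothing parameter, iteration count, and the toy objective are all assumptions chosen for illustration, not values from the cited analyses.

```python
import math
import random


def zo_sgd(f, x0, step_size, lam, num_iters, seed=0):
    """Gradient-free descent: each iteration uses one two-point estimate (2 queries)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(num_iters):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        f_plus = f([xi + lam * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - lam * ui for xi, ui in zip(x, u)])
        scale = (f_plus - f_minus) / (2.0 * lam)
        # Move against the estimated gradient scale * u.
        x = [xi - step_size * scale * ui for xi, ui in zip(x, u)]
    return x


def toy_nonconvex(x):
    """Smooth nonconvex toy objective (hypothetical): sum of x_i^2 + sin(3 * x_i)."""
    return sum(xi * xi + math.sin(3.0 * xi) for xi in x)
```

Calling `zo_sgd(toy_nonconvex, [2.0] * 5, 0.005, 1e-3, 2000)` spends 4000 function evaluations and is expected to land near a stationary point, though not necessarily the global minimum. The step size has to shrink as the dimension grows, which is the practical face of the dimension dependence in the bounds.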

Global Black-Box and Discontinuous Objectives

Sampling-based ZOO, using SDE transport and adaptive "zooming", achieves global minimization under minimal regularity (Gibbs-integrability and local growth), with error

$\|1-\exp(U_*- \min_{i \le N} U(X_i^\theta))\|_{L^p} \lesssim d/(m\theta) + \exp[-N (\kappa_0/\kappa_1)^{d/m}/2p]$

and empirical robustness to nonsmooth/discontinuous objectives (Zhang, 20 Sep 2025).

3. Algorithmic Advances and Variants

Recent research has produced significant algorithmic refinements to ZOO, improving both efficiency and capabilities:

Block/partial-gradient estimators: By randomly updating blocks of variables at each step, query complexity per iteration is reduced to $O(1)$ while achieving optimal $O(d/\epsilon^4)$ total complexity for $\epsilon$-stationary solutions in constrained, nonconvex–concave settings (Jin et al., 22 Oct 2025).
Regression-based Single-Point ZO (RESZO): Uses regression over historical queries to fit local surrogates (linear, quadratic), substantially reducing estimator variance and empirically matching two-point estimator query complexity (Chen et al., 6 Jul 2025).
Sharpness-Aware Minimization (SAM) integration: ZOSA leverages ZO gradient estimation and an inner maximization to explicitly bias solutions toward flat minima, thereby improving generalization in few-shot and prompt-tuning applications (Fu et al., 12 Nov 2025, Zhang et al., 5 Jun 2025).
Safe ZOO with quadratic local approximations: For black-box constrained problems, quadratic models of constraints built from finite-difference gradients yield feasible iterates converging to $\epsilon$-KKT points with $O(d^2/\epsilon^2)$ complexity, outperforming log-barrier and Bayesian methods in constraint satisfaction (Guo et al., 2023).
Hierarchical ZOO for Deep Neural Networks: By recursively bisecting the depth axis, hierarchical ZOO achieves $O(ML \log L)$ query count ($M$ = width, $L$ = depth), matching backprop accuracy (cosine similarity >0.95) while avoiding the prohibitive $O(ML^2)$ cost of neuron-wise ZOO in deep architectures (Cao et al., 11 Feb 2026).
Communication-Efficient Federated and Byzantine-Resilient ZOO: Recent algorithms such as CYBER-0 aggregate only a small number of scalar ZO queries from each client rather than full gradients, using robust aggregation (trimmed mean) to tolerate Byzantine attacks, achieving O(k) communication per round (Neto et al., 2024).
Distributed and Stochastic Settings: Extensions include ZOO with gradient tracking in decentralized networks, attaining consensus and $O(1/K^{1/3})$ convergence for nonconvex stochastic objectives using only single-point oracle calls per iteration (Mhanna et al., 2024).
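
The idea of aggregating scalar ZO feedback robustly, mentioned above for federated settings, can be illustrated with a trimmed mean. The sketch below is not the CYBER-0 algorithm; it assumes, hypothetically, that all clients share the same direction `u` (for example via a shared seed) and each reports a single scalar directional-derivative estimate.

```python
def trimmed_mean(values, trim):
    """Drop the `trim` smallest and `trim` largest values, then average the rest."""
    if 2 * trim >= len(values):
        raise ValueError("trim is too large for the number of values")
    kept = sorted(values)[trim:len(values) - trim]
    return sum(kept) / len(kept)


def robust_zo_round(client_losses, x, u, lam, step_size, trim):
    """One round: each client sends one scalar; the server aggregates and steps along u."""
    x_plus = [xi + lam * ui for xi, ui in zip(x, u)]
    x_minus = [xi - lam * ui for xi, ui in zip(x, u)]
    # Each client contributes a scalar two-point directional estimate.
    scalars = [(f(x_plus) - f(x_minus)) / (2.0 * lam) for f in client_losses]
    s = trimmed_mean(scalars, trim)
    return [xi - step_size * s * ui for xi, ui in zip(x, u)]
```

In this sketch each client uploads one number per direction instead of a $d$-dimensional gradient, and a client that reports an arbitrary value is discarded by the trimming as long as `trim` is at least the number of such clients.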

4. Application Domains

Zero-order optimization has found impactful applications across multiple technical domains:

Model Inversion and Privacy: Reconstruction attacks on neural networks, such as "Inverting Black-Box Face Recognition Systems via Zero-Order Optimization in Eigenface Space" (DarkerBB), exploit ZOO in a PCA-projected subspace to efficiently reconstruct plausible images with limited queries and without embedding access (Razzhigaev et al., 11 Jun 2025).
Neural Architecture Search: ZARTS demonstrates the power of ZOO (random search, maximum-likelihood guided smoothing, gradientless descent) to robustly discover performant neural architectures in search spaces where gradient approximations distort the objective landscape (Wang et al., 2021).
Robotics and Control: ZOO underlies essential black-box trajectory and policy optimization algorithms, including Predictive Sampling, MPPI, and CMA-ES, solving high-dimensional planning in contact-rich and nonsmooth environments (Jordana et al., 27 Jun 2025).
Adversarial Machine Learning: ZOO is core to black-box adversarial attack crafting, leveraging two-point and sign-based gradient approximations to efficiently find high-confidence adversarial examples (Liu et al., 2020).
Federated Learning and Privacy: Zero-order methods reduce communication and memory overhead and enable privacy-preserving distributed model training (Neto et al., 2024).
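
For the adversarial-attack use case listed above, a sign-based variant can be sketched as follows. This is a toy illustration in the spirit of sign-based ZO descent with an $L_\infty$ perturbation budget; the linear `toy_score`, the budget `eps`, and all hyperparameters are hypothetical stand-ins for a real black-box model and attack configuration.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def zo_sign_attack(loss, x0, eps, step_size, lam, num_dirs, num_steps, seed=0):
    """Sign-based ZO descent on `loss`, keeping x within an L-infinity ball around x0."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(num_steps):
        grad = [0.0] * len(x)
        for _ in range(num_dirs):
            u = [rng.gauss(0.0, 1.0) for _ in x]
            f_plus = loss([xi + lam * ui for xi, ui in zip(x, u)])
            f_minus = loss([xi - lam * ui for xi, ui in zip(x, u)])
            scale = (f_plus - f_minus) / (2.0 * lam)
            grad = [gi + scale * ui / num_dirs for gi, ui in zip(grad, u)]
        # Only the sign of each estimated gradient component is used.
        x = [xi - step_size * sign(gi) for xi, gi in zip(x, grad)]
        # Project back onto the allowed perturbation box around x0.
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x


def toy_score(x):
    """Hypothetical black-box score the attacker wants to lower (a linear stand-in)."""
    weights = [1.0, -2.0, 0.5]
    return sum(w * xi for w, xi in zip(weights, x))
```

Using only signs makes the step size independent of the noisy magnitude of the estimate, and the projection keeps the perturbation within the chosen budget at every step.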

5. Structural Bias and Regularization Effects

A prominent feature of zero-order methods is implicit regularization. Through the smoothing they induce, standard two-point estimators bias optimization trajectories toward flat minima, meaning those with small Hessian trace; this effect has been demonstrated both theoretically and empirically in large-scale models (Zhang et al., 5 Jun 2025, Fu et al., 12 Nov 2025). Sharpness-aware ZOO enhances this effect by design, further improving generalization in modern deep learning pipelines.
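
One way to see where this bias comes from is a second-order Taylor expansion of the smoothed objective $f_\lambda(x) = \mathbb{E}_u[f(x + \lambda u)]$. The following is a heuristic sketch that assumes $f$ is twice continuously differentiable with well-behaved higher-order terms and that $u \sim \mathcal{N}(0, I_d)$. The first-order term vanishes because $\mathbb{E}[u] = 0$, and $\mathbb{E}[u u^\top] = I_d$ turns the quadratic term into a trace:

$f_\lambda(x) = f(x) + \lambda\, \mathbb{E}_u[\nabla f(x)^\top u] + \frac{\lambda^2}{2}\, \mathbb{E}_u[u^\top \nabla^2 f(x)\, u] + o(\lambda^2) = f(x) + \frac{\lambda^2}{2}\, \operatorname{tr}\big(\nabla^2 f(x)\big) + o(\lambda^2)$

To leading order, minimizing $f_\lambda$ therefore behaves like minimizing $f$ plus a penalty on the Hessian trace. The precise statements and conditions are those of the cited works; this expansion only illustrates the mechanism.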

6. Limitations and Frontiers

While powerful, ZOO is not free from limitations:

Curse of Dimensionality: Variance of random-direction estimators scales with dimension; single-point methods suffer most ($O(d^2)$), and even the best two-point methods pay an inherent $\sqrt{d}$ penalty unless massive parallelization or block-structure is exploited (Duchi et al., 2013, Jin et al., 22 Oct 2025).
Query Complexity: High iteration counts are often unavoidable unless prior structure, surrogate modeling, or adaptive strategies can be invoked. Sampling-based global ZOO requires increasing sample counts as dimension grows to maintain error bounds (Zhang, 20 Sep 2025).
Bias-Variance Tradeoff: Fine-tuning smoothing parameters, step sizes, and block sizes is critical and problem-dependent (Jin et al., 22 Oct 2025, Chen et al., 6 Jul 2025).
Numerical Instability in Finite Differences: Addressed by complex-step and regression-based approaches, which can yield stable, low-bias estimates even under numerical or stochastic noise (Jongeneel, 2021, Chen et al., 6 Jul 2025).

7. Tables and Summary of Complexity Bounds

Estimator / Method	Per-iteration Query Cost	Overall Complexity (convex, smooth)	Remarks
Two-point random directions	$O(1)$	$O(d/\epsilon^2)$	Minimax optimal for many settings
Single-point estimator	$O(1)$	$O(d^2/\epsilon^2)$	Higher variance, slower in high-d
Coordinate-wise (full) finite diff.	$O(d)$	$O(1/\epsilon^2)$	Expensive per-iteration in high-d
Block-coordinate/partial ZOO	$O(b)$	$O(d/\epsilon^4)$ (nonconvex)	Efficient for adjustable block size
RESZO (regression-based single pt.)	$O(1)$	$O(d\sqrt{d}/\epsilon^2)$ approx.	Surrogate fit using query history
Safe ZOO (quadratic approx.)	$O(d)$	$O(d^2/\epsilon^2)$	Strict constraint feasibility

References

(Razzhigaev et al., 11 Jun 2025) Inverting Black-Box Face Recognition Systems via Zero-Order Optimization in Eigenface Space
(Zhang et al., 5 Jun 2025) Zeroth-Order Optimization Finds Flat Minima
(Chen et al., 6 Jul 2025) Regression-Based Single-Point Zeroth-Order Optimization
(Fu et al., 12 Nov 2025) Zero-Order Sharpness-Aware Minimization
(Jongeneel, 2021) Imaginary Zeroth-Order Optimization
(Duchi et al., 2013) Optimal rates for zero-order convex optimization: the power of two function evaluations
(Jin et al., 22 Oct 2025) Query-Efficient Zeroth-Order Algorithms for Nonconvex Optimization
(Guo et al., 2023) Safe Zeroth-Order Optimization Using Quadratic Local Approximations
(Neto et al., 2024) Communication-Efficient Byzantine-Resilient Federated Zero-Order Optimization
(Mhanna et al., 2024) Single Point-Based Distributed Zeroth-Order Optimization with a Non-Convex Stochastic Objective Function
(Liu et al., 2020) A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning
(Zhang, 20 Sep 2025) Sampling-Based Zero-Order Optimization Algorithms
(Akhavan et al., 2021) Distributed Zero-Order Optimization under Adversarial Noise
(Cao et al., 11 Feb 2026) Hierarchical Zero-Order Optimization for Deep Neural Networks
Cited above without a full entry: (Jordana et al., 27 Jun 2025), (Wang et al., 2021), (Akhavan et al., 2020).

Zero-order optimization thus represents a broad, rapidly evolving paradigm in computational mathematics and data science, offering principled frameworks for global, constrained, and distributed learning under profound structural restrictions on information access. Its foundational results, encompassing optimality, robustness, and flexibility, continue to drive advances in algorithmic design for modern machine learning and control systems.