
Inexact Zeroth-Order Methods

Updated 19 August 2025
  • Inexact zeroth-order methods are algorithms that solve nonsmooth, nonconvex stochastic optimization problems using only noisy and biased function evaluations.
  • They combine randomized smoothing, finite-difference gradient estimates, and proximal updates to converge to surrogate stationary points despite oracle inexactness.
  • Explicit non-asymptotic guarantees quantify error trade-offs based on smoothing parameters, inexactness bounds, and problem dimensionality for robust application.

Inexact zeroth-order methods constitute a powerful class of algorithms for nonsmooth, nonconvex, and composite stochastic optimization problems in which only noisy, biased, or approximate function evaluations are accessible, and explicit gradient information is unavailable or infeasible to compute. These methods combine randomized smoothing, finite-difference gradient estimation, and proximal updates, thereby enabling convergence to meaningful approximate stationary points of a suitably defined surrogate or envelope objective. The inexactness may originate from stochastic simulation, lower-level optimization solved inexactly, or noisy function oracles. The modern approach rigorously characterizes the stationarity achieved in terms of the Moreau envelope or Goldstein subdifferentials of surrogate objectives, and provides concrete non-asymptotic complexity guarantees with explicit dependence on the oracle inexactness. This versatile theory accommodates problems far beyond the reach of traditional stochastic gradient or sample-based zeroth-order methods.

1. Algorithmic Structure and Oracle Model

The core algorithm is an inexact zeroth-order proximal stochastic gradient ("Z-iProxSG"—Editor's term) method for composite problems of the form

$$\min_{x\in\mathbb{R}^n}~\varphi(x) = \mathbb{E}\big\{F(x,\xi)\big\} + r(x),$$

where $F(\cdot,\xi)$ is Lipschitz continuous (possibly nonsmooth and nonconvex), and $r$ is a proper, closed, convex, proximable function. The algorithm accesses an inexact stochastic oracle that, given $(x,\xi)$, returns

$$\widetilde{F}(x,\xi) = F(x,\xi) + \delta(x,\xi),\qquad |\delta(x,\xi)| \leq \widetilde{\delta},$$

where $\delta$ models noise and bias from lower-level approximation or simulation.

At iteration $t$:

  • A random direction $W_t$ is sampled uniformly from the unit sphere $S^{n-1}$.
  • Noisy evaluations are performed at $x_t + \mu W_t$ and $x_t - \mu W_t$.
  • The finite-difference gradient estimator is computed: $$G_t = \frac{n}{2\mu}\left(\widetilde{F}(x_t + \mu W_t, \xi_t) - \widetilde{F}(x_t - \mu W_t, \xi_t)\right) W_t.$$
  • A proximal update is applied: $$x_{t+1} = \operatorname{prox}_{\alpha_t r}\big(x_t - \alpha_t G_t\big).$$ This construction requires only inexact, zeroth-order function values; a code sketch follows below.
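
The following minimal Python sketch illustrates the iteration under stated assumptions: the oracle interface `oracle(x, xi)`, the sampler `sample_xi`, and the $\ell_1$ prox used for $r$ are illustrative choices, not the paper's specification.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 -- one concrete example of a proximable r.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def z_iprox_sg(oracle, sample_xi, x0, mu, alphas, prox_r=soft_threshold, seed=0):
    # oracle(x, xi): inexact zeroth-order value F~(x, xi) = F(x, xi) + delta(x, xi).
    # sample_xi(rng): draws a fresh sample xi_t.
    rng = np.random.default_rng(seed)
    n = x0.size
    x = x0.astype(float).copy()
    iterates = []
    for alpha in alphas:
        xi = sample_xi(rng)
        # Sample W_t uniformly from the unit sphere S^{n-1}.
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        # G_t = (n / (2 mu)) * (F~(x + mu W_t, xi_t) - F~(x - mu W_t, xi_t)) * W_t
        g = (n / (2.0 * mu)) * (oracle(x + mu * w, xi) - oracle(x - mu * w, xi)) * w
        # x_{t+1} = prox_{alpha_t r}(x_t - alpha_t G_t)
        x = prox_r(x - alpha * g, alpha)
        iterates.append(x.copy())
    return x, iterates
```

Here `alphas` plays the role of the step-size sequence $\{\alpha_t\}$, and any proximable $r$ can be swapped in through `prox_r`.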

2. Surrogate Problem and Notions of Approximate Stationarity

Due to nondifferentiability and inexactness, the convergence of zeroth-order methods cannot be established with respect to standard stationarity of $\varphi$. Instead, the framework introduces smooth surrogate problems:

  • The randomly smoothed function

$$f_\mu(x) = \mathbb{E}_U[F(x - \mu U, \xi)],\qquad U\sim\mathrm{Uniform}(B_1(0)),$$

and the corresponding smoothed composite objective $\varphi_\mu(x) = f_\mu(x) + r(x)$ (a Monte Carlo sketch of $f_\mu$ follows this list).

  • The Goldstein $(\mu,\varepsilon)$-subdifferential and the Moreau envelope

$$e_\lambda \varphi_\mu(x) = \min_w\left\{ \varphi_\mu(w) + \frac{1}{2\lambda}\|w - x\|^2 \right\}.$$
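
Purely for intuition, $f_\mu$ can be probed by Monte Carlo averaging over the unit ball; the sketch below assumes a deterministic single-sample evaluation `f` standing in for $F(\cdot,\xi)$ and uses the standard direction-times-radius ball sampler.

```python
import numpy as np

def smoothed_value(f, x, mu, samples=10_000, seed=0):
    # Monte Carlo estimate of f_mu(x) = E_U[f(x - mu U)], U ~ Uniform(B_1(0)).
    # Illustrative only: the algorithm itself never forms f_mu explicitly.
    rng = np.random.default_rng(seed)
    n = x.size
    dirs = rng.standard_normal((samples, n))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # directions on S^{n-1}
    radii = rng.uniform(size=(samples, 1)) ** (1.0 / n)   # radius law for the unit ball
    return float(np.mean([f(x - mu * u) for u in dirs * radii]))
```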

Approximate stationarity is defined as the event that

$$\|\nabla e_\lambda\varphi_\mu(x)\| \leq \varepsilon,$$

where $\lambda = (2\rho)^{-1}$ and $\rho = cG\sqrt{n}/\mu$ for some constant $c$, with $G$ the Lipschitz constant of $F(\cdot,\xi)$.
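
For orientation, $\varphi_\mu$ is $\rho$-weakly convex in this setting, and a standard Moreau-envelope identity (stated here as background, not as the paper's derivation) explains why a small envelope gradient is a meaningful certificate:

```latex
% For rho-weakly convex varphi_mu and lambda < rho^{-1}, the envelope is
% differentiable and its gradient measures proximity to a proximal point:
\nabla e_\lambda \varphi_\mu(x) = \lambda^{-1}\bigl(x - \hat{x}\bigr),
\qquad \hat{x} := \operatorname{prox}_{\lambda \varphi_\mu}(x).
```

Hence $\|\nabla e_\lambda\varphi_\mu(x)\| \leq \varepsilon$ implies $\|x - \hat{x}\| \leq \lambda\varepsilon$ and $\operatorname{dist}(0, \partial\varphi_\mu(\hat{x})) \leq \varepsilon$; that is, $x$ is provably close to a near-stationary point of $\varphi_\mu$.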

This framework generalizes both the envelope stationarity for weakly convex functions and the Goldstein subdifferential stationarity for nonsmooth optimization, enabling rigorous convergence statements in the inexact, nonsmooth setting.

3. Convergence Guarantees and Complexity Bounds

Under minimal assumptions (Lipschitz continuity of $F(\cdot,\xi)$ and bounded inexactness $\widetilde{\delta}$), the Z-iProxSG method achieves explicit non-asymptotic guarantees:

  • After $T$ iterations, the expected squared gradient norm of the Moreau envelope at a randomly chosen iterate $\tilde{x} = x_{t^*}$ (where $t^*$ is chosen at random with weights proportional to the step size $\alpha_t$) satisfies

$$\mathbb{E}\left[\|\nabla e_{(2\rho)^{-1}} \varphi_\mu(\tilde{x})\|^2\right] \leq O\left(\frac{1}{\sqrt{T+1}} + \mathcal{E}(\widetilde{\delta},\mu,n)\right),$$

where the additive error $\mathcal{E}$ depends explicitly on the inexactness $\widetilde{\delta}$, the smoothing parameter $\mu$, and the dimension $n$. For error $\widetilde{\delta} = O(\mu^2/n)$, the convergence rate matches that of the exact-oracle case, demonstrating robustness to realistic oracle noise (Pougkakiotis et al., 15 Aug 2025).
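
As a concrete reading of the output rule, the randomly weighted iterate $\tilde{x}$ can be drawn as follows; `iterates` and `alphas` are the assumed names from the earlier sketch.

```python
import numpy as np

def weighted_output(iterates, alphas, seed=1):
    # Draw t* with P(t* = t) proportional to alpha_t and return x_{t*},
    # matching the randomly weighted output rule in the guarantee above.
    rng = np.random.default_rng(seed)
    probs = np.asarray(alphas, dtype=float)
    return iterates[rng.choice(len(iterates), p=probs / probs.sum())]
```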

Two regimes for the inexact oracle are considered:

  • (B1) The error $\delta(x,\xi)$ has a mean $\Delta$ independent of $x$ (allowing for certain biases).
  • (B2) The iterates remain in a compact set (when $r$ is an indicator function), so the uniform bound suffices regardless of error bias.

This allows the method to accommodate a broad class of stochastic oracles, including those arising in inner-loop optimization or simulation.

4. Applicability to Challenging Problem Classes

The flexibility of the oracle model and the generalized stationarity concepts enable the algorithm to address a diverse range of settings, including:

  • Two-stage stochastic programming: $F(x,\xi)$ is defined as the optimal value of an inner (possibly nonconvex) optimization problem; the method requires only approximate solutions (possibly inexact, multi-valued) of the inner problem (a toy oracle sketch follows this list).
  • Stochastic minimax problems: covering both cases where the adversary has instantaneous or ergodic access to information; this includes robust machine learning (e.g., adversarially trained neural networks) and distributionally robust optimization.
  • General black-box and simulation-based optimization: where function evaluations are only accessible via expensive simulations or estimations.
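
As an illustration of the two-stage setting, the sketch below builds an inexact oracle by truncating an inner solver; the toy recourse cost `Q`, the solver choice, and the iteration budget are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def two_stage_oracle(x, xi, inner_iters=25):
    # F(x, xi) = min_y Q(x, y, xi), evaluated inexactly by capping the inner
    # solver's iteration budget; the truncation induces the bounded oracle
    # error delta(x, xi) that the framework tolerates.
    Q = lambda y: np.sum((y - x) ** 2) + xi @ y   # hypothetical recourse cost
    res = minimize(Q, np.zeros_like(x), method="L-BFGS-B",
                   options={"maxiter": inner_iters})
    return res.fun
```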

These capabilities remove the usual requirements of convexity, smoothness, and solution uniqueness for lower-level problems, thus extending the reach of zeroth-order methods in stochastic optimization (Pougkakiotis et al., 15 Aug 2025).

5. Comparison with Existing Methods

Relative to prior work, the inexact zeroth-order approach in (Pougkakiotis et al., 15 Aug 2025) introduces several essential advancements:

  • Most extant zeroth-order and gradient-sampling methods address unconstrained or smooth settings, using exact oracles and requiring access to unbiased function values.
  • In contrast, Z-iProxSG accommodates generic composite structure (nonsmooth rr), noisy inexact evaluation, and nonconvex (often multi-level) functional forms.
  • The stationarity and convergence theory generalizes existing Moreau envelope and gradient-sampling stationarity in the literature, unifying and extending theoretical guarantees.
  • Non-asymptotic rates are explicit and show controlled dependence on the inexactness and dimension, thus quantifying the performance loss due to realistic oracle error rather than assuming idealized feedback.

6. Oracle Inexactness, Robustness, and Tunable Trade-offs

The key technical challenge is the presence of uniform, possibly biased noise in the function evaluations from the oracle. The proposed framework treats this by:

  • Rigorous control of the error in the finite-difference estimator, explicitly relating the size of errors in function evaluations to bias and variance terms in the gradient estimation.
  • Ensuring that, as long as the error satisfies $\widetilde{\delta}=O(\mu^2/n)$, the ultimate stationarity guarantees in Moreau envelope norm are only minimally degraded.
  • Allowing for trade-offs: increasing the smoothing radius $\mu$ improves robustness to noise (since the relative error in the finite difference shrinks) at the expense of greater smoothing bias relative to the original function (see the numeric probe below).
  • Tuning $\mu$ and the step-size sequence enables one to interpolate between exact and highly inexact regimes, allowing practical deployment in computationally intensive or noisy black-box environments.
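
A small numeric probe can make this trade-off concrete; the quadratic test function, noise level, and sample counts below are arbitrary choices for illustration.

```python
import numpy as np

# Probe the finite-difference estimator's bias and variance as mu varies,
# on f(x) = 0.5||x||^2 (true gradient = x) with oracle error |delta| <= delta_tilde.
rng = np.random.default_rng(0)
n, delta_tilde = 50, 1e-4
x = np.ones(n)
f = lambda z: 0.5 * z @ z + rng.uniform(-delta_tilde, delta_tilde)
for mu in (1e-3, 1e-2, 1e-1):
    est = []
    for _ in range(5000):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        est.append((n / (2 * mu)) * (f(x + mu * w) - f(x - mu * w)) * w)
    est = np.asarray(est)
    bias = np.linalg.norm(est.mean(axis=0) - x)
    print(f"mu={mu:g}: bias={bias:.3f}, total variance={est.var(axis=0).sum():.1f}")
```

Small $\mu$ lets the $n\widetilde{\delta}/\mu$ oracle-error term dominate, while larger $\mu$ stabilizes the estimator at the cost of averaging the objective over a wider ball.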

7. Implications and Theoretical Advancements

The introduction of surrogate $(\lambda,\mu,\varepsilon)$-stationarity and the generalized framework for inexact zeroth-order optimization advance the state of the art by:

  • Removing the requirement for convexity, differentiability, or unbiasedness of the oracle; the method can handle nonuniqueness and nonconvexity in inner optimization.
  • Establishing non-asymptotic rates for convergence to surrogate stationarity, thereby giving practitioners explicit guidance on sample complexity and attainable accuracy as a function of oracle and problem parameters.
  • Allowing unified treatment of diverse practical problems (black-box optimization, bilevel and minimax optimization, stochastic simulation, and composite regularized objectives) within a single theoretical construct.
  • Providing a new paradigm where the essential information is a proximal point of a smoothed envelope, rather than classical stationarity or Fejér monotonicity.

This theoretical machinery opens further directions for extending the robustness and applicability of zeroth-order methods, incorporating adaptive smoothing, sharper complexity analysis, and principled handling of more general or unbounded oracles.


Table 1. Key Features of Inexact Zeroth-Order Stochastic Composite Optimization (Pougkakiotis et al., 15 Aug 2025)

| Feature | Description | Scope/Condition |
| --- | --- | --- |
| Oracle Model | Stochastic, possibly biased, bounded-error function values | $\lvert\delta(x,\xi)\rvert \leq \widetilde{\delta}$ |
| Smoothing | Uniform-ball or Gaussian; parameter $\mu$ controls the trade-off | $\mu > 0$ |
| Stationarity Metric | $\Vert\nabla e_\lambda \varphi_\mu(x)\Vert \leq \varepsilon$ | $\lambda = (2\rho)^{-1}$ |
| Composite Structure | Nonsmooth, proximable $r$ included | Proper, closed, convex $r$ |
| Applicability | Two-stage, minimax, black-box, inexact bilevel problems | No lower-level convexity/uniqueness needed |
| Convergence Rate | $\mathbb{E}\Vert\nabla e_\lambda \varphi_\mu(x)\Vert^2 \leq O(1/\sqrt{T} + \mathcal{E})$ | $\mathcal{E}$ explicit in $\mu$, $n$, $\widetilde{\delta}$ |

This synthesis encompasses algorithmic design, stationarity concepts, oracle modeling, convergence theory, and practical applicability informed by the latest research advances (Pougkakiotis et al., 15 Aug 2025).
