
Inexact Zeroth-Order Methods

Updated 19 August 2025
  • Inexact zeroth-order methods are algorithms that solve nonsmooth, nonconvex stochastic optimization problems using only noisy and biased function evaluations.
  • They combine randomized smoothing, finite-difference gradient estimates, and proximal updates to converge to surrogate stationary points despite oracle inexactness.
  • Explicit non-asymptotic guarantees quantify error trade-offs based on smoothing parameters, inexactness bounds, and problem dimensionality for robust application.

Inexact zeroth-order methods constitute a powerful class of algorithms for nonsmooth, nonconvex, and composite stochastic optimization problems in which only noisy, biased, or approximate function evaluations are accessible, and explicit gradient information is unavailable or infeasible to compute. These methods combine randomized smoothing, finite-difference gradient estimation, and proximal updates, thereby enabling convergence to meaningful approximate stationary points of a suitably defined surrogate or envelope objective. The inexactness may originate from stochastic simulation, lower-level optimization solved inexactly, or noisy function oracles. The modern approach rigorously characterizes the stationarity achieved in terms of the Moreau envelope or Goldstein subdifferentials of surrogate objectives, and provides concrete non-asymptotic complexity guarantees with explicit dependence on the oracle inexactness. This versatile theory accommodates problems far beyond the reach of traditional stochastic gradient or sample-based zeroth-order methods.

1. Algorithmic Structure and Oracle Model

The core algorithm is an inexact zeroth-order proximal stochastic gradient ("Z-iProxSG"—Editor's term) method for composite problems of the form

$$\min_{x\in\mathbb{R}^n}~\varphi(x) = \mathbb{E}\big\{F(x,\xi)\big\} + r(x),$$

where $F(\cdot,\xi)$ is Lipschitz continuous (possibly nonsmooth and nonconvex), and $r$ is a proper, closed, convex, proximable function. The algorithm accesses an inexact stochastic oracle that, given $(x,\xi)$, returns

$$\widetilde{F}(x,\xi) = F(x,\xi) + \delta(x,\xi),\qquad |\delta(x,\xi)| \leq \widetilde{\delta},$$

where $\delta$ models noise and bias from lower-level approximation or simulation.

At iteration $t$:

  • A random direction $W_t$ is sampled uniformly from the unit sphere $S^{n-1}$.
  • Noisy evaluations are performed at $x_t + \mu W_t$ and $x_t - \mu W_t$.
  • The finite-difference gradient estimator is computed: $$G_t = \frac{n}{2\mu}\left(\widetilde{F}(x_t + \mu W_t, \xi_t) - \widetilde{F}(x_t - \mu W_t, \xi_t)\right) W_t.$$
  • A proximal update is applied: $$x_{t+1} = \operatorname{prox}_{\alpha_t r}\big(x_t - \alpha_t G_t\big).$$ This construction requires only inexact, zeroth-order function values; a code sketch follows below.
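
The following minimal Python sketch illustrates the iteration under stated assumptions: the oracle interface `oracle(x, xi)`, the sampler `sample_xi`, and the $\ell_1$ prox used for $r$ are illustrative choices, not the paper's specification.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 -- one concrete example of a proximable r.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def z_iprox_sg(oracle, sample_xi, x0, mu, alphas, prox_r=soft_threshold, seed=0):
    # oracle(x, xi): inexact zeroth-order value F~(x, xi) = F(x, xi) + delta(x, xi).
    # sample_xi(rng): draws a fresh sample xi_t.
    rng = np.random.default_rng(seed)
    n = x0.size
    x = x0.astype(float).copy()
    iterates = []
    for alpha in alphas:
        xi = sample_xi(rng)
        # Sample W_t uniformly from the unit sphere S^{n-1}.
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        # G_t = (n / (2 mu)) * (F~(x + mu W_t, xi_t) - F~(x - mu W_t, xi_t)) * W_t
        g = (n / (2.0 * mu)) * (oracle(x + mu * w, xi) - oracle(x - mu * w, xi)) * w
        # x_{t+1} = prox_{alpha_t r}(x_t - alpha_t G_t)
        x = prox_r(x - alpha * g, alpha)
        iterates.append(x.copy())
    return x, iterates
```

Here `alphas` plays the role of the step-size sequence $\{\alpha_t\}$, and any proximable $r$ can be swapped in through `prox_r`.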

2. Surrogate Problem and Notions of Approximate Stationarity

Due to nondifferentiability and inexactness, the convergence of zeroth-order methods cannot be established with respect to standard stationarity of $\varphi$. Instead, the framework introduces smooth surrogate problems:

  • The randomly smoothed function

$$f_\mu(x) = \mathbb{E}_U[F(x - \mu U, \xi)],\qquad U\sim\mathrm{Uniform}(B_1(0)),$$

and the corresponding smoothed composite objective $\varphi_\mu(x) = f_\mu(x) + r(x)$ (a Monte Carlo sketch of $f_\mu$ follows this list).

  • The Goldstein $(\mu,\varepsilon)$-subdifferential and the Moreau envelope

$$e_\lambda \varphi_\mu(x) = \min_w\left\{ \varphi_\mu(w) + \frac{1}{2\lambda}\|w - x\|^2 \right\}.$$
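
Purely for intuition, $f_\mu$ can be probed by Monte Carlo averaging over the unit ball; the sketch below assumes a deterministic single-sample evaluation `f` standing in for $F(\cdot,\xi)$ and uses the standard direction-times-radius ball sampler.

```python
import numpy as np

def smoothed_value(f, x, mu, samples=10_000, seed=0):
    # Monte Carlo estimate of f_mu(x) = E_U[f(x - mu U)], U ~ Uniform(B_1(0)).
    # Illustrative only: the algorithm itself never forms f_mu explicitly.
    rng = np.random.default_rng(seed)
    n = x.size
    dirs = rng.standard_normal((samples, n))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # directions on S^{n-1}
    radii = rng.uniform(size=(samples, 1)) ** (1.0 / n)   # radius law for the unit ball
    return float(np.mean([f(x - mu * u) for u in dirs * radii]))
```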

Approximate stationarity is defined as the event that

$$\|\nabla e_\lambda\varphi_\mu(x)\| \leq \varepsilon,$$

where $\lambda = (2\rho)^{-1}$ and $\rho = cG\sqrt{n}/\mu$ for some constant $c$, with $G$ the Lipschitz constant of $F(\cdot,\xi)$.
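
For orientation, $\varphi_\mu$ is $\rho$-weakly convex in this setting, and a standard Moreau-envelope identity (stated here as background, not as the paper's derivation) explains why a small envelope gradient is a meaningful certificate:

```latex
% For rho-weakly convex varphi_mu and lambda < rho^{-1}, the envelope is
% differentiable and its gradient measures proximity to a proximal point:
\nabla e_\lambda \varphi_\mu(x) = \lambda^{-1}\bigl(x - \hat{x}\bigr),
\qquad \hat{x} := \operatorname{prox}_{\lambda \varphi_\mu}(x).
```

Hence $\|\nabla e_\lambda\varphi_\mu(x)\| \leq \varepsilon$ implies $\|x - \hat{x}\| \leq \lambda\varepsilon$ and $\operatorname{dist}(0, \partial\varphi_\mu(\hat{x})) \leq \varepsilon$; that is, $x$ is provably close to a near-stationary point of $\varphi_\mu$.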

This framework generalizes both the envelope stationarity for weakly convex functions and the Goldstein subdifferential stationarity for nonsmooth optimization, enabling rigorous convergence statements in the inexact, nonsmooth setting.

3. Convergence Guarantees and Complexity Bounds

Under minimal assumptions (Lipschitz continuity of $F(\cdot,\xi)$ and bounded inexactness $\widetilde{\delta}$), the Z-iProxSG method achieves explicit non-asymptotic guarantees:

  • After $T$ iterations, the expected squared gradient norm of the Moreau envelope at a randomly chosen iterate $\tilde{x} = x_{t^*}$ (where $t^*$ is chosen at random with weights proportional to the step size $\alpha_t$) satisfies

$$\mathbb{E}\left[\|\nabla e_{(2\rho)^{-1}} \varphi_\mu(\tilde{x})\|^2\right] \leq O\left(\frac{1}{\sqrt{T+1}} + \mathcal{E}(\widetilde{\delta},\mu,n)\right),$$

where the additive error $\mathcal{E}$ depends explicitly on the inexactness $\widetilde{\delta}$, the smoothing parameter $\mu$, and the dimension $n$. For error $\widetilde{\delta} = O(\mu^2/n)$, the convergence rate matches that of the exact-oracle case, demonstrating robustness to realistic oracle noise (Pougkakiotis et al., 15 Aug 2025).
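
As a concrete reading of the output rule, the randomly weighted iterate $\tilde{x}$ can be drawn as follows; `iterates` and `alphas` are the assumed names from the earlier sketch.

```python
import numpy as np

def weighted_output(iterates, alphas, seed=1):
    # Draw t* with P(t* = t) proportional to alpha_t and return x_{t*},
    # matching the randomly weighted output rule in the guarantee above.
    rng = np.random.default_rng(seed)
    probs = np.asarray(alphas, dtype=float)
    return iterates[rng.choice(len(iterates), p=probs / probs.sum())]
```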

Two regimes for the inexact oracle are considered:

  • (B1) The error $\delta(x,\xi)$ has a mean $\Delta$ independent of $x$ (allowing for certain biases).
  • (B2) The iterates remain in a compact set (when $r$ is an indicator function), so the uniform bound suffices regardless of error bias.

This allows the method to accommodate a broad class of stochastic oracles, including those arising in inner-loop optimization or simulation.

4. Applicability to Challenging Problem Classes

The flexibility of the oracle model and the generalized stationarity concepts enable the algorithm to address a diverse range of settings, including:

  • Two-stage stochastic programming: $F(x,\xi)$ is defined as the optimal value of an inner (possibly nonconvex) optimization problem; the method requires only approximate solutions (possibly inexact, multi-valued) of the inner problem (a toy oracle sketch follows this list).
  • Stochastic minimax problems: covering both cases where the adversary has instantaneous or ergodic access to information; this includes robust machine learning (e.g., adversarially trained neural networks) and distributionally robust optimization.
  • General black-box and simulation-based optimization: where function evaluations are only accessible via expensive simulations or estimations.
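
As an illustration of the two-stage setting, the sketch below builds an inexact oracle by truncating an inner solver; the toy recourse cost `Q`, the solver choice, and the iteration budget are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def two_stage_oracle(x, xi, inner_iters=25):
    # F(x, xi) = min_y Q(x, y, xi), evaluated inexactly by capping the inner
    # solver's iteration budget; the truncation induces the bounded oracle
    # error delta(x, xi) that the framework tolerates.
    Q = lambda y: np.sum((y - x) ** 2) + xi @ y   # hypothetical recourse cost
    res = minimize(Q, np.zeros_like(x), method="L-BFGS-B",
                   options={"maxiter": inner_iters})
    return res.fun
```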

These capabilities remove the usual requirements of convexity, smoothness, and solution uniqueness for lower-level problems, thus extending the reach of zeroth-order methods in stochastic optimization (Pougkakiotis et al., 15 Aug 2025).

5. Comparison with Existing Methods

Relative to prior work, the inexact zeroth-order approach in (Pougkakiotis et al., 15 Aug 2025) introduces several essential advancements:

  • Most extant zeroth-order and gradient-sampling methods address unconstrained or smooth settings, using exact oracles and requiring access to unbiased function values.
  • In contrast, Z-iProxSG accommodates generic composite structure (nonsmooth rr), noisy inexact evaluation, and nonconvex (often multi-level) functional forms.
  • The stationarity and convergence theory generalizes existing Moreau envelope and gradient-sampling stationarity in the literature, unifying and extending theoretical guarantees.
  • Non-asymptotic rates are explicit and show controlled dependence on the inexactness and dimension, thus quantifying the performance loss due to realistic oracle error rather than assuming idealized feedback.

6. Oracle Inexactness, Robustness, and Tunable Trade-offs

The key technical challenge is the presence of uniform, possibly biased noise in the function evaluations from the oracle. The proposed framework treats this by:

  • Rigorous control of the error in the finite-difference estimator, explicitly relating the size of errors in function evaluations to bias and variance terms in the gradient estimation.
  • Ensuring that, as long as the error satisfies $\widetilde{\delta}=O(\mu^2/n)$, the ultimate stationarity guarantees in Moreau envelope norm are only minimally degraded.
  • Allowing for trade-offs: increasing the smoothing radius $\mu$ improves robustness to noise (since the relative error in the finite difference shrinks) at the expense of greater smoothing bias relative to the original function (see the numeric probe below).
  • Tuning $\mu$ and the step-size sequence enables one to interpolate between exact and highly inexact regimes, allowing practical deployment in computationally intensive or noisy black-box environments.
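
A small numeric probe can make this trade-off concrete; the quadratic test function, noise level, and sample counts below are arbitrary choices for illustration.

```python
import numpy as np

# Probe the finite-difference estimator's bias and variance as mu varies,
# on f(x) = 0.5||x||^2 (true gradient = x) with oracle error |delta| <= delta_tilde.
rng = np.random.default_rng(0)
n, delta_tilde = 50, 1e-4
x = np.ones(n)
f = lambda z: 0.5 * z @ z + rng.uniform(-delta_tilde, delta_tilde)
for mu in (1e-3, 1e-2, 1e-1):
    est = []
    for _ in range(5000):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        est.append((n / (2 * mu)) * (f(x + mu * w) - f(x - mu * w)) * w)
    est = np.asarray(est)
    bias = np.linalg.norm(est.mean(axis=0) - x)
    print(f"mu={mu:g}: bias={bias:.3f}, total variance={est.var(axis=0).sum():.1f}")
```

Small $\mu$ lets the $n\widetilde{\delta}/\mu$ oracle-error term dominate, while larger $\mu$ stabilizes the estimator at the cost of averaging the objective over a wider ball.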

7. Implications and Theoretical Advancements

The introduction of surrogate $(\lambda,\mu,\varepsilon)$-stationarity and the generalized framework for inexact zeroth-order optimization advance the state of the art by:

  • Removing the requirement for convexity, differentiability, or unbiasedness of the oracle; the method can handle nonuniqueness and nonconvexity in inner optimization.
  • Establishing non-asymptotic rates for convergence to surrogate stationarity, thereby giving practitioners explicit guidance on sample complexity and attainable accuracy as a function of oracle and problem parameters.
  • Allowing unified treatment of diverse practical problems (black-box optimization, bilevel and minimax optimization, stochastic simulation, and composite regularized objectives) within a single theoretical construct.
  • Providing a new paradigm where the essential information is a proximal point of a smoothed envelope, rather than classical stationarity or Fejér monotonicity.

This theoretical machinery opens further directions for extending the robustness and applicability of zeroth-order methods, incorporating adaptive smoothing, sharper complexity analysis, and principled handling of more general or unbounded oracles.


Table 1. Key Features of Inexact Zeroth-Order Stochastic Composite Optimization (Pougkakiotis et al., 15 Aug 2025)

| Feature | Description | Scope/Condition |
| --- | --- | --- |
| Oracle Model | Stochastic, possibly biased, bounded-error function values | $\lvert\delta(x,\xi)\rvert \leq \widetilde{\delta}$ |
| Smoothing | Uniform-ball or Gaussian; parameter $\mu$ controls the trade-off | $\mu > 0$ |
| Stationarity Metric | $\Vert\nabla e_\lambda \varphi_\mu(x)\Vert \leq \varepsilon$ | $\lambda = (2\rho)^{-1}$ |
| Composite Structure | Nonsmooth, proximable $r$ included | Proper, closed, convex $r$ |
| Applicability | Two-stage, minimax, black-box, inexact bilevel problems | No lower-level convexity/uniqueness needed |
| Convergence Rate | $\mathbb{E}\Vert\nabla e_\lambda \varphi_\mu(x)\Vert^2 \leq O(1/\sqrt{T} + \mathcal{E})$ | $\mathcal{E}$ explicit in $\mu$, $n$, $\widetilde{\delta}$ |

This synthesis encompasses algorithmic design, stationarity concepts, oracle modeling, convergence theory, and practical applicability informed by the latest research advances (Pougkakiotis et al., 15 Aug 2025).
