
Zeroth-Order Incremental Variance Reduction

Updated 15 January 2026
  • ZIVR is a set of derivative-free optimization techniques that combine zeroth-order estimators with incremental control variates to reduce variance and achieve near-optimal convergence rates.
  • It is applied to large-scale black-box and composite optimization problems, including simulation-based models and adversarial attacks where gradients are unavailable.
  • ZIVR variants employ advanced strategies like loopless control variates and Jacobian sketching to balance query costs and enhance performance in high-dimensional settings.

Zeroth-Order Incremental Variance Reduction (ZIVR) refers to a suite of techniques for derivative-free optimization that achieve variance reduction in stochastic gradient estimators when no explicit gradient information is available. ZIVR has become fundamental for large-scale black-box and composite optimization, especially for settings where gradients are inaccessible or prohibitively expensive, such as simulation-based models, adversarial attacks, and stochastic programming. The key insight of ZIVR is to combine gradient-free (zeroth-order, ZO) estimators with incremental, often SVRG/SARAH/SPIDER-style, control variate constructions, thereby matching or closely approaching the best-known first-order convergence rates, up to dimension factors, across convex, strongly convex, and nonconvex objective structures.

1. Problem Formulation and Zeroth-Order Gradient Estimation

ZIVR approaches address stochastic or finite-sum composite optimization problems of the form

$$\min_{x \in \mathbb{R}^d} F(x) := \frac{1}{n} \sum_{i=1}^n f_i(x) + R(x)$$

with $f_i$ smooth (possibly nonconvex) but with no access to $\nabla f_i$, and $R$ convex and proximable. Key sub-classes include nonconvex finite-sum minimization, composite problems, and minimax or saddle-point settings. A standard ZO estimator is the two-point symmetric finite-difference:

$$\hat{\nabla}_u f(x; \beta) = \frac{f(x+\beta u) - f(x-\beta u)}{2\beta}\, u$$

with $u$ sampled either uniformly on the sphere or from a coordinate basis. This estimator is unbiased for the gradient of a mollified objective, but its variance scales poorly with $d$ and does not always vanish near solutions for composite constrained problems, motivating advanced variance reduction schemes (Zhang et al., 8 Jan 2026, Rando et al., 30 Jun 2025, Zhang et al., 2024, Hikima et al., 2024).
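
As a concrete illustration, here is a minimal NumPy sketch of this estimator (the function name is illustrative, and the $d$ scaling is the common normalization for sphere-sampled directions, which the display above omits):

```python
import numpy as np

def two_point_estimator(f, x, beta=1e-4, rng=None):
    """Two-point symmetric finite-difference gradient estimator.

    u is uniform on the unit sphere; the factor d makes the average of
    many estimates match the gradient (of the beta-smoothed objective).
    """
    rng = rng or np.random.default_rng()
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                   # uniform direction on the sphere
    return d * (f(x + beta * u) - f(x - beta * u)) / (2 * beta) * u

# Sanity check on a quadratic: averaging many estimates recovers the gradient,
# but any single estimate is very noisy (its variance grows with d).
rng = np.random.default_rng(0)
f = lambda x: 0.5 * np.dot(x, x)             # gradient is x itself
x = np.array([1.0, -2.0, 3.0])
est = np.mean([two_point_estimator(f, x, rng=rng) for _ in range(20000)], axis=0)
print(est)                                   # close to [1, -2, 3]
```

Note that only two function evaluations per estimate are needed, which is what makes the estimator attractive when gradients are unavailable.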

2. Variance Reduction Mechanisms in Zeroth-Order Optimization

Core to ZIVR is the incremental control variate, which builds surrogate gradients at a reference (snapshot) iterate and incrementally updates the estimator as the optimization progresses. In SVRG-style ZIVR, the estimator at step $k$ has the structure:

$$g_k = \hat{\nabla} f_{i_k}(x_k) - \hat{\nabla} f_{i_k}(x_s) + g(x_s)$$

where $x_s$ is a snapshot point and $g(x_s)$ is a full or batch-average ZO gradient at $x_s$. This cancels most estimator noise from random directions and sample indices. In loopless settings (Zhang et al., 2024), the reference point is updated probabilistically to amortize the cost of full surrogates, crucially reducing per-iteration query cost to $\mathcal{O}(1)$ in expectation.
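
The SVRG-style construction can be sketched as follows; the function names and hyperparameters are illustrative rather than taken from any cited paper. The key detail is that the same direction $u$ is used for both per-sample estimates, so the directional noise at $x_k$ and at the snapshot $x_s$ largely cancels:

```python
import numpy as np

def zo_grad(f, x, u, beta):
    """Two-point estimate of grad f at x along a unit direction u (d-scaled)."""
    return x.size * (f(x + beta * u) - f(x - beta * u)) / (2 * beta) * u

def zo_svrg(fs, x0, step=0.2, beta=1e-5, epochs=30, inner=6, snap_dirs=200, seed=0):
    """Illustrative SVRG-style ZIVR on a finite sum of black-box functions."""
    rng = np.random.default_rng(seed)
    n, d = len(fs), x0.size
    x = x0.astype(float).copy()
    for _ in range(epochs):
        xs = x.copy()
        # batch-average ZO gradient at the snapshot (many directions per sample)
        g_snap = np.zeros(d)
        for fi in fs:
            for _ in range(snap_dirs):
                u = rng.standard_normal(d); u /= np.linalg.norm(u)
                g_snap += zo_grad(fi, xs, u, beta)
        g_snap /= n * snap_dirs
        for _ in range(inner):
            i = rng.integers(n)
            u = rng.standard_normal(d); u /= np.linalg.norm(u)
            # control variate: same u for the estimate at x and at the snapshot
            g = zo_grad(fs[i], x, u, beta) - zo_grad(fs[i], xs, u, beta) + g_snap
            x -= step * g
    return x

# Toy finite sum: f_i(x) = 0.5 ||x - a_i||^2, whose minimizer is the mean of the a_i.
a = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 3.0])]
fs = [lambda x, ai=ai: 0.5 * np.sum((x - ai) ** 2) for ai in a]
x_star = zo_svrg(fs, np.zeros(2))
print(x_star)   # close to the mean [1.0, 1.333...]
```

A loopless variant would replace the outer epoch loop with a per-iteration coin flip that refreshes the snapshot with small probability.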

More sophisticated ZIVR instances, such as in (Zhang et al., 8 Jan 2026), maintain a Jacobian sketch or surrogate $J_k$ that is incrementally updated using sketches along mini-batches of index-direction pairs, and each iterate’s gradient uses both the surrogate and a correction with fresh two-point estimators:

$$g_k = \frac{1}{n} J_k \mathbf{1} + \frac{d}{R} \sum_{(i,u) \in \mathcal{R}_k} \left( \hat{\nabla}_u f_i(x_k; \beta_k) - u u^T J_k^{(i)} \right)$$

where $R$ is the batch size and $\mathcal{R}_k$ a sample of pairs. This architecture ensures the estimator approaches the true gradient as $J_k$ converges.
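
A hypothetical minimal reading of this estimator in NumPy (the array layout, helper name, and rank-one update rule are assumptions for illustration, not the paper's exact algorithm):

```python
import numpy as np

def zivr_sketch_step(fs, x, J, beta, R, rng):
    """One ZIVR gradient estimate with an incremental Jacobian sketch.

    J is an (n, d) array whose i-th row is a surrogate for grad f_i.
    Sample R (index, direction) pairs, correct the surrogate average with
    fresh two-point estimates, and sketch-update each touched row of J.
    """
    n, d = J.shape
    g = J.mean(axis=0)                       # the (1/n) J_k 1 term
    for _ in range(R):
        i = rng.integers(n)
        u = rng.standard_normal(d); u /= np.linalg.norm(u)
        # scalar finite difference: approximates u . grad f_i(x)
        fd = (fs[i](x + beta * u) - fs[i](x - beta * u)) / (2 * beta)
        g += (d / R) * (fd - u @ J[i]) * u   # correction with a fresh estimate
        J[i] += (fd - u @ J[i]) * u          # rank-one sketch update of row i
    return g
```

When the rows of $J$ equal the true per-sample gradients, the correction term vanishes and the estimate reduces to the exact average gradient, which is the sense in which variance vanishes as the sketch converges.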

3. Convergence Guarantees and Complexity

ZIVR frameworks systematically achieve accelerated or optimal convergence rates—matching those of first-order incremental variance reduction up to dimension-dependent multiplicative overhead. A summary of established complexities is as follows:

| Method | Problem Class | Oracle-Query Complexity |
| --- | --- | --- |
| ZO-ProxSGD | Convex/Nonconvex | $O(d\sigma^2/\epsilon^2)$ |
| ZO-SVRG-Coord (Ji et al., 2019) | Convex/Nonconvex | $O(d\,\min\{n^{2/3}/\epsilon,\ 1/\epsilon^{5/3}\})$ |
| VR-SZD/ZIVR (Rando et al., 30 Jun 2025) | Nonconvex composite | $O(d\, n^{2/3}/\epsilon)$ |
| ZIVR (Zhang et al., 8 Jan 2026) | Strongly convex | $O(d\,(n + L/\mu)\log(n/\epsilon))$ |
| ZO-L-Katyusha (Zhang et al., 2024) | Strongly convex composite | $O(d\sqrt{L/\mu}\,\log(1/\epsilon))$ |
| ZO-VRGDA (minimax) (Xu et al., 2020) | Minimax nonconvex/strongly-concave | $O((d_1+d_2)\,\kappa^3\,\epsilon^{-3})$ |
| ZIVR (decision-dependent) (Hikima et al., 2024) | Nonconvex, decision-dependent | $O(d^{9/2}\,\epsilon^{-6})$ |

Under Polyak-Łojasiewicz or strong convexity, linear convergence can be achieved. In the nonconvex regime, complexities of the form $O(n^{2/3} d/\epsilon)$ are attained. These results match or improve on the best prior ZO rates and approach the optimal first-order rates for the corresponding settings.

4. Algorithmic Variants and Structural Enhancements

A range of ZIVR variants have been developed, each exploiting problem structure or specific estimator mechanics:

  • Structured directions: Use of orthogonal or randomized basis directions in the finite-difference estimators to further reduce variance with lower per-iteration complexity (Rando et al., 30 Jun 2025).
  • Loopless variance reduction: Algorithms such as ZO-L-Katyusha implement probabilistic resets of anchor points (“loopless” control variate) to balance surrogate recomputation and stochastic estimations with optimal amortized query cost (Zhang et al., 2024).
  • Jacobian sketching: Incremental surrogate Jacobian estimation, as developed in the latest general ZIVR frameworks (Zhang et al., 8 Jan 2026), achieving per-iteration costs independent of nn or dd while maintaining global variance reduction.
  • Minibatch and coordinate sampling: Flexible batching schemes enable control of bias-variance tradeoffs and enhance parallelism (Gu et al., 2016, Rando et al., 30 Jun 2025, Zhang et al., 8 Jan 2026).
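
The structured-directions idea can be illustrated with orthogonal directions obtained from a QR factorization of a Gaussian matrix (a common construction; the function name is hypothetical). On a quadratic with $m = d$, the orthonormal directions span $\mathbb{R}^d$ and the estimate becomes exact, which is the variance-reduction effect in its purest form:

```python
import numpy as np

def structured_zo_grad(f, x, beta, m, rng):
    """ZO gradient estimate averaged over m ORTHOGONAL directions.

    Orthogonality removes the cross-direction covariance that i.i.d.
    sphere sampling incurs, lowering variance at equal query cost.
    """
    d = x.size
    Q, _ = np.linalg.qr(rng.standard_normal((d, m)))   # d x m, orthonormal cols
    g = np.zeros(d)
    for j in range(m):
        u = Q[:, j]
        g += (f(x + beta * u) - f(x - beta * u)) / (2 * beta) * u
    return (d / m) * g

rng = np.random.default_rng(0)
f = lambda x: 0.5 * np.dot(x, x)                       # gradient: x
x = np.array([1.0, -2.0, 3.0, 0.5])
g = structured_zo_grad(f, x, beta=1e-5, m=4, rng=rng)
print(g)   # with m == d the estimate is exact on a quadratic
```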

In settings where only one-point queries are admissible, ZIVR applies control variate corrections via recycled function values or incremental baselines (Hikima et al., 2024, Xiao et al., 2022). When two-point queries are allowed, estimator variance is further suppressed, and convergence is accelerated.
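
A control-variate-corrected one-point estimator can be sketched as follows (names illustrative). Subtracting any $u$-independent baseline $b$, such as a recycled function value, leaves the estimator unbiased because $\mathbb{E}[u] = 0$, but cuts the variance dramatically when $b \approx f(x)$:

```python
import numpy as np

def one_point_cv(f, x, baseline, beta=1e-2, rng=None):
    """One-point ZO estimator with a control-variate baseline."""
    rng = rng or np.random.default_rng()
    d = x.size
    u = rng.standard_normal(d); u /= np.linalg.norm(u)
    # subtracting baseline does not bias the estimate since E[u] = 0
    return (d / beta) * (f(x + beta * u) - baseline) * u

rng = np.random.default_rng(1)
f = lambda x: 0.5 * np.dot(x, x) + 10.0   # large offset inflates raw one-point variance
x = np.array([1.0, -1.0])
raw  = np.var([one_point_cv(f, x, 0.0,  rng=rng) for _ in range(5000)], axis=0)
ctrl = np.var([one_point_cv(f, x, f(x), rng=rng) for _ in range(5000)], axis=0)
print(ctrl.sum() < raw.sum())              # baseline variant has far lower variance
```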

5. Empirical Performance and Practical Considerations

Extensive numerical experiments confirm that ZIVR methods converge substantially faster (in terms of ZO oracle calls) compared with non-variance-reduced ZO baselines. On large-scale logistic regression, Cox regression, decision-dependent pricing, and universal adversarial attack tasks, ZIVR consistently outperforms classic ZO-SGD, basic two-point, and non-incremental VR schemes (Zhang et al., 8 Jan 2026, Hikima et al., 2024, Rando et al., 30 Jun 2025).

Key practices include tuning the batch size $R$ to balance per-iteration cost and estimator quality (in most settings, $R = n^{2/3} d$ or $R = \Theta(1)$ suffices), employing adaptive step-size schedules, and leveraging structured estimator directions for dimensionality reduction.

6. Extensions: Composite, Minimax, and Decision-Dependent Problems

Recent lines of research extend ZIVR methodology to more challenging and structured optimization landscapes:

  • Composite problems: Integration with proximal operators allows ZIVR to handle $\ell_1$, indicator, and general convex constraints, matching first-order complexity up to dimension factors (Zhang et al., 2024, Rando et al., 30 Jun 2025, Zhang et al., 8 Jan 2026).
  • Minimax settings: ZIVR techniques form the backbone of the first zeroth-order methods to achieve optimal $\epsilon$-complexity for nonconvex-strongly-concave minimax problems (Xu et al., 2020).
  • Decision-dependent stochastic objectives: ZIVR can leverage past-cached samples or control variates to remain competitive even when sampling distributions depend on decision variables, a setting where traditional variance-reduction mechanisms break down (Hikima et al., 2024).
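
For the composite case, the proximal step is explicit when $R$ is the $\ell_1$ norm; a minimal sketch (function names hypothetical):

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of lam * ||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def zo_prox_step(grad_est, x, step, lam):
    """One proximal ZO step: any gradient estimate on the smooth part,
    followed by an exact prox on the regularizer R = lam * ||.||_1."""
    return prox_l1(x - step * grad_est, step * lam)

x = np.array([0.5, -0.02, 1.2])
print(zo_prox_step(np.zeros(3), x, step=0.1, lam=0.5))  # small entries shrink to 0
```

Because the prox is computed exactly, all estimation error is confined to the smooth part, which is what lets ZIVR's vanishing-variance guarantees carry over to the composite setting.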

Empirical evidence recommends defaulting to two-point ZIVR when available; otherwise, control-variate-enhanced one-point ZIVR offers substantial gains over classical approaches (Hikima et al., 2024).

ZIVR establishes a unified, flexible, and scalable framework for variance-reduced ZO optimization, subsuming earlier methods such as SZVR-G (Liu et al., 2018), SVRG-Coord (Ji et al., 2019), AsyDSZOVR (Gu et al., 2016), and newer accelerated approaches. ZIVR matches the oracle complexity of first-order incremental variance reduction (SVRG, SAGA, SPIDER), with an expected $O(d)$ per-iteration factor that is unavoidable in truly gradient-free regimes. Its incremental design permits structures such as composite regularization, Jacobian sketching, and loopless control variates, which are not available to prior batch-based or full-difference ZO methods.

Key distinctions include:

  • Avoidance of large, expensive full-batch queries (as required by classical ZO-SVRG).
  • Attainment of variance that vanishes near optimality for composite/constrained settings—addressing the non-vanishing variance problem of standard 2-point ZO estimators (Zhang et al., 2024).
  • Generality to handle decision-dependent and minimax settings beyond the reach of vanilla ZO-SGD or finite-difference baselines.

ZIVR thus delivers a rigorous, scalable, and versatile foundation for derivative-free optimization in modern high-dimensional and composite environments, with demonstrable theoretical and empirical superiority across a spectrum of settings (Zhang et al., 8 Jan 2026, Rando et al., 30 Jun 2025, Hikima et al., 2024, Zhang et al., 2024).
