
Recursive Variance-Reduced ZO Methods

Updated 24 April 2026
  • The paper presents a novel formulation extending variance reduction frameworks like SVRG, SPIDER, and SAGA to noisy zeroth-order settings for improved convergence in high-dimensional, nonconvex optimization.
  • It details strategies such as SPIDER/SARAH-type recursion and incremental Jacobian updates that effectively suppress the intrinsic variance of gradient estimators.
  • Empirical and theoretical analyses demonstrate that these recursive methods achieve convergence rates comparable to first-order techniques while significantly reducing query complexity.

Recursive variance-reduced zeroth-order methods are a class of algorithms for stochastic optimization when only noisy function evaluations are available. They target the intrinsically high variance of zeroth-order (ZO) gradient estimators through recursive or incremental control-variate schemes. These methods extend variance-reduction frameworks such as SVRG, SPIDER, and SAGA to the ZO domain, using recursive update structures and control variates to suppress estimator noise without incurring the prohibitive cost of full-batch or high-query estimators. This article surveys core formulations, variance-reduction mechanisms, complexity bounds, and representative algorithmic instantiations from the theoretical and applied literature.

1. Fundamental Problem Setup and Smoothing

Recursive variance-reduced zeroth-order methods address the problem of minimizing a (possibly nonsmooth, nonconvex) stochastic objective where direct gradient access is unavailable:

$$\min_{x\in\mathcal{X}} f(x) = \mathbb{E}_\xi[\tilde{f}(x,\xi)],$$

with $\mathcal{X}$ a closed convex subset of $\mathbb{R}^n$ and $\tilde{f}$ only available via noisy queries. The classical zeroth-order approach is to replace $f$ with a smoothed version, e.g., via spherical or Gaussian smoothing:

$$f_\eta(x) = \mathbb{E}_u[f(x + \eta u)], \qquad u \sim \text{Uniform}(\mathbb{B}^n),$$

where $\mathbb{B}^n$ is the unit ball of $\mathbb{R}^n$. Averaging over the ball is what makes estimators that sample perturbations on the sphere of radius $\eta$ unbiased, so gradient information about the smooth function $f_\eta$ can be estimated without bias from function evaluations of $f$ at perturbed points. This smoothing is crucial both for theory (to guarantee stationarity in a generalized Clarke sense) and for keeping estimator variance under control when $f$ is only Lipschitz (Marrinan et al., 2023).
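To make the smoothing step concrete, the following is a minimal Monte Carlo sketch of $f_\eta$, assuming $u$ is drawn uniformly from the unit ball. The helper names (`sample_unit_ball`, `smoothed_value`, `l1_norm`) and the test function are illustrative choices, not taken from any of the cited papers.

```python
import numpy as np

def sample_unit_ball(n, rng):
    """Draw one point uniformly from the unit ball in R^n."""
    direction = rng.standard_normal(n)
    direction /= np.linalg.norm(direction)
    radius = rng.uniform() ** (1.0 / n)  # radius has density proportional to r^(n-1)
    return radius * direction

def smoothed_value(f, x, eta, num_samples, rng):
    """Monte Carlo estimate of f_eta(x) = E_u[f(x + eta * u)], u uniform on the ball."""
    x = np.asarray(x, dtype=float)
    draws = [f(x + eta * sample_unit_ball(x.size, rng)) for _ in range(num_samples)]
    return float(np.mean(draws))

def l1_norm(x):
    """Hypothetical nonsmooth test function with a kink at the origin."""
    return float(np.sum(np.abs(x)))

rng = np.random.default_rng(0)
x_kink = np.zeros(1)
estimate_at_kink = smoothed_value(l1_norm, x_kink, eta=0.5, num_samples=20000, rng=rng)
```

In one dimension the smoothed value at the kink is $\eta/2$, so `estimate_at_kink` should be close to 0.25 here. The gap to `l1_norm(x_kink)` stays below $L_0 \eta$ with Lipschitz constant $L_0 = 1$.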

2. Zeroth-Order Gradient Estimators and Their Variance

Key ZO estimators include the two-point symmetric estimator

$$g_\eta(x; v, \xi) = \frac{n}{2\eta}\left[\tilde{f}(x + v, \xi) - \tilde{f}(x - v, \xi)\right]\frac{v}{\|v\|},$$

with $v$ drawn uniformly from the sphere of radius $\eta$, and various single-directional estimators (random Gaussian, coordinate-finite-difference, etc.). The estimator $g_\eta$ is unbiased for $\nabla f_\eta(x)$,

$$\mathbb{E}_{v,\xi}\left[g_\eta(x; v, \xi)\right] = \nabla f_\eta(x),$$

with variance scaling as […]. Importantly, in standard ZO regimes this variance remains non-vanishing as […] and […] grow, motivating the development of recursive variance-reduction techniques that can efficiently drive estimator variance toward zero (Marrinan et al., 2023, Ji et al., 2019).
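The two-point estimator can be sketched as follows. This is an illustration under stated assumptions: the oracle is noise-free (no $\xi$), the perturbation $v$ is uniform on the sphere of radius $\eta$, and the quadratic test function and the name `two_point_estimate` are hypothetical. For a quadratic the smoothed gradient coincides with the true gradient, which makes the unbiasedness easy to check numerically.

```python
import numpy as np

def two_point_estimate(f, x, eta, rng):
    """Two-point symmetric estimate with v uniform on the sphere of radius eta."""
    n = x.size
    v = rng.standard_normal(n)
    v *= eta / np.linalg.norm(v)
    return (n / (2.0 * eta)) * (f(x + v) - f(x - v)) * v / np.linalg.norm(v)

# Hypothetical test problem: f(x) = 0.5 * x^T A x, whose gradient is A x.
A = np.diag([1.0, 2.0, 3.0, 4.0])

def quadratic(x):
    return 0.5 * x @ A @ x

rng = np.random.default_rng(0)
x0 = np.array([1.0, -1.0, 0.5, 2.0])
true_grad = A @ x0

samples = np.array([two_point_estimate(quadratic, x0, 0.1, rng) for _ in range(20000)])
mean_estimate = samples.mean(axis=0)  # approaches true_grad as samples accumulate
# Average squared error of a single estimate; it does not shrink with eta.
per_sample_sq_error = float(np.mean(np.sum((samples - true_grad) ** 2, axis=1)))
```

The sample mean lines up with the gradient, while the error of any single estimate stays large. For this quadratic it is on the order of $(n-1)\|\nabla f(x)\|^2$. That gap is what the recursive schemes are designed to close.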

3. Recursive Variance-Reduction Strategies

Recursive variance-reduction in ZO optimization leverages control variates and recursions over historical gradient estimates, typically in one of two paradigms:

  • SPIDER/SARAH-Type Recursion: At iteration $k$,

$$v_k = \hat{g}_{S_k}(x_k) - \hat{g}_{S_k}(x_{k-1}) + v_{k-1},$$

where $\hat{g}_{S_k}$ denotes a ZO gradient estimate averaged over the mini-batch $S_k$. Batch gradient snapshots are periodically taken, while inner iterations only incur low-variance updates via differences and thus avoid full-batch cost (Ji et al., 2019, Zhang et al., 2022).

  • Incremental Jacobian or Control-Variate Updating: For finite-sum or composite problems, maintain a Jacobian estimate […] and update only a fraction of its entries at each step,

[…]

then use

[…]

to achieve both variance reduction and computational scalability (Zhang et al., 8 Jan 2026).
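The incremental idea can be illustrated with a generic SAGA-style table of per-component gradient estimates. This is a sketch only and not the ZIVR update, whose exact form is not reproduced here. The function names, the central-difference oracle, the least-squares test problem, and the step size are all assumptions made for the example.

```python
import numpy as np

def coord_grad(f_i, x, mu):
    """Central-difference gradient of one component (2n function queries)."""
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = mu
        g[j] = (f_i(x + e) - f_i(x - e)) / (2.0 * mu)
    return g

def zo_saga(components, x0, step, mu, iters, rng):
    """SAGA-style loop: refresh one row of the stored table per iteration."""
    m = len(components)
    x = x0.copy()
    table = np.array([coord_grad(f_i, x, mu) for f_i in components])  # m x n
    table_mean = table.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(m)
        fresh = coord_grad(components[i], x, mu)
        g = fresh - table[i] + table_mean                  # control-variate estimate
        table_mean = table_mean + (fresh - table[i]) / m   # keep the mean in sync
        table[i] = fresh                                   # overwrite a single row
        x = x - step * g
    return x

# Hypothetical finite-sum problem: consistent least squares with unit-norm rows.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)
x_star = np.array([1.0, -2.0, 0.5])
b = A @ x_star
components = [lambda x, a=a, t=t: 0.5 * (a @ x - t) ** 2 for a, t in zip(A, b)]

x_saga = zo_saga(components, np.zeros(3), step=0.3, mu=1e-3, iters=3000, rng=rng)
```

Only one row of the table changes per iteration, so the per-step query cost is that of a single component while the estimate still uses information from all of them.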

The recursive structure ensures that estimator variance diminishes as historical information is incrementally reused or updated, ultimately matching the theoretical rates of first-order variance-reduced methods up to a dimension factor.
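A SPIDER/SARAH-type loop with a coordinate-wise finite-difference oracle might look like the sketch below. It follows the general pattern (periodic full snapshot, then recursive difference updates on a shared mini-batch) rather than any specific published algorithm. The names, batch size, epoch length, and test problem are hypothetical.

```python
import numpy as np

def batch_coord_grad(components, idx, x, mu):
    """Central-difference gradient averaged over the components listed in idx."""
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = mu
        g[j] = np.mean([(components[i](x + e) - components[i](x - e)) / (2.0 * mu)
                        for i in idx])
    return g

def zo_spider(components, x0, step, mu, epochs, inner, batch, rng):
    m = len(components)
    x = x0.copy()
    for _ in range(epochs):
        # Periodic snapshot: full-batch estimate resets the accumulated error.
        v = batch_coord_grad(components, range(m), x, mu)
        x_prev, x = x, x - step * v
        for _ in range(inner):
            idx = rng.integers(m, size=batch)
            # Same mini-batch at both points, so the difference has low variance.
            diff = (batch_coord_grad(components, idx, x, mu)
                    - batch_coord_grad(components, idx, x_prev, mu))
            v = v + diff
            x_prev, x = x, x - step * v
    return x

# Hypothetical finite-sum problem: consistent least squares with unit-norm rows.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)
x_star = np.array([1.0, -2.0, 0.5])
b = A @ x_star
components = [lambda x, a=a, t=t: 0.5 * (a @ x - t) ** 2 for a, t in zip(A, b)]

x_spider = zo_spider(components, np.zeros(3), step=0.3, mu=1e-3,
                     epochs=60, inner=20, batch=5, rng=rng)
```

The step size stays constant throughout. The estimator error is driven by the distance between consecutive iterates, so it shrinks as the iterates settle.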

4. Convergence Rates and Complexity Bounds

Recursive variance-reduced zeroth-order methods achieve complexity bounds that interpolate between standard zeroth-order SGD and first-order SVRG or SPIDER, with a critical dependence on batch schedule and recursion depth. Representative results:

| Algorithm Class | Stationarity Target | Oracle Complexity | Batch Cost Per Step | Reference |
|---|---|---|---|---|
| ZO-GD (two-point) | […] | […] | […] | (Ji et al., 2019) |
| ZO-SGD (random direction) | […] | […] | […] | (Ji et al., 2019) |
| ZO-SPIDER-Coord (recursive) | […] | […] | […] | (Ji et al., 2019, Zhang et al., 2022) |
| VRG-ZO (spherical, one-loop) | […] | […] | […] | (Marrinan et al., 2023) |
| ZIVR (incremental composite) | […] | […] (nonconvex), […] (convex) | […] | (Zhang et al., 8 Jan 2026) |

These rates are achieved because recursive variance reduction makes the estimator error contract fast enough to permit non-diminishing step sizes and low total sample complexity, even in high-dimensional or constrained domains (Marrinan et al., 2023, Zhang et al., 8 Jan 2026, Ji et al., 2019).
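The reason constant step sizes are compatible with recursion can be seen from the standard error bound for the first-order SPIDER/SARAH estimator. The statement below is a sketch under assumptions not spelled out in this article: exact component gradients that are each $L$-Lipschitz, an i.i.d. mini-batch of size $b$ per step, a step size $\gamma$ with $x_k - x_{k-1} = -\gamma v_{k-1}$, and an exact snapshot $v_0 = \nabla f(x_0)$. ZO versions carry additional smoothing and finite-difference terms that are omitted here.

```latex
\mathbb{E}\,\|v_k - \nabla f(x_k)\|^2
  \;\le\; \mathbb{E}\,\|v_{k-1} - \nabla f(x_{k-1})\|^2
  + \frac{L^2}{b}\,\mathbb{E}\,\|x_k - x_{k-1}\|^2
\quad\Longrightarrow\quad
\mathbb{E}\,\|v_k - \nabla f(x_k)\|^2
  \;\le\; \frac{\gamma^2 L^2}{b} \sum_{j=0}^{k-1} \mathbb{E}\,\|v_j\|^2 .
```

The accumulated error is controlled by the step lengths themselves, so it decays as the iterates approach stationarity rather than requiring $\gamma \to 0$.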

5. Algorithmic Instantiations and Extensions

Prominent algorithmic frameworks include:

  • VRG-ZO (spherical smoothing, projection-based) (Marrinan et al., 2023): Implements one-loop variance-reduced two-point smoothing with growing batch size […] and spherical perturbations, achieving […] projections and […] function calls for guaranteed […]-Clarke stationarity.
  • ZO-SPIDER-Coord (Ji et al., 2019, Zhang et al., 2022): Alternates full and mini-batch central-difference gradient estimators in a recursive control variate fashion, with provable […] complexity and no need for diminishing step sizes.
  • Incremental ZIVR (Zhang et al., 8 Jan 2026): Maintains an explicit Jacobian approximation for finite-sum composite problems, incrementally randomizing updates for scalability, and supporting proximal structures.
  • Networked/DZOVR (Chen et al., 2023): Combines two-point estimators with momentum and gradient tracking for distributed nonconvex optimization, maintaining […] per-node sample complexity regardless of network topology.

These algorithms may employ randomization over directions, direction sampling schedules, adaptive batch sizes, and low-memory blockwise Jacobian storage to achieve practical scalability in high-dimensional or distributed settings.
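As a rough illustration of the one-loop, projection-based pattern with a growing batch, the sketch below averages two-point estimates over a batch that increases with the iteration counter and projects onto a box. The schedule `batch = k + 1`, the step size, the box constraint, and the smooth noise-free test function are hypothetical choices and are not the settings of VRG-ZO.

```python
import numpy as np

def two_point_estimate(f, x, eta, rng):
    """Two-point symmetric estimate along a uniform direction on the unit sphere."""
    n = x.size
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    return (n / (2.0 * eta)) * (f(x + eta * u) - f(x - eta * u)) * u

def projected_zo_growing_batch(f, project, x0, step, eta, iters, rng):
    x = project(x0)
    for k in range(iters):
        batch = k + 1  # hypothetical growing-batch schedule
        g = np.mean([two_point_estimate(f, x, eta, rng) for _ in range(batch)], axis=0)
        x = project(x - step * g)  # projected step keeps x feasible
    return x

# Hypothetical problem: squared distance to a point outside the box [0, 1]^3.
c = np.array([1.5, -0.5, 0.3])

def objective(x):
    return 0.5 * float(np.sum((x - c) ** 2))

def project_box(x):
    return np.clip(x, 0.0, 1.0)

rng = np.random.default_rng(0)
x_box = projected_zo_growing_batch(objective, project_box, np.full(3, 0.5),
                                   step=0.2, eta=0.05, iters=200, rng=rng)
box_solution = project_box(c)  # minimizer of this separable objective over the box
```

Averaging over a growing batch shrinks the variance of the search direction over time, which is the one-loop alternative to the snapshot-plus-recursion structure.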

6. Connections, Variations, and Practical Considerations

Recursive variance-reduced ZO methods are closely related to their first-order progenitors (SVRG, SARAH, SPIDER, SAGA), but must contend with the unique high-variance characteristics of finite-difference estimators. Key connections and considerations include:

  • Smoothing Parameter ([…] or […]) Tuning: The smoothing radius directly controls bias-variance tradeoffs, with a smaller radius reducing the distance to Clarke stationarity but increasing […] and tightening step-size constraints (Marrinan et al., 2023).
  • Memory–Variance–Query Complexity Tradeoff: Algorithms such as ZIVR reduce variance with minimal memory overhead, compared to classical multi-loop methods that store full batch gradients (Zhang et al., 8 Jan 2026).
  • Distributed and Composite Optimization: Extensions to distributed (decentralized) and composite regularized settings (e.g., […] regularization, indicator constraints) have been achieved by carefully splitting estimator updates over blocks or network nodes (Chen et al., 2023, Zhang et al., 8 Jan 2026).
  • Negative Curvature Finding: In recent work, recursive ZO variance-reduced frameworks have been merged with negative curvature discovery heuristics to guarantee second-order stationarity efficiently (Zhang et al., 2022).
  • Adaptive Query Reuse: LAZO and related paradigms employ instance-adaptive reuse of queries, further reducing effective estimator variance and total query complexity (Xiao et al., 2022).
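The smoothing-radius tradeoff in the first item can be made explicit with two standard bounds. They are stated here as a sketch, assuming $f$ is $L_0$-Lipschitz and $f_\eta$ averages $f$ over the ball of radius $\eta$. The constant in the second bound is a simple one that follows from the sphere representation of $\nabla f_\eta$, and sharper constants exist.

```latex
|f_\eta(x) - f(x)| \;\le\; L_0\,\eta ,
\qquad
\|\nabla f_\eta(x) - \nabla f_\eta(y)\| \;\le\; \frac{n L_0}{\eta}\,\|x - y\| .
```

Shrinking $\eta$ tightens the approximation of $f$ but inflates the Lipschitz constant of $\nabla f_\eta$, which in turn forces step sizes on the order of $\eta / (n L_0)$.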

7. Practical Impact and Empirical Performance

Empirical studies report that recursive variance-reduced ZO frameworks outperform classical ZO-SGD and non-recursive methods in query complexity, convergence rate, and stability, across applications including robust regression, decentralized learning, bandit settings, composite regularization, and diffusion model fine-tuning (Marrinan et al., 2023, Chen et al., 2023, Zhang et al., 8 Jan 2026, Ren et al., 2 Feb 2025). For instance, VRG-ZO matches the best complexity of Gaussian-smoothed approaches for nonconvex and nonsmooth problems while delivering genuine […]-Clarke stationarity, and ZIVR demonstrates faster convergence and lower oracle count in regularized learning tasks.

In summary, recursive variance-reduced zeroth-order methods make zeroth-order stochastic optimization efficient and scalable for complex, nonsmooth, nonconvex, distributed, and constrained problems, achieving convergence rates comparable to those of first-order methods, up to a dimension factor, while relying solely on function-value information. Representative algorithms, their analysis, and practical instantiations support the effectiveness of recursive variance reduction in zeroth-order regimes (Marrinan et al., 2023, Ji et al., 2019, Chen et al., 2023, Zhang et al., 8 Jan 2026).
