
Two-Point Random Gradient Estimator

Updated 8 February 2026
  • The two-point random gradient estimator is a zeroth-order method that approximates gradients using two noisy function evaluations at random perturbations.
  • It smooths non-smooth functions and reduces bias, proving crucial in black-box optimization, feedback control, and high-dimensional online learning.
  • Its performance hinges on the choice of randomization scheme and geometry, which influence convergence rates, sample complexity, and robustness to noise.

A two-point random gradient estimator is a zeroth-order optimization tool that constructs an unbiased or asymptotically unbiased estimate of a function's gradient using only two noisy function evaluations at random perturbations of the current point. It is foundational in model-free optimization, black-box feedback control, and high-dimensional online learning, where gradient information is unavailable or infeasible to compute. The estimator's properties—variance, bias, minimality, adaptivity—are governed by the choice of randomization scheme and the geometry of the problem, with substantial impacts on convergence rates, sample complexity, and robustness to noise.

1. Mathematical Formulations and Fundamental Properties

Given a smooth function $f:\mathbb{R}^d\to\mathbb{R}$ and a random direction $u$ drawn from a prescribed distribution, the canonical two-point estimator is given by

$$g_{\delta}(x;u) = \frac{f(x+\delta u)-f(x-\delta u)}{2\delta}\, u, \quad \delta>0$$

(Ma et al., 22 Oct 2025). For fixed-perturbation estimators (e.g., forward difference), a closely related form is

$$\tilde{g}_\delta(x; v) = \frac{v}{\delta}\left[f(x+\delta v) - f(x)\right]$$

(Mehrnoosh et al., 15 Sep 2025). The estimator is asymptotically unbiased as $\delta\to 0$, and under mild regularity on $f$ ($C^2$ smoothness),

$$\mathbb{E}_u[g_\delta(x; u)] = \nabla f(x) + O(\delta^2)$$

(Ma et al., 22 Oct 2025). The key requirement is the so-called $I$-unbiasedness condition $\mathbb{E}[u u^\top]=I_d$, which ensures correct mean scaling.

The estimator's variance, critical for optimization efficiency, is strongly dictated by the distribution of $u$. For small $\delta$ and $a=\nabla f(x)$,

$$\mathbb{E}_u \left\| (u^\top a)\, u - a \right\|^2 = a^\top \left( \mathbb{E}\!\left[ (u u^\top)^2 \right] - I_d \right) a$$

Thus, finding randomizations that minimize $\mathbb{E}[(u u^\top)^2]$ under the unbiasedness constraint is essential (Ma et al., 22 Oct 2025). This analysis underpins recent developments in minimum-variance and geometry-adapted randomization schemes.
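As a concrete illustration (a sketch, not code from the cited papers), the canonical estimator and the $I$-unbiasedness property can be checked in a few lines of NumPy; the diagonal quadratic test function below is an assumption of the demo:

```python
import numpy as np

def sphere_direction(d, rng):
    """Sample u uniformly from the sphere of radius sqrt(d), so that E[u u^T] = I_d."""
    v = rng.standard_normal(d)
    return np.sqrt(d) * v / np.linalg.norm(v)

def two_point_estimate(f, x, delta, u):
    """Canonical two-point estimator g_delta(x; u)."""
    return (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

rng = np.random.default_rng(0)
d = 20
A = np.diag(np.arange(1.0, d + 1))            # hypothetical quadratic test problem
f = lambda x: 0.5 * x @ A @ x                 # gradient is A x
x = np.ones(d)

# Averaging many estimates recovers the true gradient A x up to O(delta^2) bias
# (exactly, for a quadratic) plus Monte Carlo error.
g_hat = np.mean([two_point_estimate(f, x, 1e-3, sphere_direction(d, rng))
                 for _ in range(20000)], axis=0)
print(np.linalg.norm(g_hat - A @ x) / np.linalg.norm(A @ x))  # small relative error
```

The sphere of radius $\sqrt{d}$ (rather than the unit sphere) is what makes $\mathbb{E}[uu^\top]=I_d$ hold without an extra dimension factor in the estimator.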

2. Bias, Variance, and Smoothing Effects

Two-point estimators smooth a possibly nonsmooth function by convolution with the perturbation distribution's measure, yielding a differentiable surrogate $f_\delta(x) := \mathbb{E}_{u}[f(x + \delta u)]$ with $\nabla f_\delta(x) = \mathbb{E}_u[g_\delta(x; u)]$. The bias between $f_\delta$ and $f$ is bounded and scales as $O(\delta^2)$ (under $L$-smoothness or $L$-Lipschitz assumptions) (Mehrnoosh et al., 15 Sep 2025; Akhavan et al., 2022). For specific randomizations the smoothing error is further characterized: $|f_h(x) - f(x)| \leq L h\, b_q(d)$, where $b_q(d)$ depends on the geometry (e.g., $\ell_1$ or $\ell_2$) and the dimension (Akhavan et al., 2022).

Variance bounds depend crucially on the distribution of $u$. For Gaussian or uniform-sphere $u$, the variance of the estimator is typically $O(d\,\|\nabla f_\delta\|^2)$, while for the one-point estimator it is $O(d^2)$ (Mehrnoosh et al., 15 Sep 2025). For $\ell_1$-randomized estimators, dimension-dependent constants and a weighted Poincaré inequality provide precise variance scaling (Akhavan et al., 2022).
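The variance gap between the one- and two-point forms can be observed numerically; the affine test function and the single-evaluation one-point form $f(x+\delta u)\,u/\delta$ used below are illustrative assumptions, not the exact setups of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta = 10, 1e-2
a = rng.standard_normal(d)                   # plays the role of grad f(x)
f = lambda x: a @ x + 1.0                    # affine test function (assumption)
x = np.zeros(d)

def u_sphere():
    v = rng.standard_normal(d)
    return np.sqrt(d) * v / np.linalg.norm(v)

def two_point():
    u = u_sphere()
    return (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

def one_point():
    u = u_sphere()
    return f(x + delta * u) / delta * u      # single-evaluation estimator

two = np.array([two_point() for _ in range(5000)])
one = np.array([one_point() for _ in range(5000)])
# Total variance (trace of the covariance): the symmetric difference cancels the
# O(1/delta^2) term that the raw function value contributes to the one-point form.
print(two.var(axis=0).sum(), one.var(axis=0).sum())
```

Both estimators are unbiased for the affine $f$; only their variances differ, which is exactly the effect the bounds above quantify.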

3. Optimal Randomization: Minimum-Variance and Directional Schemes

Minimizing estimator variance under unbiasedness leads to a constrained optimization over the space of distributions for $u$:

$$\min_{V:\ \mathbb{E}_{u\sim V}[uu^\top]=I_d}\ a^\top\, \mathbb{E}_{u\sim V}\!\left[(uu^\top)^2\right] a$$

(Ma et al., 22 Oct 2025). The optimal solutions split into two analytic families:

  • Fixed-length randomization: $u$ is supported on $\{u : \|u\|^2 = d\}$ with $\mathbb{E}[uu^\top]=I_d$ (e.g., uniform sphere, Rademacher, random basis).
  • Directionally Aligned Perturbations (DAP): $u$ satisfies $(u^\top a)^2 = \|a\|^2$, i.e., it is exactly aligned or anti-aligned with the gradient, with $u$ further distributed so that $\mathbb{E}[uu^\top]=I_d$.

DAPs can be implemented in practice by projecting random samples onto hyperplanes aligned with an estimate of $\nabla f$ (Ma et al., 22 Oct 2025). In settings where the underlying geometry is non-Euclidean, randomization on the $\ell_1$-sphere (as in $g(x;\zeta)$ for $\zeta\sim\mathrm{Unif}(\mathbb{S}_1^d)$) becomes theoretically and empirically advantageous (Akhavan et al., 2022).

| Method | Randomization | Key scaling |
|---|---|---|
| Uniform sphere | $\lVert u\rVert^2 = d$ | $O(d)$ variance |
| Gaussian | $u \sim N(0, I_d)$ | $O(d^2)$ variance |
| $\ell_1$-sphere | $\lVert \zeta\rVert_1 = 1$ | $O(d\log d)$ regret |
| DAP | $(u^\top a)^2 = \lVert a\rVert^2$ | optimal variance, $O(d)$ |
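Samplers for the schemes in the table can be sketched as follows. The DAP construction here is a simplified reading of the alignment condition (unit component along an assumed gradient direction, Gaussian orthogonal part) and not necessarily the exact scheme of Ma et al.:

```python
import numpy as np

rng = np.random.default_rng(2)

def rademacher(d):
    """Fixed-length: each coordinate ±1, so ||u||^2 = d and E[u u^T] = I_d."""
    return rng.choice([-1.0, 1.0], size=d)

def uniform_sphere(d):
    """Fixed-length: uniform on the radius-sqrt(d) sphere."""
    v = rng.standard_normal(d)
    return np.sqrt(d) * v / np.linalg.norm(v)

def l1_sphere(d):
    """Uniform-style point on the l1-sphere ||zeta||_1 = 1 (random signs times a
    Dirichlet point); the l1 scheme pairs this with its own estimator form, so it
    is not I-unbiased as-is."""
    return rng.choice([-1.0, 1.0], size=d) * rng.dirichlet(np.ones(d))

def dap(d, a_hat):
    """Sketch of a directionally aligned perturbation: component along the assumed
    gradient direction a_hat forced to ±1, orthogonal part Gaussian, giving
    E[u u^T] = I_d while (u^T a)^2 = ||a||^2 for any a parallel to a_hat."""
    a_hat = a_hat / np.linalg.norm(a_hat)
    g = rng.standard_normal(d)
    g_perp = g - (g @ a_hat) * a_hat
    return rng.choice([-1.0, 1.0]) * a_hat + g_perp

# Monte Carlo check that the DAP sketch satisfies E[u u^T] ≈ I_d:
U = np.array([dap(5, np.array([3.0, 0.0, 0.0, 0.0, 1.0])) for _ in range(40000)])
print(np.round(U.T @ U / len(U), 2))
```

The orthogonal Gaussian part contributes $I_d - \hat{a}\hat{a}^\top$ and the random sign along $\hat{a}$ contributes $\hat{a}\hat{a}^\top$, so the second moments add up to the identity as required.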

4. Embedding in Optimization Algorithms

Two-point random gradient estimators are embedded in various algorithmic frameworks for black-box optimization, online learning, and feedback control.

  1. Feedback Optimization for Plants: The estimator $g_k^\delta$, formed from two consecutive real-time plant evaluations under random perturbations, drives a gradient-free feedback update law for steady-state input selection. Convergence to $\epsilon$-stationary points for smooth, nonconvex costs is provable at rate $O(\epsilon^{-1})$, outperforming one-point methods (Mehrnoosh et al., 15 Sep 2025).
  2. Online Dual Averaging: In online convex optimization settings, the estimator drives mirror-descent or dual-averaging iterates, with step sizes and smoothing radius possibly chosen adaptively. For $\ell_1$-sphere randomization, regret bounds match or improve prior work: $O(L\sqrt{dT})$ for Euclidean balls and $O(L\sqrt{dT\log d})$ for the simplex (Akhavan et al., 2022).
  3. Zeroth-Order SGD: Fixed-length or DAP randomizations generate perturbation directions for each iterate, producing unbiased stochastic gradients. For $L$-smooth functions with bounded fourth perturbation moment, step sizes $\eta\sim T^{-1/2}$ achieve $O(\sqrt{1/T})$ convergence in the mean-squared gradient norm, with optimal $O(d/\epsilon^{2})$ sample complexity (Ma et al., 22 Oct 2025).
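A minimal zeroth-order SGD loop built on the two-point estimator can be sketched as follows; the step size, smoothing radius, and quadratic test function are illustrative choices, not the tuned constants of the cited analysis:

```python
import numpy as np

def zo_sgd(f, x0, T, eta, delta, rng):
    """Zeroth-order SGD: two function evaluations per iteration, no gradient oracle."""
    d, x = len(x0), x0.copy()
    for _ in range(T):
        v = rng.standard_normal(d)
        u = np.sqrt(d) * v / np.linalg.norm(v)          # fixed-length randomization
        g = (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
        x -= eta * g
    return x

rng = np.random.default_rng(3)
d = 10
f = lambda x: np.sum((x - 1.0) ** 2)                    # quadratic test function (assumption)
x = zo_sgd(f, np.zeros(d), T=5000, eta=0.01 / np.sqrt(d), delta=1e-3, rng=rng)
print(f(x))   # approaches the minimum f(1, ..., 1) = 0
```

Each iteration costs exactly two function evaluations, which is where the $O(d/\epsilon^2)$ evaluation complexity comes from once the iteration count is fixed.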

5. Convergence Analysis and Parameter Selection

Theoretical convergence rates are determined by balancing the step size $\eta$, the smoothing parameter $\delta$, and the noise/error levels. For the feedback optimization setting (Mehrnoosh et al., 15 Sep 2025), the key parameters satisfy:

  • $\eta < 1/[8L(p+4)]$
  • $\delta^2 \leq 2\epsilon_\Phi/(Lp)$
  • optimal $\delta^2 \propto \sqrt{\mu}$ for plant error $\mu$

Under these settings, after $T$ iterations,

$$\frac{1}{T} \sum_{k=0}^{T-1} \mathbb{E}\left\|\nabla \tilde{\Phi}(u_k)\right\|^2 \leq \epsilon$$

with overall complexity $O(\epsilon^{-1})$. In online convex optimization, regret bounds also reflect dimension and geometry, with parameter-free variants achieving optimal rates adaptively (Akhavan et al., 2022). For stochastic zeroth-order SGD, the convergence rate in the nonconvex case is

$$\min_{1\leq t\leq T} \mathbb{E}\|\nabla f(x_t)\|^2 \leq \frac{f(x_1) - f^*}{\eta T} + O(\eta L^2 d)$$

with $T=O(d/\epsilon^2)$ iterations for precision $\epsilon$ (Ma et al., 22 Oct 2025).
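One standard way to see the sample-complexity claim is to balance the two terms of the bound in $\eta$:

$$\eta^\star=\sqrt{\frac{f(x_1)-f^*}{L^2 d\,T}} \quad\Longrightarrow\quad \min_{1\le t\le T}\mathbb{E}\|\nabla f(x_t)\|^2 = O\!\left(L\sqrt{\frac{\left(f(x_1)-f^*\right) d}{T}}\right),$$

so driving the right-hand side below $\epsilon$ requires $T = O(d/\epsilon^2)$ iterations, i.e., $O(d/\epsilon^2)$ pairs of function evaluations.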

6. Practical Implementations: Randomization Schemes and Robustness

Implementation details are shaped by problem structure:

  • Gaussian and uniform-sphere samplers suit settings with isotropic geometry.
  • $\ell_1$-sphere randomization is advantageous for simplex-structured or sparse problems (Akhavan et al., 2022).
  • DAPs, which require an ongoing estimate of the gradient for alignment, reduce estimator variance in "effective" directions, empirically yielding significantly smaller MSE and faster convergence in high-variance coordinates or "needle-in-a-haystack" settings (Ma et al., 22 Oct 2025).

Adaptive step-size and smoothing-radius schedules, often of the form $\eta_t \propto 1/\sqrt{\sum_{k=1}^{t-1} \|g_k\|^2}$ and $h_t \propto 1/\sqrt{t}$, enable parameter-free operation, ensuring convergence without prior knowledge of Lipschitz or noise parameters (Akhavan et al., 2022).
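These schedules can be sketched as follows (an illustrative AdaGrad-style recipe under assumed constants, not the exact parameter-free method of the cited paper); the nonsmooth test objective is also an assumption:

```python
import numpy as np

def adaptive_zo(f, x0, T, rng, eta0=1.0, h0=0.1):
    """Adaptive schedules: eta_t ~ 1/sqrt(sum of past ||g_k||^2), h_t ~ 1/sqrt(t)."""
    d, x = len(x0), x0.copy()
    sq_sum = 1e-12                                   # avoids division by zero at t = 1
    for t in range(1, T + 1):
        h = h0 / np.sqrt(t)                          # shrinking smoothing radius
        v = rng.standard_normal(d)
        u = np.sqrt(d) * v / np.linalg.norm(v)
        g = (f(x + h * u) - f(x - h * u)) / (2 * h) * u
        eta = eta0 / np.sqrt(sq_sum + g @ g)         # step from accumulated norms
        x -= eta * g
        sq_sum += g @ g
    return x

rng = np.random.default_rng(4)
f = lambda x: np.sum(np.abs(x - 0.5))                # nonsmooth test objective
x = adaptive_zo(f, np.zeros(8), T=4000, rng=rng)
print(f(x))   # decreases without tuning Lipschitz or noise constants
```

No Lipschitz constant or noise level enters the schedule; the accumulated estimator norms stand in for that knowledge, which is the sense in which such methods are parameter-free.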

Simulation studies (e.g., a 10-state, 5-input nonlinear plant with quadratic costs) confirm the theoretical predictions: two-point estimators achieve $O(\epsilon^{-1})$ rates, matching model-based or ideal estimators and outperforming one-point schemes, with performance robust (within limits) to the choice of $\delta$ and $\eta$ (Mehrnoosh et al., 15 Sep 2025).

7. Extensions and Connections to Prior Work

Two-point estimators form the basis for a large class of black-box and feedback optimization algorithms, generalizing and often improving upon one-point (finite difference) or randomly perturbed function evaluation schemes. Notably,

  • Duchi et al. and Shamir achieved optimal rates in the $\ell_2/\ell_\infty$ cases using sphere/axis randomization; recent $\ell_1$-based schemes close logarithmic gaps in the dimensional dependence (Akhavan et al., 2022).
  • The concept of minimum-variance randomizations unifies prior work, with DAPs and fixed-length randomizations yielding the best possible rates for general smooth objectives and uniform-sphere or simplex geometries (Ma et al., 22 Oct 2025).

This family of estimators continues to evolve, with new randomization schemes and variance reduction strategies targeting increasingly high-dimensional and high-noise applications. Directionally aligned perturbations and geometry-matched randomizations delineate current directions for minimizing sample complexity and improving real-time or adversarial robustness.
