
Randomized Stochastic Gradient-Free Method

Updated 7 May 2026
  • The method leverages randomized smoothing and finite-difference estimators to approximate gradients, enabling optimization without direct gradient access.
  • It converges to approximate (Goldstein) stationary points, and extensions such as recursive variance reduction and momentum improve the oracle complexity.
  • RSGF targets noisy, nonconvex objectives in high-dimensional settings and has been demonstrated on training small convolutional neural networks.

The randomized stochastic gradient-free (RSGF) method is a class of zeroth-order stochastic optimization algorithms for nonconvex (and possibly nonsmooth) objective functions, which operate in settings where only noisy function evaluations are available. RSGF leverages randomized smoothing and finite-difference gradient estimators to enable stochastic optimization without direct access to gradients. The method achieves established convergence rates to approximate Goldstein stationary points, with practical extensions that yield high-probability guarantees and improved complexity using recursive variance reduction and momentum techniques (Ghadimi et al., 2013, Luo et al., 2019, Lin et al., 2022, Chen et al., 2023).

1. Mathematical Model and Problem Setting

RSGF targets optimization of objectives of the form

$$\min_{x \in \mathbb{R}^d} f(x) := \mathbb{E}_{\xi}[F(x,\xi)]$$

where $F(\cdot,\xi)$ is a possibly nonconvex, stochastic black-box function, and access is limited to (possibly noisy) function evaluations. Depending on the setting:

  • $f$ may be smooth or nonsmooth but is typically Lipschitz or mean-squared Lipschitz.
  • The accessible oracle satisfies bounded variance: $\mathbb{E}_{\xi}[(F(x,\xi)-f(x))^2] \leq \sigma_0^2$.
  • Direct gradient, subgradient, or higher-order information is not available.

The central computational goal is to reach an $\epsilon$-stationary point, characterized by a small gradient norm for smooth $f$, or a $(\delta,\epsilon)$-Goldstein stationary point for nonsmooth functions (Ghadimi et al., 2013, Lin et al., 2022).

2. Randomized Smoothing and Gradient-Free Estimation

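Without gradient information, RSGF uses randomized smoothing to make $f$ more amenable to finite-difference estimation. The standard smoothing constructs a function $f_{\mu}$ (Gaussian) or $f_{\delta}$ (uniform) via convolution:

$$f_{\mu}(x) := \mathbb{E}_{u \sim \mathcal{N}(0, I_d)}[f(x + \mu u)], \qquad f_{\delta}(x) := \mathbb{E}_{u \sim \mathrm{Unif}(\mathbb{B}^d)}[f(x + \delta u)],$$

where $\mathbb{B}^d$ is the unit Euclidean ball in $\mathbb{R}^d$. This operation ensures that: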

  • $f_{\mu}$, $f_{\delta}$ become differentiable even if $f$ is merely Lipschitz.
  • $\nabla f_{\delta}(x) \in \partial_{\delta} f(x)$, i.e., in the Goldstein subdifferential of $f$ at $x$ (Lin et al., 2022, Chen et al., 2023).
  • Smoothing and two-point finite-difference estimation underpin all major RSGF variants.
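To make the smoothing properties concrete, the following sketch evaluates the uniform smoothing of the one-dimensional function $f(x) = |x|$, where the unit ball is the interval $[-1, 1]$. The example function, the grid-based averaging, and all function names are illustrative assumptions rather than anything taken from the cited papers.

```python
import numpy as np

def smoothed_abs(x, delta, n_grid=200_001):
    """Approximate f_delta(x) = E_{u ~ Unif[-1, 1]} |x + delta * u| on a uniform grid."""
    u = np.linspace(-1.0, 1.0, n_grid)
    return float(np.mean(np.abs(x + delta * u)))

def smoothed_abs_closed_form(x, delta):
    """Closed form of the same average: quadratic near the kink, |x| elsewhere."""
    if abs(x) <= delta:
        return (x * x + delta * delta) / (2.0 * delta)
    return abs(x)

def central_difference(fn, x, h=1e-2):
    """Numerical derivative used only to inspect the smoothed function."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

delta = 0.5
value_at_kink = smoothed_abs(0.0, delta)                      # f is not differentiable here
slope_inside = central_difference(lambda z: smoothed_abs(z, delta), 0.2)
slope_outside = central_difference(lambda z: smoothed_abs(z, delta), 1.0)
print(value_at_kink, smoothed_abs_closed_form(0.0, delta))
print(slope_inside, slope_outside)
```

Inside the interval $|x| \leq \delta$ the smoothed function is quadratic with slope $x/\delta$, which lies in $[-1, 1]$; outside it coincides with $|x|$. This is the one-dimensional picture behind the differentiability and inclusion properties listed above.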

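The RSGF gradient estimator is generally the two-point form

$$g(x; w, \xi) = \frac{d}{2\delta}\big(F(x + \delta w, \xi) - F(x - \delta w, \xi)\big)\, w, \qquad w \sim \mathrm{Unif}(\mathbb{S}^{d-1}),$$

or the Gaussian variant

$$G_{\mu}(x; u, \xi) = \frac{F(x + \mu u, \xi) - F(x, \xi)}{\mu}\, u, \qquad u \sim \mathcal{N}(0, I_d).$$

These estimators are unbiased for $\nabla f_{\delta}(x)$ (or $\nabla f_{\mu}(x)$), with variance growing as $O(d)$ in the dimension (Ghadimi et al., 2013, Chen et al., 2023).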
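A minimal sketch of the two estimators, assuming the standard two-point form with a direction drawn uniformly from the unit sphere and the forward-difference form with a Gaussian direction. The test function `noisy_quadratic`, the noise model, and all helper names are hypothetical choices made for illustration. For a quadratic objective both smoothed gradients equal the true gradient, so the sample averages should land near `x`.

```python
import numpy as np

def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    w = rng.standard_normal(d)
    return w / np.linalg.norm(w)

def two_point_uniform(F, x, delta, w, xi):
    """Two-point estimator: (d / (2 delta)) * (F(x + delta w) - F(x - delta w)) * w."""
    d = x.size
    return d / (2.0 * delta) * (F(x + delta * w, xi) - F(x - delta * w, xi)) * w

def forward_gaussian(F, x, mu, u, xi):
    """Gaussian forward-difference estimator: ((F(x + mu u) - F(x)) / mu) * u."""
    return (F(x + mu * u, xi) - F(x, xi)) / mu * u

def noisy_quadratic(x, xi):
    """Hypothetical stochastic oracle: f(x) = 0.5 ||x||^2 plus additive noise xi."""
    return 0.5 * float(x @ x) + xi

rng = np.random.default_rng(0)
x = np.ones(5)
n_samples = 20_000
avg_uniform = np.zeros(5)
avg_gauss = np.zeros(5)
for _ in range(n_samples):
    xi = rng.normal(scale=0.1)  # the same noise draw is reused within each estimator
    avg_uniform += two_point_uniform(noisy_quadratic, x, 0.1, sample_unit_sphere(5, rng), xi)
    avg_gauss += forward_gaussian(noisy_quadratic, x, 0.1, rng.standard_normal(5), xi)
avg_uniform /= n_samples
avg_gauss /= n_samples
print(avg_uniform, avg_gauss)
```

Evaluating both function values with the same sample `xi` is a deliberate choice here: with additive noise the perturbation cancels in the difference, which is one reason two-point schemes are preferred when the oracle allows it.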

3. Baseline RSGF Algorithms

All RSGF algorithms iterate as follows:

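  1. Sample a random direction $w_t \sim \mathrm{Unif}(\mathbb{S}^{d-1})$ (or $u_t \sim \mathcal{N}(0, I_d)$), and a data sample $\xi_t$.
  2. Compute the stochastic finite-difference estimator $g_t$.
  3. Update the iterate: $x_{t+1} = x_t - \eta\, g_t$, for some stepsize $\eta > 0$.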

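A widely analyzed version (uniform smoothing, two-point estimator) is (Lin et al., 2022, Chen et al., 2023):

$$x_{t+1} = x_t - \eta\, \frac{d}{2\delta}\big(F(x_t + \delta w_t, \xi_t) - F(x_t - \delta w_t, \xi_t)\big)\, w_t, \qquad t = 0, 1, \dots, T-1.$$

The output is $x_R$ for $R$ drawn uniformly at random from $\{0, 1, \dots, T-1\}$.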
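The loop above can be sketched in a few lines. The version below assumes the uniform-sphere two-point estimator, a constant step size, and a uniformly random output index; the objective (an $\ell_1$ norm with multiplicative noise), the parameter values, and the function names are illustrative assumptions, not settings from the cited papers.

```python
import numpy as np

def rsgf(F, x0, delta, eta, T, sample_xi, seed=0):
    """Basic zeroth-order loop: sample (w, xi), form the two-point estimate, step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    iterates = []
    for _ in range(T):
        iterates.append(x.copy())
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)                      # uniform direction on the sphere
        xi = sample_xi(rng)
        g = d / (2.0 * delta) * (F(x + delta * w, xi) - F(x - delta * w, xi)) * w
        x = x - eta * g
    R = int(rng.integers(0, T))                     # uniform over {0, ..., T-1}
    return iterates[R], R, x

def noisy_l1(x, xi):
    """Hypothetical nonsmooth oracle: (1 + xi) * ||x||_1."""
    return (1.0 + xi) * float(np.sum(np.abs(x)))

x0 = 3.0 * np.ones(10)
x_R, R, x_last = rsgf(noisy_l1, x0, delta=0.05, eta=0.01, T=2000,
                      sample_xi=lambda rng: rng.normal(scale=0.1))
f_start = float(np.sum(np.abs(x0)))
f_last = float(np.sum(np.abs(x_last)))
print(R, f_start, f_last)
```

Returning a randomly indexed iterate mirrors the analysis, which bounds the average stationarity measure over the trajectory; in practice the last iterate, also returned here, is often what gets used.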

In the setting where $f$ is smooth, the estimator is tailored so that the bias due to smoothing, which is of order $O(\mu L d^{3/2})$ for an $L$-smooth $f$, can be matched or dominated by the variance through a sufficiently small choice of $\mu$, delivering the optimal bias-variance tradeoff (Ghadimi et al., 2013).

4. Convergence Theory and Complexity

Expectation bounds:

Under Lipschitz assumptions, the smoothed RSGF algorithm guarantees

$$\mathbb{E}\big[\min\{\|g\| : g \in \partial_{\delta} f(x_R)\}\big] \leq \epsilon,$$

where $\partial_{\delta} f$ denotes the Goldstein $\delta$-subdifferential, with sample complexity (number of zeroth-order oracle calls)

$$O\big(d^{3/2}\, \delta^{-1}\, \epsilon^{-4}\big),$$

where $d$ is the dimension, $\delta$ is the smoothing parameter, and $\epsilon$ is the stationarity target (Lin et al., 2022, Chen et al., 2023).

For the smooth case, if $f$ is $L$-smooth and the stepsize and smoothing parameter are appropriately chosen,

$$\mathbb{E}\big[\|\nabla f(x_R)\|^2\big] \leq \epsilon$$

with required number of calls $O(d\, \epsilon^{-2})$ (Ghadimi et al., 2013).

High-probability guarantees (Two-phase RSGF):

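Using $S$ independent runs, and validating candidate solutions with a mini-batch estimator, the two-phase RSGF returns a point $\bar{x}$ that satisfies

$$\mathrm{Prob}\big(\min\{\|g\| : g \in \partial_{\delta} f(\bar{x})\} \geq \epsilon\big) \leq \Lambda$$

for a prescribed failure probability $\Lambda \in (0, 1)$, with total cost

$$O\left(\frac{d^{3/2}}{\delta\, \epsilon^{4}} \log\frac{1}{\Lambda} + \frac{d}{\Lambda\, \epsilon^{2}} \log\frac{1}{\Lambda}\right)$$

up to factors that depend on the Lipschitz constant and the initial optimality gap (Lin et al., 2022, Chen et al., 2023).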
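The two-phase idea (several independent runs followed by a validation step that keeps the candidate with the smallest averaged gradient estimate) can be sketched as follows. The number of runs `S`, the validation batch size `B`, the Gaussian noise model, and the test objective are hypothetical values chosen for illustration; the sketch does not reproduce the parameter choices required by the theory.

```python
import numpy as np

def unit_sphere(d, rng):
    w = rng.standard_normal(d)
    return w / np.linalg.norm(w)

def two_point(F, x, delta, w, xi):
    d = x.size
    return d / (2.0 * delta) * (F(x + delta * w, xi) - F(x - delta * w, xi)) * w

def single_run(F, x0, delta, eta, T, rng):
    """Optimization phase: one basic run that returns a uniformly sampled iterate."""
    x = x0.copy()
    iterates = []
    for _ in range(T):
        iterates.append(x.copy())
        g = two_point(F, x, delta, unit_sphere(x.size, rng), rng.normal(scale=0.1))
        x = x - eta * g
    return iterates[int(rng.integers(0, T))]

def validation_score(F, x, delta, B, rng):
    """Validation phase: norm of a mini-batch average of two-point estimates at x."""
    g = np.mean([two_point(F, x, delta, unit_sphere(x.size, rng), rng.normal(scale=0.1))
                 for _ in range(B)], axis=0)
    return float(np.linalg.norm(g))

def two_phase(F, x0, delta, eta, T, S, B, seed=0):
    rng = np.random.default_rng(seed)
    candidates = [single_run(F, x0, delta, eta, T, rng) for _ in range(S)]
    scores = [validation_score(F, c, delta, B, rng) for c in candidates]
    best = int(np.argmin(scores))                 # keep the most stationary-looking candidate
    return candidates[best], best, scores

def noisy_l1(x, xi):
    """Hypothetical nonsmooth oracle: (1 + xi) * ||x||_1."""
    return (1.0 + xi) * float(np.sum(np.abs(x)))

x_best, best, scores = two_phase(noisy_l1, 3.0 * np.ones(10), delta=0.05,
                                 eta=0.01, T=1000, S=4, B=200)
print(best, scores)
```

Because each run outputs a randomly indexed iterate, a single run can return a poor point with non-negligible probability; selecting among several candidates is what turns the in-expectation guarantee into a high-probability one.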

5. Advanced Extensions: Acceleration and Variance Reduction

Momentum/acceleration:

Accelerated RSGF algorithms incorporate momentum by combining the current finite-difference estimate with a weighted, normalized history of previous search directions. For strongly convex objectives, with the estimator bias and variance controlled at the same order, this approach yields accelerated convergence rates (Luo et al., 2019).
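As a generic illustration of how momentum combines with a zeroth-order estimator, the sketch below uses a plain exponential-moving-average (heavy-ball style) update on a strongly convex quadratic. This is an assumed, textbook-style scheme and is not the specific update of Luo et al. (2019); the momentum weight `beta`, the step size, and the test function are hypothetical.

```python
import numpy as np

def zo_heavy_ball(F, x0, delta, eta, beta, T, seed=0):
    """Zeroth-order descent with an exponential moving average of two-point estimates."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    m = np.zeros(d)                                  # momentum buffer
    for _ in range(T):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        xi = rng.normal(scale=0.1)
        g = d / (2.0 * delta) * (F(x + delta * w, xi) - F(x - delta * w, xi)) * w
        m = beta * m + (1.0 - beta) * g              # average the noisy estimates
        x = x - eta * m
    return x

def noisy_quadratic(x, xi):
    """Hypothetical strongly convex oracle: 0.5 ||x||^2 plus additive noise."""
    return 0.5 * float(x @ x) + xi

x0 = 2.0 * np.ones(5)
x_final = zo_heavy_ball(noisy_quadratic, x0, delta=0.1, eta=0.05, beta=0.9, T=500)
f_start = 0.5 * float(x0 @ x0)
f_final = 0.5 * float(x_final @ x_final)
print(f_start, f_final)
```

Averaging successive estimates damps the direction-sampling noise of individual two-point estimates, which is the intuition behind pairing momentum with gradient-free estimators.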

Recursive variance reduction (SPIDER/SARAH):

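Utilizing recursive gradient estimators $v_t$, the complexity with respect to $\epsilon$ can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$. Specifically, the GFM+ variant forms

$$v_t = v_{t-1} + g(x_t; \mathcal{B}_t) - g(x_{t-1}; \mathcal{B}_t),$$

where $g(\cdot\,; \mathcal{B}_t)$ averages the two-point estimator over a mini-batch $\mathcal{B}_t$ of direction and sample pairs shared by both evaluation points, and $v_t$ is reset to a fresh large-batch estimate at the start of each epoch, with suitable choice of epoch length $m$ and batch sizes $b$, $b'$ (Chen et al., 2023).

The total zeroth-order oracle complexity becomes $O(d^{3/2}\, \delta^{-1}\, \epsilon^{-3})$, which is still dimension-dependent but tighter in $\epsilon$ than vanilla RSGF.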
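A SPIDER/SARAH-style recursion for zeroth-order estimates can be sketched as below: a large batch refreshes the estimate at the start of each epoch, and in between the estimate is corrected with the difference of two small-batch estimates that share the same random directions and samples. The epoch length `m`, the batch sizes, the noise model, and the test objective are hypothetical values, and the sketch is not claimed to match the exact GFM+ parameterization.

```python
import numpy as np

def two_point(F, x, delta, w, xi):
    d = x.size
    return d / (2.0 * delta) * (F(x + delta * w, xi) - F(x - delta * w, xi)) * w

def draw_batch(size, d, rng):
    """Sample `size` sphere directions and noise draws."""
    W = rng.standard_normal((size, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    Xi = rng.normal(scale=0.1, size=size)
    return W, Xi

def batch_estimate(F, x, delta, W, Xi):
    """Average of two-point estimates over a fixed batch of (direction, sample) pairs."""
    return np.mean([two_point(F, x, delta, w, xi) for w, xi in zip(W, Xi)], axis=0)

def recursive_vr_zo(F, x0, delta, eta, T, m, b_big, b_small, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    v = np.zeros_like(x)
    for t in range(T):
        if t % m == 0:
            W, Xi = draw_batch(b_big, x.size, rng)       # epoch start: fresh estimate
            v = batch_estimate(F, x, delta, W, Xi)
        else:
            W, Xi = draw_batch(b_small, x.size, rng)     # same batch at both points
            v = v + batch_estimate(F, x, delta, W, Xi) - batch_estimate(F, x_prev, delta, W, Xi)
        x_prev = x
        x = x - eta * v
    return x

def noisy_l1(x, xi):
    """Hypothetical nonsmooth oracle: (1 + xi) * ||x||_1."""
    return (1.0 + xi) * float(np.sum(np.abs(x)))

x0 = 3.0 * np.ones(10)
x_final = recursive_vr_zo(noisy_l1, x0, delta=0.1, eta=0.005, T=1200,
                          m=10, b_big=64, b_small=8)
f_start = float(np.sum(np.abs(x0)))
f_final = float(np.sum(np.abs(x_final)))
print(f_start, f_final)
```

Reusing the same directions and samples at `x` and `x_prev` is essential: the correction term then has small variance whenever consecutive iterates are close, which is what the improved dependence on $\epsilon$ relies on.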

6. Stationarity Concepts and Smoothing-Subdifferential Mapping

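For nonsmooth objectives, stationarity is formalized using the Goldstein $\delta$-subdifferential:

$$\partial_{\delta} f(x) := \mathrm{conv}\Big(\bigcup_{y \in \mathbb{B}_{\delta}(x)} \partial f(y)\Big),$$

where $\partial f$ is the Clarke subdifferential and $\mathbb{B}_{\delta}(x)$ is the closed ball of radius $\delta$ centered at $x$. $(\delta,\epsilon)$-Goldstein stationarity is achieved when

$$\min\{\|g\| : g \in \partial_{\delta} f(x)\} \leq \epsilon.$$

Uniform smoothing ensures $\nabla f_{\delta}(x) \in \partial_{\delta} f(x)$, and RSGF constructs its stationarity guarantees using this mapping. This inclusion is central to both theoretical convergence and complexity analysis (Lin et al., 2022, Chen et al., 2023).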
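A short worked example may help fix the definitions. It takes the one-dimensional function $f(x) = |x|$, chosen here purely for illustration, and writes out both the Goldstein $\delta$-subdifferential and the derivative of the uniformly smoothed function.

```latex
% Worked example (illustrative): f(x) = |x| on the real line, d = 1.
\[
\partial_\delta f(x) =
\begin{cases}
[-1, 1], & |x| \le \delta,\\
\{\operatorname{sign}(x)\}, & |x| > \delta,
\end{cases}
\qquad
f_\delta'(x) =
\begin{cases}
x/\delta, & |x| \le \delta,\\
\operatorname{sign}(x), & |x| > \delta.
\end{cases}
\]
% Hence f_\delta'(x) \in \partial_\delta f(x) for every x, and every x with
% |x| \le \delta is (\delta, 0)-Goldstein stationary because 0 \in [-1, 1].
```

In this example the smoothed derivative always lies in the Goldstein $\delta$-subdifferential, and a small smoothed gradient certifies approximate Goldstein stationarity of the original nonsmooth function.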

7. Practical Implementation and Applications

Parameter selection:

Step-size choices, which differ between the nonsmooth and smooth settings, are critical for balancing estimation bias and variance. The smoothing parameter ($\delta$ or $\mu$) is usually tied to the target accuracy $\epsilon$ (Ghadimi et al., 2013, Lin et al., 2022, Chen et al., 2023).

Two-phase validation:

Employing multiple independent runs and selecting via post hoc validation using mini-batch gradient estimators yields strong large-deviation bounds.

Applications:

Two-phase RSGF (2-SGFM) has been used to train small-scale convolutional neural networks on MNIST, reaching accuracy competitive with classical gradient-based methods on this task even when only function-value queries are available. Batch size and validation sample size affect empirical stability in a manner consistent with the theory (Lin et al., 2022).

Complexity comparison and guidelines:

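Algorithm | Complexity (oracle calls) | Key features
RSGF (basic) | $O(d^{3/2}\delta^{-1}\epsilon^{-4})$ | Two-point estimator, smoothing
RSGF (two-phase) | $O\big(d^{3/2}\delta^{-1}\epsilon^{-4}\log(1/\Lambda) + d\,\Lambda^{-1}\epsilon^{-2}\log(1/\Lambda)\big)$ | High-probability guarantee
Accelerated RSGF (GFM+) | $O(d^{3/2}\delta^{-1}\epsilon^{-3})$ | Recursive variance reduction
Classic gradient-based | $O(\delta^{-1}\epsilon^{-4})$ (for reference) | First-order only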
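To get a feel for the gap between the basic and variance-reduced rates, the following sketch plugs numbers into the leading terms $d^{3/2}\delta^{-1}\epsilon^{-4}$ and $d^{3/2}\delta^{-1}\epsilon^{-3}$ reported by Lin et al. (2022) and Chen et al. (2023). Constants and problem-dependent factors hidden by the $O(\cdot)$ notation are ignored, so the outputs are order-of-magnitude indications only; the function name and the sample values of $d$, $\delta$, $\epsilon$ are hypothetical.

```python
def oracle_calls(d, delta, eps):
    """Leading-order oracle counts, ignoring constants hidden by the O(.) notation."""
    basic = d ** 1.5 / (delta * eps ** 4)             # basic RSGF
    variance_reduced = d ** 1.5 / (delta * eps ** 3)  # recursive variance reduction
    return basic, variance_reduced

basic, variance_reduced = oracle_calls(d=100, delta=0.1, eps=0.1)
ratio = basic / variance_reduced                      # equals 1 / eps at leading order
print(basic, variance_reduced, ratio)
```

At leading order the saving from variance reduction is a factor of $1/\epsilon$, independent of $d$ and $\delta$, so the benefit grows as the stationarity target tightens.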

Summary:

The RSGF framework and its accelerations provide general, robust zeroth-order algorithms for high-dimensional, nonsmooth, nonconvex stochastic optimization, with theoretically grounded oracle complexity and demonstrated practical feasibility (Ghadimi et al., 2013, Luo et al., 2019, Lin et al., 2022, Chen et al., 2023).
