
Zeroth-Order Projected Stochastic Subgradient Method

Updated 18 August 2025
  • The paper introduces a zeroth-order method that uses Gaussian smoothing to approximate Clarke subgradients in constrained, nonsmooth, nonconvex optimization settings.
  • It employs a two-timescale iterative scheme where fast gradient tracking and slow projected updates ensure feasibility over compact convex sets.
  • The approach guarantees almost sure convergence to a neighborhood of Clarke stationary points with explicit bias control, advancing classical stochastic methods.

A zeroth-order projected stochastic subgradient method is an algorithmic framework for solving constrained stochastic optimization problems when gradients or subgradients of the objective function are unavailable or inaccessible, and only noisy function evaluations can be queried. These methods approximate generalized (in particular, Clarke) subgradients by using randomized smoothing, and combine stochastic gradient tracking with projection steps to handle convex constraints. This framework is motivated by optimizing Lipschitz continuous, nonsmooth, nonconvex objectives over compact convex sets, a setting for which classical gradient-based techniques are infeasible or insufficiently robust.

1. Smoothing-Based Zeroth-Order Subgradient Approximation

The main technical challenge addressed is the lack of a Taylor-like expansion or analytical handle on the Clarke subdifferential for nonsmooth functions, which impedes both subgradient approximation and theoretical analysis. To overcome this, the method utilizes Gaussian smoothing: for a given λ > 0, a smoothed version of the objective is defined as

f_\lambda(x) = \mathbb{E}_{u \sim \mathcal{N}(0, I)} [f(x + \lambda u)],

which is differentiable even if f is nondifferentiable. The gradient of the smoothed function can be written as

\nabla f_\lambda(x) = \frac{1}{\lambda} \mathbb{E}_u \left[ (f(x + \lambda u) - f(x)) u \right].

A key structural result is that, under mild regularity (Lipschitz continuity) conditions, for every x,

\nabla f_\lambda(x) \in \partial f(x) + B(0, r(\lambda)),

where B(0, r(λ)) is a ball centered at zero with vanishing radius r(λ) → 0 as λ → 0 (see (Paul et al., 14 Aug 2025)). Thus the expectation of the Gaussian-smoothed subgradient lies within an explicitly bounded distance of the Clarke subdifferential.
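As a concrete sketch, the one-point estimator above is straightforward to implement by Monte Carlo; the toy objective f(x) = ‖x‖₁ and all names below are illustrative, not from the paper:

```python
import numpy as np

def smoothed_grad(f, x, lam, n_samples=2000, seed=None):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed
    surrogate f_lam(x) = E[f(x + lam * u)], u ~ N(0, I), via the
    one-point formula (f(x + lam * u) - f(x)) * u / lam."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_samples, x.shape[0]))
    diffs = np.array([f(x + lam * ui) for ui in u]) - f(x)
    return (diffs[:, None] * u).mean(axis=0) / lam

f = lambda x: np.abs(x).sum()      # nonsmooth, Lipschitz test objective
g = smoothed_grad(f, np.array([2.0, -1.5]), lam=0.1, seed=0)
# Away from the kink at 0, the estimate approaches sign(x) = [1, -1].
```

Note that f itself is never differentiated; only (noisy) function values enter the estimator, which is precisely the zeroth-order setting.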

2. Two-Timescale Coupled Iterative Scheme

The algorithm employs a two-timescale stochastic approximation architecture:

  • The fast timescale recursively tracks the (randomized, noisy) smoothed subgradient. At iteration n, given x_n and an independent standard Gaussian U_n, the algorithm draws two function evaluations F(x_n + λU_n, ζ_n^1) and F(x_n − λU_n, ζ_n^2), with independent noise samples ζ_n^1, ζ_n^2, and computes

\widetilde{g}(n) = \frac{ F(x_n + \lambda U_n, \zeta_n^1) - F(x_n - \lambda U_n, \zeta_n^2) }{2\lambda} U_n.

The auxiliary variable y_n is updated by

y_{n+1} = y_n + \beta(n) (\widetilde{g}(n) - y_n),

with step-sizes β(n) satisfying ∑ₙ β(n) = ∞ and ∑ₙ β(n)² < ∞.

  • The slow timescale performs the projected update:

x_{n+1} = \mathcal{P}_{\mathcal{X}}( x_n - \alpha(n) y_n ),

where P_X denotes orthogonal projection onto the compact convex set X, and α(n) is a step-size sequence such that α(n)/β(n) → 0 (ensuring the timescales are well separated).

This two-timescale design ensures that y_n closely tracks the expected smoothed subgradient at the current x_n, while the projected descent step, which uses y_n as the search direction, keeps all iterates feasible.
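A minimal sketch of the two coupled recursions, assuming a box constraint set (so the projection is a componentwise clip) and a toy noisy oracle F(x, ζ) = ‖x‖₁ + ζ; the step-size exponents are one admissible choice, not the paper's:

```python
import numpy as np

def zo_projected(F, project, x0, lam=0.05, n_iter=20000, seed=None):
    """Two-timescale zeroth-order projected scheme: y_n tracks the
    smoothed subgradient (fast), x_n performs projected descent (slow)."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    y = np.zeros_like(x)
    for n in range(1, n_iter + 1):
        beta = n ** -0.6   # fast step: sum beta = inf, sum beta^2 < inf
        alpha = n ** -0.9  # slow step: alpha / beta -> 0
        u = rng.standard_normal(x.shape)
        # Two-point estimate of the smoothed subgradient at x_n.
        g = (F(x + lam * u, rng) - F(x - lam * u, rng)) / (2 * lam) * u
        y += beta * (g - y)             # fast timescale: tracking
        x = project(x - alpha * y)      # slow timescale: projected descent
    return x

F = lambda z, rng: np.abs(z).sum() + 0.01 * rng.standard_normal()  # noisy oracle
project = lambda z: np.clip(z, -1.0, 1.0)                          # box X = [-1, 1]^2
x_final = zo_projected(F, project, np.array([0.9, -0.7]), seed=0)
# Every iterate stays in X; x_final should lie near 0, the constrained minimizer.
```

The clip-based projection is specific to the assumed box; any compact convex X with a computable Euclidean projection fits the same template.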

3. Convergence Properties and Neighborhood Characterization

By leveraging continuous-time dynamical systems theory and robust perturbation analysis (specifically, Lyapunov-based arguments and properties of set-valued Marchaud maps), the analysis establishes almost sure convergence of the iterates to a neighborhood of Clarke stationary points of the original, nonsmooth, nonconvex problem. The critical points of the limiting projected dynamical system satisfy

0 \in \partial f(x) + N_{\mathcal{X}}(x),

where N_X(x) denotes the normal cone to X at x ∈ X. Due to smoothing, the neighborhood size is controlled explicitly by the smoothing parameter λ via r(λ). As λ → 0, the bias in the subgradient approximation vanishes, and the iterates become arbitrarily close (in limit) to the Clarke stationary set. This result yields the first almost sure convergence for zeroth-order methods with projections in the constrained, nonsmooth, nonconvex stochastic optimization regime (Paul et al., 14 Aug 2025).
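In computations, proximity to this condition is often monitored through the projected-gradient residual ‖x − P_X(x − g)‖, which is zero exactly when −g lies in the normal cone N_X(x). A small sketch with an assumed box constraint (illustrative, not the paper's diagnostic):

```python
import numpy as np

def stationarity_residual(x, g, lo=-1.0, hi=1.0):
    """||x - P_X(x - g)|| for the box X = [lo, hi]^d.
    Zero exactly when -g lies in the normal cone N_X(x)."""
    return float(np.linalg.norm(x - np.clip(x - g, lo, hi)))

# Interior point with a nonzero subgradient: not stationary.
r1 = stationarity_residual(np.array([0.3, 0.0]), np.array([1.0, 0.0]))
# Boundary point whose descent direction points out of the box: stationary.
r2 = stationarity_residual(np.array([1.0, 0.0]), np.array([-2.0, 0.0]))
# r1 == 1.0, r2 == 0.0
```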

4. Role and Adaptation of Gaussian Smoothing for Clarke Subdifferentials

Gaussian smoothing regularizes the nonsmooth objective without requiring an explicit subdifferential oracle. For a function ff that is merely Lipschitz, the smoothed version fλf_\lambda is always differentiable (by convolution with the Gaussian kernel), and its gradient can be efficiently and unbiasedly estimated by finite differences and random sampling. Importantly, while standard zeroth-order methods for smooth/nonconvex objectives approximate classical gradients, the approach here rigorously approximates elements of the Clarke subdifferential, which is fundamental for nonsmooth nonconvex analysis.

The explicit control of the bias r(λ), quantified as the error between the smoothed gradient and the true Clarke subgradient, permits a tradeoff: making λ small improves accuracy but increases variance and possibly the number of function evaluations required.
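This tradeoff is easy to observe numerically with the two-point estimator; the noisy oracle, query point, noise level, and λ grid below are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x, sigma = 0.2, 0.1   # query point and evaluation-noise level (assumed)
F = lambda z: abs(z) + sigma * rng.standard_normal()  # noisy oracle for f = |.|

def two_point_samples(lam, n=20000):
    """n independent two-point estimates of the smoothed subgradient at x."""
    u = rng.standard_normal(n)
    return np.array([(F(x + lam * ui) - F(x - lam * ui)) / (2 * lam) * ui
                     for ui in u])

for lam in (0.5, 0.05, 0.005):
    g = two_point_samples(lam)
    # Bias is measured against the Clarke subgradient f'(0.2) = 1.
    print(f"lam={lam:5.3f}  bias={g.mean() - 1.0:+.3f}  std={g.std():.3f}")
# Shrinking lam drives the bias toward 0, while the evaluation-noise term
# (zeta_1 - zeta_2) / (2 lam) inflates the spread roughly like sigma / lam.
```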

5. Comparisons to Classical and Contemporary Methodologies

Earlier zeroth-order stochastic methods have established guaranteed convergence only for unconstrained smooth problems or have provided non-almost-sure statements (e.g., convergence in L^1 only). Traditional techniques for nonsmooth, nonconvex problems presuppose access to subgradient oracles, which is implausible in many simulation optimization or black-box contexts.

Distinctive features of this method (Paul et al., 14 Aug 2025):

  • Generalizes stochastic projected subgradient methods from subgradient-available settings to pure black-box (function value only) contexts.
  • Handles constraints exactly via Euclidean projections, rather than through penalization.
  • Achieves almost sure convergence to a quantified neighborhood, an advancement over prior results limited to asymptotic gaps or expectation guarantees.

This approach is complementary to smoothing-based zeroth-order approaches for unconstrained problems (Marrinan et al., 2023), but specifically overcomes additional technical obstacles in the analysis of constrained, nonsmooth landscapes (notably, the lack of a Taylor expansion for Clarke subdifferentials).

6. Practical Applicability and Further Implications

The method is designed for, and directly applicable to, scenarios where gradient or subgradient information is unavailable, such as simulation-based optimization, black-box machine learning, and other settings where only noisy function evaluations are available. The guaranteed feasibility of iterates (via projection), the ability to handle nonconvexity and nonsmoothness simultaneously, and the rigorous convergence characterization to Clarke stationary neighborhoods provide robustness for practical deployments. The separation of timescales and the explicit bias-variance tradeoff (via the smoothing parameter) allow practitioners to tailor algorithmic performance to problem requirements and noise regimes.

Potential extensions suggested by the methodology include accelerated two-timescale schemes, adaptivity in the selection of smoothing and step-size parameters, and application to constraints beyond compact convex sets using more general projection or proximal operators. This methodology enables a novel class of zeroth-order projected methods for challenging nonsmooth stochastic optimization problems in high-dimensional black-box settings.