Zeroth-Order Projected Stochastic Subgradient Method
- The paper introduces a zeroth-order method that uses Gaussian smoothing to approximate Clarke subgradients in constrained, nonsmooth, nonconvex optimization settings.
- It employs a two-timescale iterative scheme where fast gradient tracking and slow projected updates ensure feasibility over compact convex sets.
- The approach guarantees almost sure convergence to a neighborhood of Clarke stationary points with explicit bias control, advancing classical stochastic methods.
A zeroth-order projected stochastic subgradient method is an algorithmic framework for solving constrained stochastic optimization problems when gradients or subgradients of the objective function are unavailable or inaccessible, and only noisy function evaluations can be queried. These methods approximate generalized (in particular, Clarke) subgradients by using randomized smoothing, and combine stochastic gradient tracking with projection steps to handle convex constraints. This framework is motivated by optimizing Lipschitz continuous, nonsmooth, nonconvex objectives over compact convex sets, a setting in which classical gradient-based techniques are inapplicable or insufficiently robust.
1. Smoothing-Based Zeroth-Order Subgradient Approximation
The main technical challenge addressed is the lack of a Taylor-like expansion or analytical handle on the Clarke subdifferential for nonsmooth functions, which impedes both subgradient approximation and theoretical analysis. To overcome this, the method utilizes Gaussian smoothing: for a given smoothing parameter $\delta > 0$, a smoothed version of the objective $f$ is defined as
$$f_\delta(x) = \mathbb{E}_{u \sim \mathcal{N}(0, I_d)}\big[f(x + \delta u)\big],$$
which is differentiable even if $f$ is nondifferentiable. The gradient of the smoothed function can be written as
$$\nabla f_\delta(x) = \mathbb{E}_{u \sim \mathcal{N}(0, I_d)}\left[\frac{f(x + \delta u) - f(x)}{\delta}\, u\right].$$
A key structural result is that, under mild regularity (Lipschitz continuity) conditions, for every $x$,
$$\nabla f_\delta(x) \in \partial f(x) + B\big(0, r(\delta)\big),$$
where $B(0, r(\delta))$ is a ball centered at zero whose radius $r(\delta)$ vanishes as $\delta \to 0$ (see (Paul et al., 14 Aug 2025)). Thus the expectation of the Gaussian-smoothed subgradient estimate lies within an explicitly bounded distance of the Clarke subdifferential $\partial f(x)$.
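The following is a minimal sketch of such an estimator, assuming the two-point finite-difference form above; the function name `smoothed_grad_estimate` and the batching via `num_samples` are illustrative choices, not constructs from the paper.

```python
import numpy as np

def smoothed_grad_estimate(f, x, delta, rng, num_samples=1):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed objective
    f_delta(x) = E_u[f(x + delta*u)], u ~ N(0, I_d), using the two-point form
    (f(x + delta*u) - f(x)) / delta * u, which is unbiased for grad f_delta(x)."""
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)                   # random Gaussian direction
        g += (f(x + delta * u) - f(x)) / delta * u   # finite difference along u
    return g / num_samples

# Example: a zeroth-order subgradient proxy for the nonsmooth f(x) = ||x||_1.
rng = np.random.default_rng(0)
f = lambda x: np.abs(x).sum()
print(smoothed_grad_estimate(f, np.array([1.0, -2.0, 0.5]), delta=1e-2, rng=rng, num_samples=200))
```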
2. Two-Timescale Coupled Iterative Scheme
The algorithm employs a two-timescale stochastic approximation architecture:
- The fast timescale recursively tracks the (randomized, noisy) smoothed subgradient. At iteration $n$, given the current iterate $x_n$ and an independent standard Gaussian direction $u_n \sim \mathcal{N}(0, I_d)$, the algorithm draws two (possibly independently noise-corrupted) function evaluations $\hat f(x_n + \delta u_n)$ and $\hat f(x_n)$ and computes
$$g_n = \frac{\hat f(x_n + \delta u_n) - \hat f(x_n)}{\delta}\, u_n.$$
The auxiliary tracking variable $y_n$ is updated by
$$y_{n+1} = y_n + a_n\big(g_n - y_n\big),$$
with step-sizes satisfying $\sum_n a_n = \infty$ and $\sum_n a_n^2 < \infty$.
- The slow timescale performs the projected update:
$$x_{n+1} = P_{\mathcal{C}}\big(x_n - b_n\, y_{n+1}\big),$$
where $P_{\mathcal{C}}$ denotes orthogonal projection onto the compact convex set $\mathcal{C}$, and $(b_n)$ is a step-size sequence such that $b_n / a_n \to 0$ (ensuring the timescales are well separated).
This two-timescale design ensures that $y_n$ closely tracks the expected smoothed subgradient $\nabla f_\delta(x_n)$ at the current iterate $x_n$, while the projected descent, using $y_n$ as the search direction, keeps the iterates feasible; a sketch of the coupled recursions is given below.
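A compact sketch of this coupled scheme, assuming projection onto a Euclidean ball as the compact convex set and the illustrative step-size schedules $a_n = n^{-0.6}$, $b_n = n^{-1}$ (which satisfy the conditions above); the names `two_timescale_zo` and `project_ball` are hypothetical, and this is not the paper's reference implementation.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Orthogonal projection onto C = {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def two_timescale_zo(f, x0, delta=1e-2, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    y = np.zeros_like(x)                    # fast-timescale tracker of grad f_delta(x)
    for n in range(1, iters + 1):
        a_n = n ** -0.6                     # fast step-size: sum a_n = inf, sum a_n^2 < inf
        b_n = 1.0 / n                       # slow step-size: b_n / a_n -> 0
        u = rng.standard_normal(x.shape[0])
        g = (f(x + delta * u) - f(x)) / delta * u   # zeroth-order smoothed-gradient estimate
        y = y + a_n * (g - y)                        # fast update: track the smoothed subgradient
        x = project_ball(x - b_n * y)                # slow update: projected descent step
    return x

# Example: a nonsmooth, nonconvex Lipschitz objective over the unit ball.
f = lambda x: np.abs(x[0]) - 0.5 * np.abs(x[1]) + 0.25 * np.linalg.norm(x - 0.3)
print(two_timescale_zo(f, x0=np.array([0.8, -0.5])))
```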
3. Convergence Properties and Neighborhood Characterization
By leveraging continuous-time dynamical systems theory and robust perturbation analysis (specifically, Lyapunov-based arguments and properties of set-valued Marchaud maps), the analysis establishes almost sure convergence of the iterates to a neighborhood of Clarke stationary points of the original, nonsmooth, nonconvex problem. The critical points $x^*$ of the limiting projected dynamical system satisfy
$$0 \in \nabla f_\delta(x^*) + N_{\mathcal{C}}(x^*),$$
where $N_{\mathcal{C}}(x^*)$ denotes the normal cone to $\mathcal{C}$ at $x^*$. Due to smoothing, the neighborhood size is controlled explicitly by the smoothing parameter $\delta$ via the bias radius $r(\delta)$. As $\delta \to 0$, the bias in the subgradient approximation vanishes, and the iterates become arbitrarily close (in the limit) to the Clarke stationary set. This result yields the first almost sure convergence guarantee for projected zeroth-order methods in the constrained, nonsmooth, nonconvex stochastic optimization regime (Paul et al., 14 Aug 2025).
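As a practical diagnostic for such stationarity (an illustration, not a construct from the paper), one can monitor the projected-gradient residual: it vanishes exactly when the negative tracked direction lies in the normal cone at the current point. A minimal sketch, reusing the hypothetical `project_ball` helper above:

```python
import numpy as np

def stationarity_residual(x, y, project, eta=1e-3):
    """Projected residual ||x - P_C(x - eta*y)|| / eta; it is zero exactly when
    -y lies in the normal cone N_C(x), i.e. when 0 is in y + N_C(x)."""
    return np.linalg.norm(x - project(x - eta * y)) / eta
```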
4. Role and Adaptation of Gaussian Smoothing for Clarke Subdifferentials
Gaussian smoothing regularizes the nonsmooth objective without requiring an explicit subdifferential oracle. For a function that is merely Lipschitz, the smoothed version is always differentiable (by convolution with the Gaussian kernel), and its gradient can be efficiently and unbiasedly estimated by finite differences and random sampling. Importantly, while standard zeroth-order methods for smooth/nonconvex objectives approximate classical gradients, the approach here rigorously approximates elements of the Clarke subdifferential, which is fundamental for nonsmooth nonconvex analysis.
The explicit control of the bias radius $r(\delta)$, quantified as the distance between the smoothed gradient and the Clarke subdifferential, permits a tradeoff: making $\delta$ small improves accuracy but increases the variance of the finite-difference estimator (especially under evaluation noise) and possibly the number of function evaluations required.
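This tradeoff can be illustrated numerically on the scalar kink $f(x) = |x|$ under additive evaluation noise (an illustrative experiment, not one from the paper), reusing the hypothetical `smoothed_grad_estimate` sketch above: shrinking $\delta$ drives the mean estimate toward the true subgradient while inflating its spread, since the noise contribution to the estimator scales like $1/\delta$.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.01                                   # std of additive evaluation noise
noisy_f = lambda x: np.abs(x).sum() + sigma * rng.standard_normal()

x = np.array([0.5])                            # the Clarke subgradient of |x| here is {+1}
for delta in (1.0, 1e-1, 1e-2, 1e-3):
    samples = np.array([smoothed_grad_estimate(noisy_f, x, delta, rng)[0]
                        for _ in range(2000)])
    # The mean approaches +1 as delta shrinks (bias falls), while the spread
    # grows because the noise term in the estimator scales like sigma/delta.
    print(f"delta={delta:g}  mean={samples.mean():+.3f}  std={samples.std():.3f}")
```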
5. Comparisons to Classical and Contemporary Methodologies
Earlier zeroth-order stochastic methods have established guaranteed convergence only for unconstrained smooth problems or have provided non-almost-sure statements (e.g., convergence in expectation only). Traditional techniques for nonsmooth, nonconvex problems presuppose access to subgradient oracles, which is implausible in many simulation optimization or black-box contexts.
Distinctive features of this method (Paul et al., 14 Aug 2025):
- Generalizes stochastic projected subgradient methods from subgradient-available settings to pure black-box (function value only) contexts.
- Handles constraints exactly via Euclidean projections, rather than through penalization.
- Achieves almost sure convergence to a quantified neighborhood, an advancement over prior results limited to asymptotic gaps or expectation guarantees.
This approach is complementary to smoothing-based zeroth-order approaches for unconstrained problems (Marrinan et al., 2023), but specifically overcomes additional technical obstacles in the analysis of constrained, nonsmooth landscapes (notably, the lack of a Taylor expansion for Clarke subdifferentials).
6. Practical Applicability and Further Implications
The method is designed for, and directly applicable to, scenarios where gradient or subgradient information is unavailable, such as simulation-based optimization, black-box machine learning, and other settings where only noisy function evaluations are feasible. The guaranteed feasibility of iterates (via projection), the ability to handle nonconvexity and nonsmoothness simultaneously, and the rigorous convergence characterization to Clarke stationary neighborhoods provide robustness for practical deployments. The separation of timescales and the explicit bias-variance tradeoff (via the smoothing parameter $\delta$) allow practitioners to tailor algorithmic performance to problem requirements and noise regimes.
Potential extensions suggested by the methodology include accelerated two-timescale schemes, adaptivity in the selection of smoothing and step-size parameters, and application to constraints beyond compact convex sets using more general projection or proximal operators. This methodology enables a novel class of zeroth-order projected methods for challenging nonsmooth stochastic optimization problems in high-dimensional black-box settings.