Projection-Free Zeroth-Order Frank–Wolfe

Updated 24 April 2026

The framework integrates finite-difference gradient estimates with linear minimization oracles to solve constrained optimization without explicit projections.
Variance reduction and acceleration techniques yield competitive complexity bounds matching first-order methods in both convex and nonconvex settings.
Empirical results across matrix completion, SDP applications, and adversarial attacks demonstrate improved sample efficiency and query savings.

Projection-free zeroth-order Frank–Wolfe algorithms constitute a class of optimization procedures for constrained problems where only function evaluations (zeroth-order information) are available and projection onto the constraint set is computationally prohibitive. These methods blend zeroth-order (gradient-free) optimization with the conditional gradient (Frank–Wolfe) algorithm, leveraging linear minimization oracles (LMOs) over the feasible set in place of full projections. Recent advances unite variance reduction, acceleration, and momentum to approach the complexity bounds and empirical performance of first-order projection-free algorithms, fundamentally extending Frank–Wolfe methodology to black-box, highly structured, or combinatorial regimes.

1. Problem Formulation and Oracle Setting

The core setting involves minimizing a convex (or occasionally nonconvex) objective

$\min_{x \in \mathcal{C}} f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$

where $\mathcal{C} \subset \mathbb{R}^d$ is a compact, convex set with diameter $D$. The paradigm assumes the following oracle access:

Function Query Oracle (FQO): Given $(i, x)$, outputs $f_i(x)$ (for finite-sum) or $f(x)$ (for general stochastic objectives).
Linear Oracle (LO): Given $g \in \mathbb{R}^d$, outputs $v^* = \arg\min_{v \in \mathcal{C}} \langle g, v \rangle$.
No direct gradient access is assumed. Instead, finite-difference estimators are constructed using coordinate or randomly sampled directions.

The prototypical coordinate-wise two-point estimator with smoothing parameter $\mu > 0$ is given by

$\widehat{\nabla}_{\mathrm{coord}} f_i(x) = \sum_{j=1}^d \frac{f_i(x+\mu e_j) - f_i(x-\mu e_j)}{2\mu} e_j,$

with bias and variance controlled by $\mu$ and the smoothness of the function (e.g., Lipschitz-continuous gradients, as assumed in (Wei et al., 2021)). Each evaluation of this estimator costs $2d$ function queries.
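As an illustration, the coordinate-wise two-point estimator can be sketched in a few lines of plain Python. The function name `coord_gradient`, the default smoothing value, and the quadratic test objective are assumptions made for this example and do not come from the cited papers.

```python
from typing import Callable, List

Vector = List[float]


def coord_gradient(f: Callable[[Vector], float], x: Vector, mu: float = 1e-4) -> Vector:
    """Coordinate-wise two-point gradient estimate of f at x (uses 2 * d queries)."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += mu   # query point x + mu * e_j
        x_minus[j] -= mu  # query point x - mu * e_j
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * mu))
    return grad


def quadratic(x: Vector) -> float:
    """Hypothetical test objective; its exact gradient is 2 * (x - 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


estimate = coord_gradient(quadratic, [0.0, 2.0, 1.0])
exact = [-2.0, 2.0, 0.0]
```

For a quadratic objective the central difference is exact up to rounding, so `estimate` agrees with `exact` closely; for general smooth functions the error grows with $\mu$.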

2. Algorithmic Structure: Projection-Free Zeroth-Order Frank–Wolfe

Projection-free zeroth-order Frank–Wolfe algorithms inherit the conditional gradient update structure, substituting gradient information with zeroth-order surrogates. The general step at iteration $t$ is:

Estimate gradient $g_t \approx \nabla f(x_t)$ using finite differences.
Linear oracle call: $v_t = \arg\min_{v \in \mathcal{C}} \langle g_t, v \rangle$.
Update: $x_{t+1} = x_t + \gamma_t (v_t - x_t)$, with an appropriate stepsize $\gamma_t \in [0, 1]$.
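The three steps above can be sketched as a plain, non-accelerated loop. This is a minimal illustration rather than any of the cited algorithms: the probability simplex as feasible set, the classical stepsize $2/(t+2)$, and the names `zo_frank_wolfe`, `lmo_simplex`, and `objective` are all assumptions chosen for the example.

```python
from typing import Callable, List

Vector = List[float]


def coord_gradient(f: Callable[[Vector], float], x: Vector, mu: float) -> Vector:
    """Coordinate-wise two-point gradient estimate (2 * d function queries)."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += mu
        x_minus[j] -= mu
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * mu))
    return grad


def lmo_simplex(g: Vector) -> Vector:
    """Linear oracle for the probability simplex: the vertex at the smallest entry of g."""
    j = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == j else 0.0 for k in range(len(g))]


def zo_frank_wolfe(f: Callable[[Vector], float], x0: Vector,
                   num_iters: int = 500, mu: float = 1e-4) -> Vector:
    """Basic zeroth-order Frank-Wolfe; x0 must be feasible, and no projection is used."""
    x = list(x0)
    for t in range(num_iters):
        g = coord_gradient(f, x, mu)         # step 1: estimate the gradient
        v = lmo_simplex(g)                   # step 2: linear oracle call
        gamma = 2.0 / (t + 2.0)              # assumed classical stepsize schedule
        x = [xi + gamma * (vi - xi) for xi, vi in zip(x, v)]  # step 3: convex update
    return x


target = [0.2, 0.5, 0.3]  # hypothetical minimizer inside the simplex


def objective(x: Vector) -> float:
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


x_hat = zo_frank_wolfe(objective, [1.0, 0.0, 0.0])
```

Because every iterate is a convex combination of feasible points, `x_hat` stays on the simplex without any projection step.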

Contemporary variants, such as ZO-ARCS (Wei et al., 2021), momentum-corrected trackers (Akhtar et al., 2021), and accelerated stochastic zeroth-order FW (Acc-SZOFW) (Huang et al., 2020), employ outer–inner loop architectures, variance-reduced zeroth-order gradient approximations, and generalized momentum. A canonical example is the ZO-ARCS procedure, which maintains a pivot point $\tilde{x}$, computes a full zeroth-order gradient at $\tilde{x}$, and performs inner variance-reduced stochastic FW iterations with extrapolation and averaging sequences.

Pseudocode Fragment: ZO-ARCS Outer–Inner Loop

The CondG subroutine approximately solves the quadratic-regularized FW subproblem using only LO calls (Wei et al., 2021).
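A subroutine of this kind can be sketched as follows, in the style of the inner procedure used by conditional gradient sliding methods. It is not the listing from (Wei et al., 2021): the objective form $\langle g, x \rangle + \frac{\beta}{2}\|x - u\|^2$, the exact line search, the stopping rule on the Frank–Wolfe gap, and the names `cond_g`, `beta`, `eta`, and `max_iters` are assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def lmo_simplex(g: Vector) -> Vector:
    """Linear oracle for the probability simplex (assumed feasible set)."""
    j = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == j else 0.0 for k in range(len(g))]


def cond_g(g: Vector, u: Vector, beta: float, eta: float,
           lmo: Callable[[Vector], Vector], max_iters: int = 1000) -> Vector:
    """Approximately minimize <g, x> + (beta / 2) * ||x - u||^2 over the
    feasible set using only linear-oracle calls. Requires beta > 0, eta >= 0."""
    x = list(u)
    for _ in range(max_iters):
        # Gradient of the quadratic-regularized model at the current point.
        grad = [gi + beta * (xi - ui) for gi, xi, ui in zip(g, x, u)]
        v = lmo(grad)
        direction = [vi - xi for vi, xi in zip(v, x)]
        # Frank-Wolfe gap <grad, x - v>; it upper-bounds the subproblem error.
        gap = -sum(gr * di for gr, di in zip(grad, direction))
        if gap <= eta:
            break
        # Exact line search for a quadratic, clipped to keep x feasible.
        sq_norm = sum(di * di for di in direction)
        alpha = min(1.0, gap / (beta * sq_norm))
        x = [xi + alpha * di for xi, di in zip(x, direction)]
    return x


u0 = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
solution = cond_g([3.0, 0.0, 3.0], u0, beta=1.0, eta=1e-8, lmo=lmo_simplex)
```

In this small example the linear term dominates, so the routine moves to the second vertex of the simplex in one step and then stops because the gap vanishes.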

3. Variance Reduction and Acceleration Mechanisms

Variance reduction in zeroth-order Frank–Wolfe is principally achieved by referencing full or large-batch gradient surrogates at anchor points, then combining them with fresh function evaluation differences:

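$g_t = \widehat{\nabla}_{\mathrm{coord}} f_{i_t}(x_t) - \widehat{\nabla}_{\mathrm{coord}} f_{i_t}(\tilde{x}) + \widehat{\nabla}_{\mathrm{coord}} f(\tilde{x}),$

where $\tilde{x}$ is the anchor point and $i_t$ is sampled uniformly from $\{1, \dots, n\}$. This mechanism ensures $\mathbb{E}_{i_t}[g_t] = \widehat{\nabla}_{\mathrm{coord}} f(x_t)$ while the variance $\mathbb{E}\|g_t - \widehat{\nabla}_{\mathrm{coord}} f(x_t)\|^2$ diminishes as the number of inner iterations increases. Acceleration typically employs multiple sequences (e.g., the extrapolation and averaging sequences in (Wei et al., 2021, Huang et al., 2020)), mirroring Nesterov's or Lan's accelerated frameworks, but adapted for the conditional gradient context and fully decoupled from direct gradient computation.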
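The anchor-point correction can be illustrated with an outer–inner loop in the SVRG style. This sketch omits acceleration and momentum, so it should not be read as ZO-ARCS or Acc-SZOFW. The function `vr_zo_frank_wolfe`, the epoch and inner-loop lengths, the stepsize $2/(t+2)$, and the toy finite-sum objective are assumptions for the example.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def coord_gradient(f: Callable[[Vector], float], x: Vector, mu: float) -> Vector:
    """Coordinate-wise two-point gradient estimate (2 * d function queries)."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += mu
        x_minus[j] -= mu
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * mu))
    return grad


def lmo_simplex(g: Vector) -> Vector:
    j = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == j else 0.0 for k in range(len(g))]


def vr_zo_frank_wolfe(components: Sequence[Callable[[Vector], float]], x0: Vector,
                      num_epochs: int = 5, inner_iters: int = 50,
                      mu: float = 1e-4, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    n = len(components)
    x, t = list(x0), 0
    for _ in range(num_epochs):
        anchor = list(x)
        # Full zeroth-order gradient at the anchor (stored per component for
        # simplicity; this costs O(n * d) memory in this sketch).
        anchor_grads = [coord_gradient(fi, anchor, mu) for fi in components]
        full = [sum(col) / n for col in zip(*anchor_grads)]
        for _ in range(inner_iters):
            i = rng.randrange(n)
            fresh = coord_gradient(components[i], x, mu)
            # Variance-reduced estimate: fresh difference plus anchor average.
            g = [a - b + c for a, b, c in zip(fresh, anchor_grads[i], full)]
            v = lmo_simplex(g)
            gamma = 2.0 / (t + 2.0)
            x = [xi + gamma * (vi - xi) for xi, vi in zip(x, v)]
            t += 1
    return x


data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
components = [lambda x, a=a: sum((xi - ai) ** 2 for xi, ai in zip(x, a)) for a in data]
x_vr = vr_zo_frank_wolfe(components, [1.0, 0.0, 0.0])
mean = [sum(col) / len(data) for col in zip(*data)]  # minimizer of the average
```

In this toy problem all components share the same curvature, so the corrected estimate coincides with the full gradient; with heterogeneous components the estimate is noisy, and the noise shrinks as the iterate and the anchor move closer together.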

Momentum-based updates are constructed so that averaging and extrapolation facilitate faster reduction of the optimality gap. In the ARCS framework:

$D$ 4

with analogous update expressions for $D$ 5 (Wei et al., 2021).

4. Convergence Rates and Oracle Complexity

Projection-free zeroth-order Frank–Wolfe algorithms now achieve rates that match those of classic first-order projection-free methods up to dimension-dependent factors, and they improve on early zeroth-order schemes in both sample and iteration complexity.

Convex case (finite-sum): For ZO-ARCS, to reach $D$ 6,

$D$ 7

with $D$ 8 calls scaling as $D$ 9 or $(i, x)$ 0 (Wei et al., 2021).

Composite constraints and affine feasibilities: ZO-FW achieves $(i, x)$ 1 FQO calls and $(i, x)$ 2 LMO calls for $(i, x)$ 3-optimality and $(i, x)$ 4-feasibility (Akhtar et al., 2021).
Nonconvex setting: Algorithms such as Acc-SZOFW attain $O(d\sqrt{n}\,\epsilon^{-2})$ finite-sum and $O(d\,\epsilon^{-3})$ stochastic FQO complexities for $\epsilon$-stationarity (Huang et al., 2020).
Early single-direction ZO-FW: Achieved $O(d^{1/3}/T^{1/3})$ primal gap decay in convex settings and $O(d^{1/3}/T^{1/4})$ Frank–Wolfe gap decay in nonconvex settings, reflecting tight known bounds for single-direction estimation (Sahu et al., 2018).

These results demonstrate that, up to dimension-dependent factors, modern projection-free zeroth-order methods reach the iteration–complexity and rate benchmarks of their first-order counterparts, and in many regimes achieve substantial query savings over projection-based black-box methods.

5. Projection-Free Property and Linear Oracles

At no point do these algorithms compute explicit projections onto $\mathcal{C}$. Instead, updates rely entirely on LMOs, which are typically much cheaper (in both theory and numerics) for structured constraint sets such as simplices, nuclear-norm balls, or polytopes. For example, on the simplex, an LO reduces to a coordinate selection, and for the spectrahedron, an LO becomes an extremal eigenvector computation.
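The two examples just mentioned can be sketched with the standard library alone. The spectrahedron oracle below uses shifted power iteration with a Gershgorin bound purely to stay dependency-free; this solver choice, the iteration count, and the function names are assumptions for the example, and a production implementation would more likely call a sparse eigensolver.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def lmo_simplex(g: Vector) -> Vector:
    """argmin of <g, v> over the probability simplex: pick the smallest coordinate."""
    j = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == j else 0.0 for k in range(len(g))]


def lmo_spectrahedron(G: Matrix, num_iters: int = 500, seed: int = 0) -> Matrix:
    """argmin of <G, X> over {X PSD, trace(X) = 1} for symmetric G: returns u u^T,
    where u approximates a unit eigenvector for the smallest eigenvalue of G."""
    d = len(G)
    # Gershgorin upper bound on the largest eigenvalue, so that shift * I - G is
    # positive semidefinite and its top eigenvector is the bottom one of G.
    shift = max(G[i][i] + sum(abs(G[i][j]) for j in range(d) if j != i)
                for i in range(d))
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]
    for _ in range(num_iters):
        w = [shift * u[i] - sum(G[i][j] * u[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            # For a generic start this only happens when G is a multiple of the
            # identity, in which case every unit vector is optimal.
            break
        u = [wi / norm for wi in w]
    return [[u[i] * u[j] for j in range(d)] for i in range(d)]


vertex = lmo_simplex([0.3, -1.2, 0.7])
X = lmo_spectrahedron([[2.0, 1.0], [1.0, 2.0]])
```

Here `vertex` is the second unit vector, and `X` is the rank-one matrix built from the eigenvector $(1, -1)/\sqrt{2}$ associated with the smallest eigenvalue of the input.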

Conditional gradient subproblems within inner loops are solved via iterative LO calls that terminate once an optimality gap falls below a schedule-dependent tolerance. This design ensures that the full iteration remains within the feasible set and that each step exploits the low computational complexity of the LMO for appropriately structured $\mathcal{C}$ (Wei et al., 2021, Akhtar et al., 2021).

6. Empirical Validation and Applications

Experimental results across multiple domains validate the practical impact of projection-free zeroth-order Frank–Wolfe:

Low-rank matrix completion: Using nuclear-norm ball constraints, ZO-ARCS outperforms prior zeroth-order and first-order projection-free methods in sample efficiency and suboptimality decay on image completion and LIBSVM tasks (Wei et al., 2021).
SDP-type applications: ZO-FW achieves the theoretically predicted rates and closely tracks first-order counterparts on sparse covariance estimation, k-means clustering relaxations, and sparsest-cut problems. The trimmed-FW variant empirically skips 20%–40% of LMO calls with negligible loss (Akhtar et al., 2021).
Robust black-box classification and adversarial attacks: Accelerated methods achieve 10–100× query speedups compared to previous deterministic zeroth-order and stochastic conditional gradient methods (Huang et al., 2020).
Complex combinatorial equilibrium optimization: ZO-Stackelberg incorporates projection-free Frank–Wolfe for fast inner equilibrium computation under zeroth-order leader updates, achieving orders-of-magnitude wall-clock and memory improvements over differentiation-based approaches (Masiha et al., 26 Feb 2026).
On high-dimensional problems, query complexity aligns with predicted $f_i(x)$ 2- and $f_i(x)$ 3-scaled rates, with empirical gap to fully gradient-based methods closing as variance-reduction and acceleration mechanisms are deployed.

7. Extensions, Limitations, and Future Directions

Projection-free zeroth-order Frank–Wolfe is now a well-developed framework for black-box and large-scale constrained optimization. Its main extensions, open problems, and limitations are:

Variance-reduced and accelerated extensions generalize to stochastic and finite-sum settings, nonconvex and composite domains, and admit recursive variance-reduction corrections (SPIDER, SARAH) and adaptive stepsizes (Wei et al., 2021, Huang et al., 2020).
The framework generalizes to combinatorial polytopes and nonsmooth objectives, using stratified sampling to maintain oracle efficiency in massive discrete domains (Masiha et al., 26 Feb 2026).
Open fronts include random directional or Gaussian-smoothing estimators for settings where coordinate queries are expensive, incorporation of saddle-point and nonconvex regimes via negative curvature detection, and scalability to distributed or federated settings.
Dimension dependence remains a critical bottleneck in very high-dimensional spaces unless domain-specific structure (e.g., sparsity, low rank) can be algorithmically exploited (Sahu et al., 2018).
Practical limitations involve computational overhead when the LO is expensive, the tuning of variance-reduction schedules, and possible degradation in non-smooth or poorly conditioned regimes.
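For reference, the Gaussian-smoothing alternative mentioned among the open directions above can be sketched as a forward-difference estimator averaged over random directions. It uses `num_directions + 1` queries instead of $2d$. The function name, the default parameters, and the linear test function are assumptions for this example.

```python
import random
from typing import Callable, List

Vector = List[float]


def gaussian_smoothing_gradient(f: Callable[[Vector], float], x: Vector,
                                mu: float = 1e-3, num_directions: int = 20,
                                seed: int = 0) -> Vector:
    """Average of (f(x + mu * u) - f(x)) / mu * u over standard Gaussian directions u."""
    rng = random.Random(seed)
    d = len(x)
    fx = f(x)  # one base query shared by all directions
    grad = [0.0] * d
    for _ in range(num_directions):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - fx) / mu  # directional finite difference
        for j in range(d):
            grad[j] += slope * u[j] / num_directions
    return grad


coeffs = [1.0, -2.0, 0.5]  # hypothetical linear objective <coeffs, x>


def linear(x: Vector) -> float:
    return sum(c * xi for c, xi in zip(coeffs, x))


g_hat = gaussian_smoothing_gradient(linear, [0.2, 0.3, 0.5], num_directions=5000)
```

For a linear function the estimator is unbiased, so `g_hat` approaches `coeffs` as the number of directions grows; with few directions the estimate is cheap but noisy, which is the trade-off against the coordinate-wise scheme.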

The projection-free zeroth-order Frank–Wolfe framework, exemplified by the ARCS algorithm, is now a leading methodology for constraint-rich, black-box convex and nonconvex optimization, bridging the gap between first-order projection-free solvers and purely function-based oracles (Wei et al., 2021, Akhtar et al., 2021, Huang et al., 2020, Sahu et al., 2018, Masiha et al., 26 Feb 2026).