
Frank-Wolfe Algorithm

Updated 21 March 2026
  • Frank-Wolfe Algorithm is a conditional gradient method that iteratively linearizes a convex function to solve constrained optimization problems.
  • It leverages the geometry of feasible sets in Banach and Hilbert spaces, with convergence analysis based on the regularity and curvature of the objective function.
  • Modern extensions handle composite objectives, with rate guarantees interpolating between $O(1/k)$ (Lipschitz gradients) and $O(1/k^\nu)$ ($\nu$-Hölder gradients), enabling applications in high-dimensional optimization.

The Frank-Wolfe (FW) algorithm—also known as the conditional gradient method—is a classical first-order algorithm for constrained optimization in Banach and Hilbert spaces, historically notable for its projection-free design based on iterative linearization. The FW method operates by iteratively solving a linear minimization subproblem over a constraint set, then updating the current iterate along the resulting direction with a carefully chosen step size. Its convergence analysis is fundamentally governed by the regularity of the objective function, the geometric structure of the feasible set, and the curvature of the objective. Modern developments include variants for composite objectives, rate guarantees under different smoothness regimes, and broad generalizations to infinite-dimensional and non-Euclidean settings (Xu, 2017).

1. Problem Formulation and Algorithmic Structure

The classical FW algorithm addresses the convex minimization problem in Banach spaces

$$\min_{x \in C}\, f(x), \qquad C \subset X,$$

where $X$ is a real Banach space with norm $\|\cdot\|$, $C$ is nonempty, closed, convex, and bounded, and $f: X \to \mathbb{R}$ is convex and Fréchet differentiable on an open set containing $C$. The method generalizes to the composite structure

$$\min_{x \in C}\, \varphi(x) := f(x) + g(x),$$

where $g$ is proper, lower semicontinuous, convex, and finite on $C$ (Xu, 2017).

The basic FW iteration comprises:

  1. Linearization subproblem:

$$s_k = \arg\min_{s \in C}\, \langle f'(x_k), s \rangle,$$

where $f'(x_k)$ is the Fréchet derivative (the gradient in Hilbert spaces). For composite problems, $s_k = \arg\min_{s \in C}\, \langle f'(x_k), s \rangle + g(s)$.

  2. Update step:

$$x_{k+1} = x_k + \gamma_k (s_k - x_k),$$

with step size $\gamma_k \in [0,1]$ determined either by exact line search ($\arg\min_{0 \le \gamma \le 1} f(x_k + \gamma(s_k - x_k))$) or by an open-loop rule ($\gamma_k \to 0$, $\sum_k \gamma_k = \infty$).

These choices ensure that the iterates $x_k$ remain feasible and that sparsity is promoted, since each $x_k$ is a convex combination of at most $k+1$ extreme points of $C$ (Xu, 2017).
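To make the iteration concrete, here is a minimal Python sketch for the special case of a probability simplex, where the linearization subproblem is solved by inspecting the gradient; the quadratic objective and all names are illustrative assumptions, not constructs from (Xu, 2017):

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, num_iters=500):
    """Minimal Frank-Wolfe sketch over the probability simplex.

    The subproblem min_{s in simplex} <f'(x_k), s> is attained at the
    vertex e_i with i = argmin_i [f'(x_k)]_i, so no projection is needed.
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0      # LMO: best vertex of the simplex
        gamma = 2.0 / (k + 2)      # open-loop rule: gamma_k -> 0, sum diverges
        x = x + gamma * (s - x)    # convex combination keeps x feasible
    return x

# Illustrative instance: f(x) = 0.5*||Ax - b||^2, so f' is Lipschitz.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
x = frank_wolfe_simplex(lambda x: A.T @ (A @ x - b), np.full(10, 0.1))
print(x.sum())  # stays (numerically) at 1: iterates never leave the simplex
```

After $k$ iterations the sketch's iterate is supported on at most $k+1$ vertices, which is exactly the sparsity mechanism described above.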

2. Assumptions, Regularity, and Curvature Concepts

The convergence properties of FW fundamentally depend on the regularity of ff and the geometry of CC:

  • Basic convergence is proven under mere uniform continuity of $f'$ on $C$.
  • The composite case additionally requires $g$ to be convex, lower semicontinuous, and finite on $C$.
  • Rate analysis leverages higher-order curvature via

$$C_f^{(\sigma)} = \sup_{x,s \in C,\ \gamma \in (0,1]} \frac{\sigma \left( f(x + \gamma(s-x)) - f(x) - \gamma \langle f'(x), s-x \rangle \right)}{\gamma^\sigma},$$

for some $\sigma \in (1,2]$, generalizing the standard FW curvature constant ($\sigma = 2$).

If $f'$ is $\nu$-Hölder continuous with constant $L_H$ (where $0 < \nu \le 1$), then $\sigma = 1+\nu$ and $C_f^{(1+\nu)} \le L_H\,[\operatorname{diam}(C)]^{1+\nu}$; for $f'$ Lipschitz with constant $L$ ($\nu = 1$), $C_f^{(2)} \le L\,[\operatorname{diam}(C)]^2$. These constants drive the theoretical rate bounds (Xu, 2017).
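The Hölder bound on the curvature constant follows from a one-line integral estimate; a sketch of the standard argument, writing $d = s - x$ and assuming $\|f'(u) - f'(v)\|_* \le L_H \|u - v\|^\nu$:

```latex
\begin{align*}
f(x+\gamma d) - f(x) - \gamma\langle f'(x), d\rangle
  &= \int_0^1 \big\langle f'(x+t\gamma d) - f'(x),\, \gamma d \big\rangle\, dt \\
  &\le \int_0^1 L_H\,(t\gamma\|d\|)^{\nu}\,\gamma\|d\|\, dt
   = \frac{L_H\,\gamma^{1+\nu}\,\|d\|^{1+\nu}}{1+\nu}.
\end{align*}
% Multiplying by \sigma/\gamma^{\sigma} with \sigma = 1+\nu and taking the
% supremum over x, s \in C yields C_f^{(1+\nu)} \le L_H [\mathrm{diam}(C)]^{1+\nu}.
```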

3. Convergence and Complexity Analysis

3.1 Qualitative Convergence

With exact line search or any open-loop sequence ($\gamma_k \to 0$, $\sum_k \gamma_k = \infty$), the FW sequence satisfies $f(x_k) \to f^* := \min_{x \in C} f(x)$. The proof uses a Polyak-style recursion

$$a_{k+1} \le (1-\gamma_k)\, a_k + \gamma_k E_k, \qquad E_k \to 0,$$

with $a_k = f(x_k) - f^*$. Under strict or uniform convexity of $f$, the iterates converge to the minimizer $x^*$ weakly or strongly, respectively.
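Why this recursion forces $a_k \to 0$ can be seen in a few lines; a sketch of the standard argument:

```latex
% Fix \varepsilon > 0 and choose K with E_k \le \varepsilon for all k \ge K.
% Subtracting \varepsilon from both sides of the recursion gives
a_{k+1} - \varepsilon \le (1 - \gamma_k)(a_k - \varepsilon), \qquad k \ge K,
% so, when a_K \ge \varepsilon (otherwise the conclusion is immediate),
a_k - \varepsilon \le (a_K - \varepsilon) \prod_{j=K}^{k-1} (1 - \gamma_j)
                  \le (a_K - \varepsilon)\, e^{-\sum_{j=K}^{k-1} \gamma_j} \to 0,
% because \sum_k \gamma_k = \infty; hence \limsup_k a_k \le \varepsilon for every \varepsilon.
```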

3.2 Rate of Convergence

With a finite curvature constant $C_f^{(\sigma)} < \infty$ and exact line search, the rate is

$$f(x_k) - f^* = O\!\left(\frac{1}{k^{\sigma-1}}\right).$$

  • For $f'$ Lipschitz ($\sigma = 2$): $O(1/k)$,
  • For $f'$ $\nu$-Hölder ($\sigma = 1+\nu$): $O(1/k^\nu)$, matching, respectively, the prototypical sublinear and Hölder-smooth rates.

This is shown by noting that at each step,

$$f(x_{k+1}) \le f(x_k) - \gamma_k \Delta_k + \frac{C_f^{(\sigma)}}{\sigma}\, \gamma_k^\sigma D^\sigma,$$

where $\Delta_k := f(x_k) - f^*$ and $D = \operatorname{diam}(C)$; optimizing the bound over $\gamma_k$ yields the recursion

$$\Delta_{k+1} \le \Delta_k - c\, \Delta_k^{\sigma/(\sigma-1)}.$$
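The step from this recursion to the stated rate is a short induction; a sketch, writing $p := \sigma/(\sigma-1)$:

```latex
% The recursion reads \Delta_{k+1} \le \Delta_k (1 - c\,\Delta_k^{p-1}).
% Raising both sides to the power -(p-1) and using (1-u)^{-(p-1)} \ge 1 + (p-1)u:
\Delta_{k+1}^{-(p-1)} \ \ge\ \Delta_k^{-(p-1)} + (p-1)\,c.
% Telescoping gives \Delta_k^{-(p-1)} \ge (p-1)\,c\,k, hence
\Delta_k \ \le\ \big((p-1)\,c\,k\big)^{-1/(p-1)} = O\big(k^{-(\sigma-1)}\big),
% since 1/(p-1) = \sigma - 1.
```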

Similar rates are maintained under open-loop step-size schedules satisfying the standard conditions $\gamma_k \to 0$ and $\sum_k \gamma_k = \infty$.
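As an illustrative numerical check (reusing the hypothetical quadratic instance from the sketch in Section 1), the Frank-Wolfe gap $\langle f'(x_k), x_k - s_k \rangle$, which upper-bounds $f(x_k) - f^*$ by convexity, can be tracked along the open-loop iteration:

```python
import numpy as np

# Hypothetical instance: f(x) = 0.5*||Ax - b||^2 over the probability simplex.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)

x, gaps = np.full(10, 0.1), []
for k in range(10_000):
    g = grad(x)
    s = np.zeros(10)
    s[np.argmin(g)] = 1.0
    gaps.append(g @ (x - s))         # FW gap: certifies f(x_k) - f^* <= gaps[k]
    x += (2.0 / (k + 2)) * (s - x)   # open-loop step size gamma_k = 2/(k+2)

# The decay trend of the gap is consistent with the O(1/k) bound
# for a Lipschitz gradient (sigma = 2).
print(gaps[10], gaps[100], gaps[1000])
```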

4. Generalization to Composite Objectives

For composite objectives $\varphi(x) = f(x) + g(x)$, the generalized FW method replaces the linear minimization oracle (LMO) by minimizing the sum of a linearization of $f$ and $g$; the subproblem becomes

$$s_k = \arg\min_{s \in C}\, \langle f'(x_k), s \rangle + g(s),$$

with step size and update analogous to the smooth case.

The composite FW method inherits both the qualitative convergence and the quantitative rate properties from the smooth FW analysis, since the only term handled by a Taylor-type expansion is $f$ (Xu, 2017).
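As a concrete illustration of the generalized subproblem, here is a minimal sketch assuming $C$ is an $\ell_\infty$-ball of radius $R$ and $g = \lambda\|\cdot\|_1$; this specific instance and all names are assumptions chosen because the subproblem then separates across coordinates and has a closed form:

```python
import numpy as np

def composite_lmo(c, lam, R):
    """argmin over ||s||_inf <= R of <c, s> + lam*||s||_1 (coordinatewise).

    Per coordinate, moving to -R (resp. +R) pays off only when the
    linear gain |c_i| exceeds the l1 penalty lam; otherwise s_i = 0.
    """
    s = np.zeros_like(c)
    s[c > lam] = -R
    s[c < -lam] = R
    return s

def composite_fw(grad, lam, R, x0, num_iters=500):
    """Generalized FW sketch for min f(x) + lam*||x||_1 over the inf-ball."""
    x = x0.copy()
    for k in range(num_iters):
        s = composite_lmo(grad(x), lam, R)
        x += (2.0 / (k + 2)) * (s - x)   # same open-loop update as the smooth case
    return x
```

The design point is that the nonsmooth term $g$ is handled exactly inside the subproblem rather than through its subgradient, which is why the smooth analysis carries over unchanged.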

5. Practical and Theoretical Implications

The Banach-space FW framework established in (Xu, 2017) demonstrates:

  • Universality: FW applies in any real Banach space for closed, bounded, convex feasible sets, without requiring projections.
  • Weak/Strong Convergence: With strict (or uniform) convexity of $f$, weak (or strong) convergence of the iterates is achieved.
  • Flexible Regularity: Rate bounds interpolate smoothly from $O(1/k)$ (Lipschitz $f'$) to $O(1/k^\nu)$ (Hölder $f'$).
  • Composite Structure: The linearization principle seamlessly transfers to composite optimization settings.

No boundedness or explicit smoothness of $f$ is required beyond what is used in the curvature definition and proof arguments. This framework encompasses and extends the guarantees typically shown for FW in finite dimensions, providing theoretical clarity and practical guidance for applications in high- or infinite-dimensional structured settings (Xu, 2017).

6. Summary Table

| Regularity type | Curvature exponent $\sigma$ | Convergence rate |
| --- | --- | --- |
| $f'$ Lipschitz | $2$ | $O(1/k)$ |
| $f'$ Hölder-$\nu$, $0 < \nu < 1$ | $1+\nu$ | $O(1/k^\nu)$ |

Under any of these regimes, the composite FW method achieves the same rates for $\varphi(x_k) - \varphi^*$ as in the smooth case.

7. References

The primary reference for the Banach-space analysis and the extended curvature constant is "Convergence Analysis of the Frank-Wolfe Algorithm and Its Generalization in Banach Spaces" (Xu, 2017), which includes pointers to foundational work by Polyak, Jaggi, Nesterov, and others.
