
Frank-Wolfe Algorithm

Updated 21 March 2026
  • Frank-Wolfe Algorithm is a conditional gradient method that iteratively linearizes a convex function to solve constrained optimization problems.
  • It leverages the geometry of feasible sets in Banach and Hilbert spaces, with convergence analysis based on the regularity and curvature of the objective function.
  • Modern extensions handle composite objectives, with rate guarantees interpolating between $O(1/k)$ (Lipschitz gradients) and $O(1/k^\nu)$ ($\nu$-Hölder gradients), enabling applications in high-dimensional optimization.

The Frank-Wolfe (FW) algorithm—also known as the conditional gradient method—is a classical first-order algorithm for constrained optimization in Banach and Hilbert spaces, historically notable for its projection-free design based on iterative linearization. The FW method operates by iteratively solving a linear minimization subproblem over a constraint set, then updating the current iterate along the resulting direction with a carefully chosen step size. Its convergence analysis is fundamentally governed by the regularity of the objective function, the geometric structure of the feasible set, and the curvature of the objective. Modern developments include variants for composite objectives, rate guarantees under different smoothness regimes, and broad generalizations to infinite-dimensional and non-Euclidean settings (Xu, 2017).

1. Problem Formulation and Algorithmic Structure

The classical FW algorithm addresses the convex minimization problem in Banach spaces

$$\min_{x \in C}\, f(x), \qquad C \subset X,$$

where $X$ is a real Banach space with norm $\|\cdot\|$, $C$ is nonempty, closed, convex, and bounded, and $f: X \to \mathbb{R}$ is convex and Fréchet differentiable on an open set containing $C$. The method generalizes to the composite structure

$$\min_{x \in C}\, \varphi(x) := f(x) + g(x),$$

where $g$ is proper, lower semicontinuous, convex, and finite on $C$ (Xu, 2017).

The basic FW iteration comprises:

  1. Linearization subproblem:

$$s_k = \arg\min_{s \in C}\, \langle f'(x_k), s \rangle,$$

where $f'(x_k)$ is the Fréchet derivative (the gradient in Hilbert spaces). For composite problems, $s_k = \arg\min_{s \in C}\, \langle f'(x_k), s \rangle + g(s)$.

  2. Update step:

$$x_{k+1} = x_k + \gamma_k (s_k - x_k),$$

with step size $\gamma_k \in [0,1]$ determined either by exact line search ($\arg\min_{0 \le \gamma \le 1} f(x_k + \gamma(s_k - x_k))$) or by an open-loop rule ($\gamma_k \to 0$, $\sum_k \gamma_k = \infty$).

These choices ensure that the iterates $x_k$ remain feasible and that sparsity is promoted, since each $x_k$ is a convex combination of at most $k+1$ extreme points of $C$ (Xu, 2017).
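To make the iteration concrete, here is a minimal Python sketch for the special case of a probability simplex, where the linearization subproblem is solved by inspecting the gradient; the quadratic objective and all names are illustrative assumptions, not constructs from (Xu, 2017):

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, num_iters=500):
    """Minimal Frank-Wolfe sketch over the probability simplex.

    The subproblem min_{s in simplex} <f'(x_k), s> is attained at the
    vertex e_i with i = argmin_i [f'(x_k)]_i, so no projection is needed.
    """
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0      # LMO: best vertex of the simplex
        gamma = 2.0 / (k + 2)      # open-loop rule: gamma_k -> 0, sum diverges
        x = x + gamma * (s - x)    # convex combination keeps x feasible
    return x

# Illustrative instance: f(x) = 0.5*||Ax - b||^2, so f' is Lipschitz.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
x = frank_wolfe_simplex(lambda x: A.T @ (A @ x - b), np.full(10, 0.1))
print(x.sum())  # stays (numerically) at 1: iterates never leave the simplex
```

After $k$ iterations the sketch's iterate is supported on at most $k+1$ vertices, which is exactly the sparsity mechanism described above.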

2. Assumptions, Regularity, and Curvature Concepts

The convergence properties of FW fundamentally depend on the regularity of ff and the geometry of CC:

  • Basic convergence is proven under mere uniform continuity of $f'$ on $C$.
  • The composite case additionally requires $g$ to be convex, lower semicontinuous, and finite on $C$.
  • Rate analysis leverages higher-order curvature via

$$C_f^{(\sigma)} = \sup_{x,s \in C,\ \gamma \in (0,1]} \frac{\sigma \left( f(x + \gamma(s-x)) - f(x) - \gamma \langle f'(x), s-x \rangle \right)}{\gamma^\sigma},$$

for some $\sigma \in (1,2]$, generalizing the standard FW curvature constant ($\sigma = 2$).

If $f'$ is $\nu$-Hölder continuous with constant $L_H$ (where $0 < \nu \le 1$), then $\sigma = 1+\nu$ and $C_f^{(1+\nu)} \le L_H\,[\operatorname{diam}(C)]^{1+\nu}$; for $f'$ Lipschitz with constant $L$ ($\nu = 1$), $C_f^{(2)} \le L\,[\operatorname{diam}(C)]^2$. These constants drive the theoretical rate bounds (Xu, 2017).
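The Hölder bound on the curvature constant follows from a one-line integral estimate; a sketch of the standard argument, writing $d = s - x$ and assuming $\|f'(u) - f'(v)\|_* \le L_H \|u - v\|^\nu$:

```latex
\begin{align*}
f(x+\gamma d) - f(x) - \gamma\langle f'(x), d\rangle
  &= \int_0^1 \big\langle f'(x+t\gamma d) - f'(x),\, \gamma d \big\rangle\, dt \\
  &\le \int_0^1 L_H\,(t\gamma\|d\|)^{\nu}\,\gamma\|d\|\, dt
   = \frac{L_H\,\gamma^{1+\nu}\,\|d\|^{1+\nu}}{1+\nu}.
\end{align*}
% Multiplying by \sigma/\gamma^{\sigma} with \sigma = 1+\nu and taking the
% supremum over x, s \in C yields C_f^{(1+\nu)} \le L_H [\mathrm{diam}(C)]^{1+\nu}.
```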

3. Convergence and Complexity Analysis

3.1 Qualitative Convergence

With exact line search or any open-loop sequence ($\gamma_k \to 0$, $\sum_k \gamma_k = \infty$), the FW sequence satisfies $f(x_k) \to f^* := \min_{x \in C} f(x)$. The proof uses a Polyak-style recursion

$$a_{k+1} \le (1-\gamma_k)\, a_k + \gamma_k E_k, \qquad E_k \to 0,$$

with $a_k = f(x_k) - f^*$. Under strict or uniform convexity of $f$, the iterates converge to the minimizer $x^*$ weakly or strongly, respectively.
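Why this recursion forces $a_k \to 0$ can be seen in a few lines; a sketch of the standard argument:

```latex
% Fix \varepsilon > 0 and choose K with E_k \le \varepsilon for all k \ge K.
% Subtracting \varepsilon from both sides of the recursion gives
a_{k+1} - \varepsilon \le (1 - \gamma_k)(a_k - \varepsilon), \qquad k \ge K,
% so, when a_K \ge \varepsilon (otherwise the conclusion is immediate),
a_k - \varepsilon \le (a_K - \varepsilon) \prod_{j=K}^{k-1} (1 - \gamma_j)
                  \le (a_K - \varepsilon)\, e^{-\sum_{j=K}^{k-1} \gamma_j} \to 0,
% because \sum_k \gamma_k = \infty; hence \limsup_k a_k \le \varepsilon for every \varepsilon.
```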

3.2 Rate of Convergence

With a finite curvature constant $C_f^{(\sigma)} < \infty$ and exact line search, the rate is

$$f(x_k) - f^* = O\!\left(\frac{1}{k^{\sigma-1}}\right).$$

  • For $f'$ Lipschitz ($\sigma = 2$): $O(1/k)$,
  • For $f'$ $\nu$-Hölder ($\sigma = 1+\nu$): $O(1/k^\nu)$, matching, respectively, the prototypical sublinear and Hölder-smooth rates.

This is shown by noting that at each step,

$$f(x_{k+1}) \le f(x_k) - \gamma_k \Delta_k + \frac{C_f^{(\sigma)}}{\sigma}\, \gamma_k^\sigma D^\sigma,$$

where $\Delta_k := f(x_k) - f^*$ and $D = \operatorname{diam}(C)$; optimizing the bound over $\gamma_k$ yields the recursion

$$\Delta_{k+1} \le \Delta_k - c\, \Delta_k^{\sigma/(\sigma-1)}.$$
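The step from this recursion to the stated rate is a short induction; a sketch, writing $p := \sigma/(\sigma-1)$:

```latex
% The recursion reads \Delta_{k+1} \le \Delta_k (1 - c\,\Delta_k^{p-1}).
% Raising both sides to the power -(p-1) and using (1-u)^{-(p-1)} \ge 1 + (p-1)u:
\Delta_{k+1}^{-(p-1)} \ \ge\ \Delta_k^{-(p-1)} + (p-1)\,c.
% Telescoping gives \Delta_k^{-(p-1)} \ge (p-1)\,c\,k, hence
\Delta_k \ \le\ \big((p-1)\,c\,k\big)^{-1/(p-1)} = O\big(k^{-(\sigma-1)}\big),
% since 1/(p-1) = \sigma - 1.
```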

Similar rates are maintained under open-loop step-size schedules satisfying the standard conditions $\gamma_k \to 0$ and $\sum_k \gamma_k = \infty$.
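As an illustrative numerical check (reusing the hypothetical quadratic instance from the sketch in Section 1), the Frank-Wolfe gap $\langle f'(x_k), x_k - s_k \rangle$, which upper-bounds $f(x_k) - f^*$ by convexity, can be tracked along the open-loop iteration:

```python
import numpy as np

# Hypothetical instance: f(x) = 0.5*||Ax - b||^2 over the probability simplex.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)

x, gaps = np.full(10, 0.1), []
for k in range(10_000):
    g = grad(x)
    s = np.zeros(10)
    s[np.argmin(g)] = 1.0
    gaps.append(g @ (x - s))         # FW gap: certifies f(x_k) - f^* <= gaps[k]
    x += (2.0 / (k + 2)) * (s - x)   # open-loop step size gamma_k = 2/(k+2)

# The decay trend of the gap is consistent with the O(1/k) bound
# for a Lipschitz gradient (sigma = 2).
print(gaps[10], gaps[100], gaps[1000])
```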

4. Generalization to Composite Objectives

For composite objectives $\varphi(x) = f(x) + g(x)$, the generalized FW method replaces the linear minimization oracle (LMO) by minimizing the sum of a linearization of $f$ and $g$; the subproblem becomes

$$s_k = \arg\min_{s \in C}\, \langle f'(x_k), s \rangle + g(s),$$

with step size and update analogous to the smooth case.

The composite FW method inherits both the qualitative convergence and the quantitative rate properties from the smooth FW analysis, since the only term handled by a Taylor-type expansion is $f$ (Xu, 2017).
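As a concrete illustration of the generalized subproblem, here is a minimal sketch assuming $C$ is an $\ell_\infty$-ball of radius $R$ and $g = \lambda\|\cdot\|_1$; this specific instance and all names are assumptions chosen because the subproblem then separates across coordinates and has a closed form:

```python
import numpy as np

def composite_lmo(c, lam, R):
    """argmin over ||s||_inf <= R of <c, s> + lam*||s||_1 (coordinatewise).

    Per coordinate, moving to -R (resp. +R) pays off only when the
    linear gain |c_i| exceeds the l1 penalty lam; otherwise s_i = 0.
    """
    s = np.zeros_like(c)
    s[c > lam] = -R
    s[c < -lam] = R
    return s

def composite_fw(grad, lam, R, x0, num_iters=500):
    """Generalized FW sketch for min f(x) + lam*||x||_1 over the inf-ball."""
    x = x0.copy()
    for k in range(num_iters):
        s = composite_lmo(grad(x), lam, R)
        x += (2.0 / (k + 2)) * (s - x)   # same open-loop update as the smooth case
    return x
```

The design point is that the nonsmooth term $g$ is handled exactly inside the subproblem rather than through its subgradient, which is why the smooth analysis carries over unchanged.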

5. Practical and Theoretical Implications

The Banach-space FW framework established in (Xu, 2017) demonstrates:

  • Universality: FW applies in any real Banach space for closed, bounded, convex feasible sets, without requiring projections.
  • Weak/Strong Convergence: With strict (or uniform) convexity of $f$, weak (or strong) convergence of the iterates is achieved.
  • Flexible Regularity: Rate bounds interpolate smoothly from $O(1/k)$ (Lipschitz $f'$) to $O(1/k^\nu)$ (Hölder $f'$).
  • Composite Structure: The linearization principle seamlessly transfers to composite optimization settings.

No boundedness or explicit smoothness of $f$ is required beyond what is used in the curvature definition and proof arguments. This framework encompasses and extends the guarantees typically shown for FW in finite dimensions, providing theoretical clarity and practical guidance for applications in high- or infinite-dimensional structured settings (Xu, 2017).

6. Summary Table

| Regularity type | Curvature exponent $\sigma$ | Convergence rate |
| --- | --- | --- |
| $f'$ Lipschitz | $2$ | $O(1/k)$ |
| $f'$ Hölder-$\nu$, $0 < \nu < 1$ | $1+\nu$ | $O(1/k^\nu)$ |

Under any of these regimes, the composite FW method achieves the same rates for $\varphi(x_k) - \varphi^*$ as in the smooth case.

7. References

The primary reference for the Banach-space analysis and the extended curvature constant is "Convergence Analysis of the Frank-Wolfe Algorithm and Its Generalization in Banach Spaces" (Xu, 2017), which includes pointers to foundational work by Polyak, Jaggi, Nesterov, and others.
