Frank-Wolfe Algorithm
- Frank-Wolfe Algorithm is a conditional gradient method that iteratively linearizes a convex function to solve constrained optimization problems.
- It leverages the geometry of feasible sets in Banach and Hilbert spaces, with convergence analysis based on the regularity and curvature of the objective function.
- Modern extensions include handling composite objectives, with rate guarantees of $O(1/k^{\nu-1})$ under generalized curvature (recovering $O(1/k)$ in the Lipschitz case $\nu = 2$), enabling applications in high-dimensional optimization.
The Frank-Wolfe (FW) algorithm—also known as the conditional gradient method—is a classical first-order algorithm for constrained optimization in Banach and Hilbert spaces, historically notable for its projection-free design based on iterative linearization. The FW method operates by iteratively solving a linear minimization subproblem over a constraint set, then updating the current iterate along the resulting direction with a carefully chosen step size. Its convergence analysis is fundamentally governed by the regularity of the objective function, the geometric structure of the feasible set, and the curvature of the objective. Modern developments include variants for composite objectives, rate guarantees under different smoothness regimes, and broad generalizations to infinite-dimensional and non-Euclidean settings (Xu, 2017).
1. Problem Formulation and Algorithmic Structure
The classical FW algorithm addresses the convex minimization problem in Banach spaces:

$$\min_{x \in C} f(x),$$

where $X$ is a real Banach space (norm $\|\cdot\|$), $C \subset X$ is nonempty, closed, convex, and bounded, and $f$ is convex and Fréchet differentiable on an open set containing $C$. The method generalizes to composite structure:

$$\min_{x \in C} f(x) + g(x),$$

where $g$ is proper, lower semicontinuous, convex, and finite on $C$ (Xu, 2017).
The basic FW iteration comprises:
- Linearization subproblem:
  $$s_k \in \arg\min_{s \in C} \langle f'(x_k), s \rangle,$$
  where $f'(x_k) \in X^*$ is the Fréchet derivative (the gradient in Hilbert spaces). For composite problems, $s_k \in \arg\min_{s \in C} \{\langle f'(x_k), s \rangle + g(s)\}$.
- Update step:
  $$x_{k+1} = x_k + \gamma_k (s_k - x_k),$$
  with step size $\gamma_k \in [0,1]$ determined either by exact line search ($\gamma_k \in \arg\min_{\gamma \in [0,1]} f(x_k + \gamma(s_k - x_k))$), or via an open-loop rule ($\gamma_k = \tfrac{2}{k+2}$, $k \ge 0$).
These choices ensure that the iterates remain feasible and that sparsity is promoted, as each $x_k$ is a convex combination of $x_0$ and at most $k$ extreme points of $C$ (Xu, 2017).
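The iteration above can be sketched in a few lines. The following is a minimal illustration, not the paper's setting: it specializes the Banach space to $\mathbb{R}^n$, takes $C$ to be the probability simplex (where the LMO reduces to picking the vertex with the smallest gradient entry), and uses a hypothetical quadratic objective $f(x) = \tfrac12\|x - b\|^2$ with the open-loop step rule.

```python
# Minimal Frank-Wolfe sketch on the probability simplex in R^n.
# Assumptions (not from the source): f(x) = 0.5*||x - b||^2 for an
# illustrative target b, and the open-loop step gamma_k = 2/(k+2).

def fw_simplex(b, iters=500):
    n = len(b)
    x = [1.0 / n] * n                              # feasible start: simplex center
    for k in range(iters):
        grad = [x[i] - b[i] for i in range(n)]     # Fréchet derivative of f
        # Linear minimization oracle: over the simplex, <grad, s> is
        # minimized at the vertex e_i with the smallest gradient entry.
        i_min = min(range(n), key=lambda i: grad[i])
        s = [1.0 if i == i_min else 0.0 for i in range(n)]
        gamma = 2.0 / (k + 2)                      # open-loop step size rule
        x = [(1 - gamma) * x[i] + gamma * s[i] for i in range(n)]
    return x

print(fw_simplex([0.7, 0.2, 0.1]))  # approaches b, which lies in the simplex
```

Note that every iterate is a convex combination of simplex vertices, so feasibility holds automatically and no projection is ever computed.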
2. Assumptions, Regularity, and Curvature Concepts
The convergence properties of FW fundamentally depend on the regularity of $f$ and the geometry of $C$:
- Basic convergence is proven under merely the uniform continuity of $f'$ on $C$.
- The composite case requires $g$ to be convex and lower semicontinuous, and finite on $C$.
- Rate analysis leverages higher-order curvature via
  $$C_\nu := \sup_{\substack{x, s \in C,\ \gamma \in (0,1] \\ y = x + \gamma(s - x)}} \frac{1}{\gamma^\nu}\bigl(f(y) - f(x) - \langle f'(x), y - x \rangle\bigr)$$
  for some $\nu \in (1, 2]$, generalizing the standard FW curvature ($\nu = 2$).
If $f'$ is $\beta$-Hölder continuous with constant $L_\beta$ (with $\beta \in (0,1]$), then $\nu = 1 + \beta$ and $C_\nu \le \tfrac{L_\beta}{1+\beta}\,\mathrm{diam}(C)^{1+\beta}$; for Lipschitz $f'$ ($\beta = 1$), $C_2 \le \tfrac{L}{2}\,\mathrm{diam}(C)^2$. These constants drive the theoretical rate bounds (Xu, 2017).
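The curvature bound can be checked numerically on a toy instance. The sketch below is an illustrative assumption, not the paper's construction: it takes $f(x) = \tfrac12\|x\|^2$ on the unit box $[0,1]^2$, so $f'(x) = x$ is Lipschitz with $L = 1$ and $\mathrm{diam}(C) = \sqrt{2}$, and it estimates $C_2$ by random sampling, which should stay below the theoretical bound $\tfrac{L}{2}\,\mathrm{diam}(C)^2 = 1$.

```python
# Monte-Carlo estimate of the curvature constant C_2 for an assumed
# toy instance: f(x) = 0.5*||x||^2 on the unit box [0,1]^2.
import random

def bregman_gap(x, y):
    # f(y) - f(x) - <f'(x), y - x> for f = 0.5*||.||^2
    f = lambda z: 0.5 * sum(c * c for c in z)
    inner = sum(xi * (yi - xi) for xi, yi in zip(x, y))
    return f(y) - f(x) - inner

random.seed(0)
estimate = 0.0
for _ in range(10000):
    x = [random.random(), random.random()]
    s = [random.random(), random.random()]
    gamma = random.uniform(0.01, 1.0)
    y = [xi + gamma * (si - xi) for xi, si in zip(x, s)]
    estimate = max(estimate, bregman_gap(x, y) / gamma ** 2)

# Theory (nu = 2): C_2 <= L * diam(C)^2 / 2 = 1 * 2 / 2 = 1
print(estimate)
```

For this quadratic the gap equals $\tfrac12\|y-x\|^2$ exactly, so the sampled ratio is $\tfrac12\|s-x\|^2 \le 1$, confirming the bound.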
3. Convergence and Complexity Analysis
3.1 Qualitative Convergence
With exact line search or any open-loop sequence with $\gamma_k \to 0$, $\sum_k \gamma_k = \infty$, the FW sequence satisfies $f(x_k) \to \min_C f$. The proof uses a Polyak-style recursion: $a_{k+1} \le (1 - \gamma_k) a_k + \gamma_k \varepsilon_k$ with $a_k := f(x_k) - \min_C f$ and $\varepsilon_k \to 0$. Under strict or uniform convexity of $f$, $x_k \to x^*$ weakly or strongly, respectively.
3.2 Rate of Convergence
With a finite curvature constant $C_\nu$ and exact line search, the rate is:
- For Lipschitz $f'$ ($\nu = 2$): $f(x_k) - \min_C f = O(1/k)$,
- For $\beta$-Hölder $f'$ ($\nu = 1 + \beta$): $f(x_k) - \min_C f = O(1/k^\beta)$, matching, respectively, the prototypical sublinear and Hölder-smooth rates.
This is shown by considering that at each step,
$$f(x_{k+1}) \le f(x_k) + \gamma_k \langle f'(x_k), s_k - x_k \rangle + C_\nu \gamma_k^\nu \le f(x_k) - \gamma_k a_k + C_\nu \gamma_k^\nu,$$
where $a_k := f(x_k) - \min_C f$, and optimizing with respect to $\gamma_k$ yields the recursion
$$a_{k+1} \le (1 - \gamma_k) a_k + C_\nu \gamma_k^\nu,$$
whose solution decays as $O(1/k^{\nu-1})$. Similar rates are maintained under open-loop step-size schedules satisfying the standard summability/divergence conditions.
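The recursion can be unrolled numerically to confirm the rate. The sketch below assumes $\nu = 2$, $C_\nu = 1$, and the open-loop rule $\gamma_k = 2/(k+2)$ (all illustrative choices), runs the recursion with equality (the worst case), and checks the standard induction bound $a_k \le 4C/(k+2)$ at every step.

```python
# Unroll the worst case of the rate recursion a_{k+1} = (1-g_k)a_k + C*g_k^2
# with the open-loop rule g_k = 2/(k+2).
# Assumptions for illustration: nu = 2, C = 1, initial gap a_0 = 10.
C = 1.0
a = 10.0
for k in range(1000):
    gamma = 2.0 / (k + 2)
    a = (1 - gamma) * a + C * gamma ** 2   # worst case: equality in the recursion
    # After this update a is a_{k+1}; the induction bound is 4C/((k+1)+2).
    assert a <= 4 * C / (k + 3), "O(1/k) bound violated"
print(a)  # decays like O(1/k)
```

Note that $\gamma_0 = 1$ wipes out the initial gap $a_0$, which is why the bound holds regardless of the starting value.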
4. Generalization to Composite Objectives
For composite objectives $f + g$, the generalized FW method replaces the LMO by minimizing the sum of a linearization of $f$ and $g$; the update is
$$s_k \in \arg\min_{s \in C} \{\langle f'(x_k), s \rangle + g(s)\}, \qquad x_{k+1} = x_k + \gamma_k (s_k - x_k),$$
with step size and update analogous to the smooth case.
The composite FW inherits both the qualitative convergence and the quantitative rate properties from the smooth FW analysis, since $g$ enters the subproblem exactly and the only term handled by a Taylor-type expansion is $f$ (Xu, 2017).
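A small composite instance makes the generalized subproblem concrete. The setup below is a hypothetical example, not from the source: $f(x) = \tfrac12\|x - b\|^2$ (smooth, linearized), $g(x) = \lambda\|x\|_1$ (kept exactly in the subproblem), and $C = [-1,1]^n$; over the box, the generalized LMO separates per coordinate and is attained at $s_i \in \{-1, 0, 1\}$.

```python
# Generalized (composite) Frank-Wolfe sketch. Hypothetical instance:
# minimize f(x) + g(x) over the box C = [-1,1]^n, with
# f(x) = 0.5*||x - b||^2 (linearized) and g(x) = lam*||x||_1 (kept exact).

def composite_fw(b, lam=0.1, iters=500):
    n = len(b)
    x = [0.0] * n
    for k in range(iters):
        grad = [x[i] - b[i] for i in range(n)]
        # Generalized LMO: argmin_{s in C} <grad, s> + lam*||s||_1.
        # Over the box this separates; each coordinate's minimum over
        # [-1,1] of grad_i*t + lam*|t| is attained at t in {-1, 0, 1}.
        s = [min([-1.0, 0.0, 1.0], key=lambda t: gi * t + lam * abs(t))
             for gi in grad]
        gamma = 2.0 / (k + 2)
        x = [(1 - gamma) * x[i] + gamma * s[i] for i in range(n)]
    return x

print(composite_fw([0.8, 0.05, -0.6]))
```

For this instance the composite optimum is the soft-thresholding of $b$ at level $\lambda$ (here roughly $(0.7,\ 0,\ -0.5)$), and the iterates approach it at the same $O(1/k)$ rate as in the smooth case.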
5. Practical and Theoretical Implications
The Banach-space FW framework established in (Xu, 2017) demonstrates:
- Universality: FW applies in any real Banach space for closed, bounded, convex feasible sets, without requiring projections.
- Weak/Strong Convergence: With strict (or uniform) convexity of $f$, weak (or strong) convergence of the iterates is achieved.
- Flexible Regularity: Rate bounds interpolate from $O(1/k^{\nu-1})$ ($\beta$-Hölder $f'$, $\nu = 1 + \beta < 2$) to $O(1/k)$ (Lipschitz $f'$, $\nu = 2$).
- Composite Structure: The linearization principle seamlessly transfers to composite optimization settings.
No boundedness or explicit smoothness of $f$ is required beyond what enters the curvature definition and the proof arguments. This framework encompasses and extends the guarantees typically shown for FW in finite dimensions, providing theoretical clarity and practical guidance for applications in high- or infinite-dimensional structured settings (Xu, 2017).
6. Summary Table
| Regularity of $f'$ | Curvature exponent ($\nu$) | Convergence rate |
|---|---|---|
| Lipschitz ($\beta = 1$) | $2$ | $O(1/k)$ |
| $\beta$-Hölder, $\beta \in (0,1)$ | $1 + \beta$ | $O(1/k^\beta)$ |
Under any of these regimes, the composite FW method achieves the same rates for $f + g$ as in the smooth case.
7. References
The primary reference for the Banach-space analysis and the extended curvature notion is "Convergence Analysis of the Frank-Wolfe Algorithm and Its Generalization in Banach Spaces" (Xu, 2017), which includes pointers to foundational work by Polyak, Jaggi, Nesterov, and others.