Measured Greedy Frank-Wolfe Algorithm
- The paper introduces the measured greedy Frank-Wolfe algorithm that replaces projection steps with iterative linear minimization, ensuring efficient approximation and feasibility.
- It employs a greedy update method to control the sparsity or low-rank structure of the solution while guaranteeing an ε-accurate solution within O(1/ε) iterations, with the duality gap serving as a certificate.
- The approach optimizes computational efficiency in high-dimensional convex problems by avoiding costly projections and enabling scalable applications in sparse and low-rank recovery.
The measured greedy Frank-Wolfe algorithm is a first-order projection-free scheme for convex optimization, generalizing the original Frank–Wolfe (conditional gradient) method to achieve efficient approximation with optimal sparsity or low-rank guarantees across a wide class of domains. It eliminates explicit projection steps by iteratively solving a linear subproblem defined by the objective’s gradient or subgradient, yielding iterates that remain feasible while providing a means to directly control the sparsity or rank of the solution.
1. Iterative Linear Minimization and Greedy Update
At each iteration, the algorithm replaces the typical gradient-projection step with the solution to a linearized subproblem. For the current iterate $x^{(k)}$ and gradient (or subgradient) $\nabla f(x^{(k)})$, the next direction is obtained by:

$$s := \arg\min_{s' \in D} \big\langle s',\, \nabla f(x^{(k)}) \big\rangle.$$

This yields an “atom” $s$ within the compact convex domain $D$. The next iterate is then formed as a convex combination:

$$x^{(k+1)} := (1 - \alpha_k)\, x^{(k)} + \alpha_k\, s,$$

where the step-size schedule (e.g., $\alpha_k = \tfrac{2}{k+2}$) ensures feasibility and optimal convergence properties. The greedy selection of $s$ realizes the duality gap at each step and provides a uniform improvement over the linear approximation without explicit projection. This construction immediately generalizes classical sparse greedy algorithms and low-rank approaches to arbitrary convex domains (Jaggi, 2011).
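For concreteness, the following minimal sketch implements this update over the unit simplex in NumPy; the quadratic objective, tolerance, and the helper name `frank_wolfe_simplex` are illustrative choices, not artifacts of the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, num_iters=100, tol=1e-6):
    """Projection-free Frank-Wolfe loop over the unit (probability) simplex."""
    x = x0.copy()
    for k in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <s, g> is the vertex e_i with i = argmin_i g_i.
        i = int(np.argmin(g))
        s = np.zeros_like(x)
        s[i] = 1.0
        # Duality gap g(x) = <x - s, grad f(x)>, used as a stopping criterion.
        gap = float((x - s) @ g)
        if gap <= tol:
            break
        # Convex-combination update with step size 2/(k+2): the iterate
        # stays feasible without any projection step.
        alpha = 2.0 / (k + 2)
        x = (1 - alpha) * x + alpha * s
    return x

# Example: minimize f(x) = ||A x - b||^2 over the simplex.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_hat = frank_wolfe_simplex(lambda x: 2 * A.T @ (A @ x - b), np.eye(5)[0])
```

Every operation is a gradient evaluation, a coordinate argmin, and a convex combination; no projection onto the simplex is ever needed.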
2. Convergence Analysis and Duality Gap Certificates
A central theoretical guarantee of the measured greedy Frank-Wolfe algorithm is the reduction of the duality gap below any prescribed $\varepsilon > 0$ within $O(1/\varepsilon)$ iterations. If $g(x)$ denotes the duality gap at $x$, defined by:

$$g(x) := \max_{s \in D}\, \big\langle x - s,\, \nabla f(x) \big\rangle,$$

and $C_f$ is the curvature constant quantifying the deviation of $f$ from its local linear approximation, the standard decrease per iteration is bounded by:

$$f(x^{(k+1)}) \le f(x^{(k)}) - \alpha_k\, g(x^{(k)}) + \frac{\alpha_k^2}{2}\, C_f.$$

By induction and using weak duality ($g(x) \ge f(x) - f(x^*)$ for the optimizer $x^*$), one obtains:

$$f(x^{(k)}) - f(x^*) \le \frac{2\, C_f}{k + 2}.$$

Therefore, to achieve an $\varepsilon$-accurate solution with a correspondingly small duality gap, at most $O(C_f/\varepsilon)$ iterations are required. The duality gap is efficiently computable at each iterate, making it an effective stopping criterion and providing a primal-dual certificate for approximation quality.
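To make the induction explicit, the per-iteration decrease combines with weak duality as shown below; this is a compact sketch of the standard argument, using the shorthand $h_k := f(x^{(k)}) - f(x^*)$ and the step size $\alpha_k = \tfrac{2}{k+2}$ from above.

```latex
% Weak duality gives g(x^{(k)}) >= h_k, so the per-step bound becomes
h_{k+1} \;\le\; h_k - \alpha_k\, g\big(x^{(k)}\big) + \tfrac{\alpha_k^2}{2} C_f
        \;\le\; (1 - \alpha_k)\, h_k + \tfrac{\alpha_k^2}{2} C_f .
% Plugging in alpha_k = 2/(k+2) and inducting on k yields
% h_k <= 2 C_f / (k+2), which falls below any epsilon > 0 once
% k >= 2 C_f / epsilon, i.e., after O(C_f / epsilon) iterations.
```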
3. Sparsity, Low-Rank Structure, and Optimality
A defining property is the explicit control over the sparsity (or rank) of approximate solutions. In $\ell_1$-regularized convex problems or optimization over the simplex, each step adds at most one new coordinate, so the number of nonzero components in $x^{(k)}$ is at most $k+1$ when the method starts from a single vertex; hence an $\varepsilon$-approximate solution is obtained with sparsity $O(1/\varepsilon)$. The paper proves matching upper and lower bounds of $O(1/\varepsilon)$ and $\Omega(1/\varepsilon)$ for the sparsity, confirming optimality: there are instances on which every $\varepsilon$-approximate solution requires at least this many nonzero terms.
For semidefinite optimization over the spectahedron (PSD matrices with a trace bound), each iteration adds a rank-1 term, so the rank of $X^{(k)}$ is bounded by $k+1$. Analogously, $\varepsilon$-approximate solutions are obtained with rank $O(1/\varepsilon)$. The $\Omega(1/\varepsilon)$ lower bound holds for all such low-rank approximation problems, even when specialized to diagonally dominant symmetric matrices or other trace-norm constraints, showing that the measured greedy approach achieves the minimal possible structure for a given accuracy.
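A corresponding sketch for the spectahedron makes the rank bookkeeping visible; the full eigendecomposition below stands in for the few Lanczos or power iterations one would use in practice, and the function name and gradient callback are assumptions for illustration.

```python
import numpy as np

def fw_spectahedron(grad, n, num_iters=50):
    """Frank-Wolfe over the spectahedron {X : X PSD, trace(X) = 1}."""
    X = np.zeros((n, n))
    X[0, 0] = 1.0  # feasible rank-1 starting point
    for k in range(num_iters):
        G = grad(X)
        # Linear oracle: argmin_S <S, G> over the spectahedron is v v^T,
        # where v is an eigenvector for the smallest eigenvalue of G.
        _, V = np.linalg.eigh((G + G.T) / 2)
        v = V[:, 0]
        S = np.outer(v, v)  # rank-1 atom
        alpha = 2.0 / (k + 2)
        # Each update mixes in exactly one rank-1 atom, so the iterate's
        # rank grows by at most one per step.
        X = (1 - alpha) * X + alpha * S
    return X
```

With warm-started power iterations for the eigenvector step, the per-iteration cost is typically dominated by a handful of matrix-vector products rather than a full decomposition.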
4. Applications Across Convex Domains
The algorithm’s framework supports a variety of important applications, exploiting the linear minimization structure for computational efficiency:
- $\ell_1$-Regularized Problems and the Simplex: Optimization for SVMs, Lasso, logistic regression, and boosting, with the linear oracle reducing to selecting a single coordinate (the minimal, or maximal-magnitude, gradient component depending on the domain), yielding efficient implementations even in large-scale settings (the oracle is sketched below).
- Nuclear Norm Regularization: Matrix completion, robust PCA, and low-rank recovery, leveraging the equivalence with semidefinite optimization over bounded-trace matrices. Each iteration adds a rank-1 matrix (outer product of a computed eigenvector), supporting scalable solutions in applications like the Netflix dataset.
- Max-Norm and Diagonally Dominant Matrices: Convex problems with max-norm constraints are addressed by optimization over specially structured matrix domains, for which the method offers the first convergence guarantees described in the literature.
- Optimization over the Lovász Extension of Submodular Functions: The greedy oracle step can be solved efficiently, extending the algorithm to combinatorial and submodular objectives.
In all these cases, the per-iteration complexity is low, often scaling linearly in the number of nonzeros or the raw input dimension, and the method’s primal-dual mechanism seamlessly integrates approximation guarantees and natural halting criteria.
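To make the oracle costs concrete, below are minimal sketches of the linear minimization oracles for an $\ell_1$-ball and a nuclear-norm-ball constraint (illustrative helper names; the full SVD stands in for the power/Lanczos computation of the leading singular pair one would use at scale).

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """argmin_{||s||_1 <= radius} <s, g>: one signed, scaled coordinate vector."""
    i = int(np.argmax(np.abs(g)))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

def lmo_nuclear_ball(G, radius=1.0):
    """argmin_{||S||_* <= radius} <S, G>: the scaled rank-1 matrix
    -radius * u v^T built from the leading singular vector pair of G."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return -radius * np.outer(U[:, 0], Vt[0, :])
```

Both oracles return a single extreme point of the domain, which is exactly why each Frank-Wolfe step adds at most one coordinate or one rank-1 term to the solution.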
5. Comparison to Projection-Based and Proximal Methods
Compared to projected gradient or mirror descent, the measured greedy Frank-Wolfe algorithm avoids costly projections entirely—especially critical in settings where projection involves expensive matrix decompositions (e.g., eigenvalue or SVD computations). Proximal gradient or SVT methods for nuclear norm objectives, in contrast, require full decompositions, whereas the greedy method relies solely on dominant eigenvectors or coordinate-wise steps.
Though the $O(1/\varepsilon)$ iteration complexity is slower than that of certain accelerated schemes for smooth convex objectives (e.g., $O(1/\sqrt{\varepsilon})$ for accelerated gradient methods), the low per-iteration computational cost and scalability compensate in practice, particularly for highly structured or large-scale problems.
Crucially, the method’s primal-dual nature means that the duality gap is always available for progress tracking and solution certification, which is not generally true for projection-based or purely primal schemes.
6. Summary and Significance
The measured greedy Frank-Wolfe algorithm, as presented in "Convex Optimization without Projection Steps" (Jaggi, 2011), extends the reach of the Frank-Wolfe family to arbitrary convex domains by exploiting iterative greedy updates that maintain feasibility while tightly controlling solution structure. It achieves an $\varepsilon$-accurate solution within $O(1/\varepsilon)$ iterations, guarantees optimal sparsity/low-rank approximations (with matching lower bounds), and efficiently scales to high dimensions and large datasets. Its design unifies theory and practical implementation by providing low-cost primal-dual updates and broad applicability to sparse, low-rank, and combinatorial convex optimization, a key advance for large-scale statistical, signal processing, and machine learning problems.