
Convergence Rate of Frank-Wolfe for Non-Convex Objectives (1607.00345v1)

Published 1 Jul 2016 in math.OC, cs.LG, cs.NA, and stat.ML

Abstract: We give a simple proof that the Frank-Wolfe algorithm obtains a stationary point at a rate of $O(1/\sqrt{t})$ on non-convex objectives with a Lipschitz continuous gradient. Our analysis is affine invariant and is the first, to the best of our knowledge, giving a similar rate to what was already proven for projected gradient methods (though on slightly different measures of stationarity).

Citations (183)

Summary

Convergence Rate of Frank-Wolfe for Non-Convex Objectives

This paper investigates the convergence properties of the Frank-Wolfe (FW) algorithm when applied to optimization problems with non-convex objectives. Notably, the paper gives a simple yet insightful proof that the FW algorithm reaches a stationary point at a rate of $O(1/\sqrt{t})$ on objectives with a Lipschitz continuous gradient. This analysis is particularly significant because it is affine invariant, in keeping with the intrinsic geometry of the FW algorithm, and it parallels known results for projected gradient methods.

Key Contributions

The Frank-Wolfe algorithm, a first-order optimization method, is widely used for solving optimization problems over convex and compact domains. Traditionally applied to convex objectives, its extension to non-convex settings requires a different analytical approach, primarily because the global optimality guarantees associated with convexity are lost. The paper's central contribution is establishing a convergence rate for the FW algorithm on non-convex functions, via a proof that leverages the Lipschitz continuity of the gradient and an affine-invariant curvature constant.
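
To make the method concrete, here is a minimal sketch of a Frank-Wolfe loop in Python. The feasible domain (an $\ell_1$ ball), its linear minimization oracle, and the classic $2/(t+2)$ step size are illustrative assumptions, not choices taken from the paper; the loop also tracks the Frank-Wolfe gap that the analysis below centers on:

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle for the l1 ball of the given radius:
    returns argmin_{||s||_1 <= radius} <grad, s>, which is the signed
    vertex along the coordinate where |grad| is largest."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, lmo, num_iters=1000):
    """Generic Frank-Wolfe loop; returns the last iterate and the
    smallest Frank-Wolfe gap observed (the quantity Theorem 1 bounds)."""
    x = x0.copy()
    min_gap = np.inf
    for t in range(num_iters):
        g = grad_f(x)
        s = lmo(g)                      # direction-finding step
        gap = float(np.dot(s - x, -g))  # Frank-Wolfe gap g_t
        min_gap = min(min_gap, gap)
        gamma = 2.0 / (t + 2.0)         # classic step size (illustrative)
        x = x + gamma * (s - x)         # convex combination stays feasible
    return x, min_gap
```

As a usage example, one could run this on a non-convex quadratic $x^\top A x$ with an indefinite symmetric $A$ (gradient `lambda x: 2 * A @ x`) and watch `min_gap` shrink over iterations; any feasible starting point works, since each iterate is a convex combination of feasible points.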

Analysis and Results

The core analysis revolves around the introduction of the Frank-Wolfe gap, a measure to assess non-stationarity in the optimization process. This gap is defined as:

$$g_t := \max_{s \in M} \langle s - x^{(t)}, -\nabla f(x^{(t)}) \rangle.$$

The paper asserts that a stationary point is characterized by a Frank-Wolfe gap of zero, making the gap a natural criterion for measuring the algorithm's progress. The analysis demonstrates that the minimal FW gap decreases as $O(1/\sqrt{t})$, where $t$ is the iteration count. This finding is encapsulated in Theorem 1:

$$\min_{0 \leq k \leq t} g_k \leq \frac{\max\{2 h_0, C\}}{\sqrt{t+1}},$$

where $h_0$ represents the initial global suboptimality and $C$ is the affine-invariant curvature constant. The proof, inspired by strategies from gradient descent methods, hinges on bounding the per-iteration decrease of the objective in terms of the FW gap.
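
The following display is a sketch of the key chain of inequalities, reconstructed from the summary above (using the standard curvature-based descent lemma, with $h_k := f(x^{(k)}) - \min_{x \in M} f(x)$ and step sizes $\gamma_k$) rather than quoted from the paper:

```latex
% One-step descent via the curvature constant C, then telescoping over k:
f(x^{(k+1)}) \le f(x^{(k)}) - \gamma_k\, g_k + \frac{\gamma_k^2}{2}\, C
\quad\Longrightarrow\quad
\Big(\min_{0 \le k \le t} g_k\Big) \sum_{k=0}^{t} \gamma_k
\;\le\; \sum_{k=0}^{t} \gamma_k\, g_k
\;\le\; h_0 + \frac{C}{2} \sum_{k=0}^{t} \gamma_k^2 .
% For instance, the fixed step size gamma_k = 1/sqrt(t+1) makes the sums
% evaluate to sqrt(t+1) and 1 respectively, giving
%   min_k g_k <= (h_0 + C/2) / sqrt(t+1) <= max{2 h_0, C} / sqrt(t+1),
% which is the O(1/sqrt(t)) rate of Theorem 1.
```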

Implications and Future Directions

From a practical standpoint, this convergence result gives the research community a quantified expectation of the FW algorithm's performance in non-convex settings, broadening its applicability in machine learning and operations research, where non-convex objectives over convex, compact constraint sets arise frequently. The theoretical insights also pave the way for future work on more sophisticated FW variants, potentially incorporating adaptive step sizes and curvature-informed adjustments to further improve efficiency.

The paper's implications extend to algorithm design for non-convex optimization, hinting at the potential development of hybrid methods that could integrate the affine invariance feature of FW with other optimization frameworks. Furthermore, the exploration of lower bounds for the FW algorithm's performance in constrained settings remains an intriguing area of inquiry that could validate or challenge the established convergence rate.

In conclusion, this work provides an essential contribution to non-convex optimization literature, equipping researchers and practitioners with a deeper understanding of the Frank-Wolfe algorithm's capabilities and limitations, and encouraging further investigation into augmenting its efficacy across diverse applications.