Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods (1602.02915v6)

Published 9 Feb 2016 in math.OC and stat.ML

Abstract: In this paper, we study the Kurdyka-Łojasiewicz (KL) exponent, an important quantity for analyzing the convergence rate of first-order methods. Specifically, we develop various calculus rules to deduce the KL exponent of new (possibly nonconvex and nonsmooth) functions formed from functions with known KL exponents. In addition, we show that the well-studied Luo-Tseng error bound together with a mild assumption on the separation of stationary values implies that the KL exponent is $\frac12$. The Luo-Tseng error bound is known to hold for a large class of concrete structured optimization problems, and thus we deduce the KL exponent of a large class of functions whose exponents were previously unknown. Building upon this and the calculus rules, we are then able to show that for many convex or nonconvex optimization models for applications such as sparse recovery, their objective function's KL exponent is $\frac12$. This includes the least squares problem with smoothly clipped absolute deviation (SCAD) regularization or minimax concave penalty (MCP) regularization and the logistic regression problem with $\ell_1$ regularization. Since many existing local convergence rate analyses for first-order methods in the nonconvex scenario rely on the KL exponent, our results enable us to obtain explicit convergence rates for various first-order methods when they are applied to a large variety of practical optimization models. Finally, we further illustrate how our results can be applied to establishing local linear convergence of the proximal gradient algorithm and the inertial proximal algorithm with constant step-sizes for some specific models that arise in sparse recovery.

Citations (268)

Summary

  • The paper introduces systematic calculus rules to compute KL exponents for composite and nonsmooth functions, establishing conditions for linear convergence.
  • It reveals a key connection between the Luo-Tseng error bound and the KL exponent, particularly identifying cases where the exponent equals ½.
  • By applying these results to sparse recovery and regularized least squares models, the paper establishes local linear convergence of first-order methods such as the proximal gradient algorithm.

Analysis of Calculus of the Exponent of Kurdyka-Łojasiewicz Inequality and its Applications to Linear Convergence of First-Order Methods

The paper presents an in-depth exploration of the Kurdyka-Łojasiewicz (KL) exponent, a critical quantity governing the convergence rates of first-order optimization methods. The authors develop comprehensive calculus rules for deducing the KL exponent of new, possibly nonconvex and nonsmooth, functions built from functions whose KL exponents are already known.
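
For reference, the KL property with exponent $\alpha$ is commonly stated as follows; this restates the standard definition used throughout the KL literature rather than a result specific to this paper. A proper closed function $f$ satisfies the KL property at a stationary point $\bar{x}$ with exponent $\alpha \in [0,1)$ if there exist $c, \epsilon, \nu > 0$ such that

$$
\operatorname{dist}\big(0, \partial f(x)\big) \ge c\,\big(f(x) - f(\bar{x})\big)^{\alpha}
\quad \text{whenever } \|x - \bar{x}\| \le \epsilon \text{ and } f(\bar{x}) < f(x) < f(\bar{x}) + \nu.
$$

An exponent of $\frac{1}{2}$ is precisely the regime in which proximal-type first-order methods are typically shown to converge locally at a linear rate, which is why determining this exponent for concrete models is the paper's central concern.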

Key Contributions

  1. Calculus Rules for KL Exponents: The paper provides systematic methods to calculate the KL exponent for composite functions. These include rules for:
    • The minimum of functions.
    • Block separable sums.
    • Composite functions with smooth transformations.
    • Moreau envelopes of convex functions.
    • Lagrangian relaxations for convex programs.
    • Partially smooth functions on manifolds.
  2. Relationship with Luo-Tseng Error Bound: A novel connection is established between the Luo-Tseng error bound, known to ensure linear convergence for a wide class of structured optimization models, and the KL exponent: under a mild assumption on the separation of stationary values, the error bound implies that the KL exponent is exactly $\frac{1}{2}$ (a sketch of this connection appears after this list).
  3. Extensive Applications: Building upon these foundations, the paper demonstrates that a range of optimization models, especially those prevalent in sparse recovery and data analysis, have objective functions with a KL exponent of $\frac{1}{2}$. Examples include least squares problems with smoothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP) regularizations.
  4. Implications for First-Order Methods: The explicit determination of the KL exponent allows the authors to assert linear convergence rates for a variety of first-order methods, using proximal gradient algorithms and the inertial proximal algorithm as examples. This has broad implications for optimization techniques applied within machine learning and statistics.
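
As referenced in item 2 above, the Luo-Tseng error bound for a structured objective $f = h + P$, with $h$ smooth and $P$ proper closed convex, is commonly stated in terms of the proximal residual; the following is a sketch of that statement in its standard form, not a verbatim quote from the paper. For every $\zeta \ge \inf f$ there exist $\kappa, \epsilon > 0$ such that

$$
\operatorname{dist}(x, \mathcal{X}) \le \kappa\,\big\|\operatorname{prox}_{P}\big(x - \nabla h(x)\big) - x\big\|
\quad \text{whenever } f(x) \le \zeta \text{ and } \big\|\operatorname{prox}_{P}\big(x - \nabla h(x)\big) - x\big\| \le \epsilon,
$$

where $\mathcal{X}$ denotes the set of stationary points of $f$. The paper's key result is that this bound, combined with a mild separation assumption on the stationary values (roughly, that nearby stationary points share the same objective value), forces the KL exponent of $f$ to be $\frac{1}{2}$.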

Strong Numerical Results and Claims

The paper makes robust claims regarding the KL exponent, such as proving that under certain structural conditions the KL exponent is $\frac{1}{2}$. This result provides a critical insight for practitioners deploying first-order methods, ensuring predictable and reliable convergence rates.
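
To make this consequence concrete, the following minimal Python sketch runs the proximal gradient method (ISTA) on $\ell_1$-regularized least squares, a representative structured model for which the Luo-Tseng error bound, and hence a KL exponent of $\frac{1}{2}$, is known to hold. It is an illustration of the expected local linear decay of the objective gap, not the authors' code; the problem sizes, regularization weight, and iteration count are arbitrary, and the helper names are ours.

```python
import numpy as np

# Proximal gradient (ISTA) sketch for l1-regularized least squares:
#   min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
# Illustrative only: data, lam, and iteration budget are arbitrary.

rng = np.random.default_rng(0)
m, n = 50, 200
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 10, replace=False)] = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(m)
lam = 0.1

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient of the smooth part
step = 1.0 / L

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.linalg.norm(x, 1)

x = np.zeros(n)
history = []
for _ in range(2000):
    grad = A.T @ (A @ x - b)           # gradient of the smooth part
    x = soft_threshold(x - step * grad, step * lam)
    history.append(objective(x))

# With a KL exponent of 1/2, the objective gap f(x_k) - f* is expected to decay
# at a local linear (geometric) rate; here the final objective value is used as
# a stand-in for f*. A semilog plot of these gaps would look roughly linear.
gaps = np.array(history) - history[-1]
print(gaps[::250])
```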

Theoretical and Practical Implications

Theoretically, this work advances the understanding of convergence properties in nonconvex optimization. Practically, the explicit KL exponent results allow for improved parameter tuning in first-order methods. Moreover, the calculus rules extend the set of optimization problems where linear convergence guarantees can be provided, impacting areas such as statistical learning and signal processing.
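
Concretely, the rates typically derived from the KL exponent $\alpha$ of the objective (for, e.g., the proximal gradient method with a constant step-size) take the following form; this is the standard dichotomy from KL-based convergence analysis, stated here only as context and up to constants:

$$
f(x_k) - f(\bar{x}) \le
\begin{cases}
C\, r^{k} \ \text{for some } r \in (0,1), & \text{if } \alpha \in (0, \tfrac{1}{2}],\\
C\, k^{-\frac{1-\alpha}{2\alpha-1}}, & \text{if } \alpha \in (\tfrac{1}{2}, 1),
\end{cases}
$$

with finite termination in the degenerate case $\alpha = 0$. Establishing $\alpha = \frac{1}{2}$ for the models above therefore places them in the linear-rate regime.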

Future Developments

The research opens avenues for future work in expanding calculus rules for even broader classes of nonconvex optimization problems. Additionally, exploring the KL properties in dynamic settings, such as in online learning or adaptive systems, could further stimulate advancements. The interaction between the KL exponent and advanced regularizers beyond those considered could also be a fruitful avenue for exploration.

This paper is an essential read for researchers engaged in optimization and algorithmic convergence, offering both practical convergence guarantees and deep theoretical insights into the behavior of nonconvex optimization landscapes.