Kurdyka–Łojasiewicz Property in Optimization
- The Kurdyka–Łojasiewicz (KŁ) property is a geometric-analytic condition that relates function-value gaps to subdifferential sizes via a desingularizing function.
- It underpins convergence analysis by linking specific KŁ exponents to finite termination, linear, or sublinear convergence rates in optimization algorithms.
- The property applies to diverse function classes (e.g., real-analytic, semialgebraic) and supports modern methods in machine learning, low-rank inference, and variational analysis.
The Kurdyka–Łojasiewicz Property
The Kurdyka–Łojasiewicz (KŁ) property is a geometric-analytic regularity condition satisfied by wide classes of nonsmooth and nonconvex functions. It provides the foundational framework for analyzing the convergence and rates of optimization algorithms, especially in the absence of convexity or differentiability. The property relates the function-value gap to the size of the (limiting) subdifferential via a so-called desingularizing function. In power-law form, the KŁ property specializes to the Polyak–Łojasiewicz–Kurdyka (PLK) inequality, and its exponent, termed the KŁ exponent, directly determines the qualitative and quantitative convergence behavior of descent schemes. The current theoretical framework encompasses smooth, nonsmooth, and even infinite-dimensional or nonisolated-minimum problems, and forms the analytic backbone for recent convergence results in machine learning, low-rank matrix inference, and composite and variational optimization.
1. Formal Definition and Classical Exponent Version
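Given a proper, lower semicontinuous function $f:\mathbb{R}^n \to (-\infty, +\infty]$ and a point $\bar{x}$ with $\bar{x} \in \operatorname{dom} \partial f$, $f$ is said to have the Kurdyka–Łojasiewicz (KŁ) property at $\bar{x}$ if there exist:
- a neighborhood $U$ of $\bar{x}$,
- a constant $\eta \in (0, +\infty]$,
- a concave, “desingularizing” function $\varphi:[0,\eta) \to [0,+\infty)$, with $\varphi(0) = 0$, $\varphi \in C^1(0,\eta)$ and $\varphi' > 0$ on $(0,\eta)$,
such that for any $x \in U$ with $f(\bar{x}) < f(x) < f(\bar{x}) + \eta$, the following holds: $$\varphi'\big(f(x) - f(\bar{x})\big)\,\operatorname{dist}\big(0, \partial f(x)\big) \ge 1,$$ where $\partial f(x)$ denotes the limiting (Mordukhovich) subdifferential at $x$ (Bento et al., 2024; Jia et al., 2023).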
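A central case is the power desingularizer $\varphi(t) = c\,t^{1-\theta}$ with $c > 0$, $\theta \in [0,1)$, yielding
$$\operatorname{dist}\big(0, \partial f(x)\big) \ge \frac{1}{c\,(1-\theta)}\,\big(f(x) - f(\bar{x})\big)^{\theta}.$$
The parameter $\theta$ is called the KŁ (or PLK) exponent at $\bar{x}$. The property, and in particular the form above, unifies error bounds, the Polyak–Łojasiewicz condition, and the classical Łojasiewicz gradient inequality (Chill et al., 2016; Josz et al., 26 Feb 2026).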
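As a concrete check of the power-law form, the following minimal sketch uses the one-dimensional family $f(x) = |x|^p$ with minimizer $0$ (an illustrative choice, not taken from the cited works). For this family the candidate exponent is $\theta = 1 - 1/p$, and with $\varphi(t) = c\,t^{1-\theta}$ the product $\varphi'(f(x) - f(0))\,|f'(x)|$ equals $c$ identically, so the KŁ inequality holds with $c = 1$. The function name `kl_product` is hypothetical.

```python
import numpy as np

def kl_product(x, p, c=1.0):
    """Return phi'(f(x) - f(0)) * |f'(x)| for f(x) = |x|**p, phi(t) = c * t**(1 - theta)."""
    theta = 1.0 - 1.0 / p                      # candidate KL exponent at the minimizer 0
    gap = np.abs(x) ** p                       # f(x) - f(0)
    grad_norm = p * np.abs(x) ** (p - 1)       # |f'(x)| for x != 0
    phi_prime = c * (1.0 - theta) * gap ** (-theta)
    return phi_prime * grad_norm

# Sample points on both sides of the minimizer, excluding 0 itself.
radii = np.logspace(-6, 0, 50)
xs = np.concatenate([-radii, radii])

for p in (1.5, 2.0, 4.0):
    vals = kl_product(xs, p)
    print(f"p={p}: theta={1 - 1/p:.3f}, min product={vals.min():.6f}")
```

A product of at least 1 at every sample point is what the definition requires; here it is equal to 1 up to rounding, which indicates that $\theta = 1 - 1/p$ is the exact exponent for this family.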
2. Consequences for Descent Methods: Rates, Finite Termination, and Algorithm Theory
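The value of the exponent $\theta$ is the principal determinant of the qualitative convergence of a broad class of descent frameworks, where iterates $(x_k)$ obey sufficient decrease and subgradient-boundedness with constants $a, b > 0$:
- (H1) Sufficient decrease: $f(x_{k+1}) + a\,\|x_{k+1} - x_k\|^2 \le f(x_k)$.
- (H3) Subgradient bound: $\|w_k\| \le b\,\|x_{k+1} - x_k\|$ for some $w_k \in \partial f(x_k)$.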
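The two conditions can be checked numerically for plain gradient descent. The sketch below assumes that (H1) reads $f(x_{k+1}) + a\|x_{k+1} - x_k\|^2 \le f(x_k)$ and that (H3) reads $\|w_k\| \le b\|x_{k+1} - x_k\|$ with $w_k = \nabla f(x_k)$; both readings, the test function, and the constants are assumptions made for illustration. For a gradient that is $L$-Lipschitz and a step size $t < 2/L$, the descent lemma gives $a = 1/t - L/2$, and $b = 1/t$ holds with equality.

```python
import numpy as np

def f(x):
    # Smooth nonconvex test function (hypothetical); Hessian is diagonal with entries in [-0.8, 1.2].
    return np.sum(np.sin(x)) + 0.1 * np.dot(x, x)

def grad_f(x):
    return np.cos(x) + 0.2 * x

L = 1.2                  # Lipschitz constant of grad_f
t = 0.5                  # step size, chosen below 2 / L
a = 1.0 / t - L / 2.0    # sufficient-decrease constant from the descent lemma
b = 1.0 / t              # subgradient-bound constant for gradient descent

rng = np.random.default_rng(0)
x = rng.normal(size=5)
h1_ok, h3_ok = True, True
for _ in range(200):
    g = grad_f(x)
    x_next = x - t * g
    step = np.linalg.norm(x_next - x)
    # Small tolerances absorb floating-point rounding.
    h1_ok = h1_ok and bool(f(x_next) + a * step**2 <= f(x) + 1e-12)
    h3_ok = h3_ok and bool(np.linalg.norm(g) <= b * step + 1e-12)
    x = x_next

print("H1 holds along the run:", h1_ok)
print("H3 holds along the run:", h3_ok)
```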
For the PLK inequality with exponent $\theta$, one establishes the following convergence regimes (Bento et al., 2024; Qian et al., 2022; Ahookhosh et al., 13 Nov 2025):
- $\theta \in (0, 1/2)$: finite termination; the algorithm stops after finitely many steps.
- $\theta = 1/2$: linear convergence.
- $\theta \in (1/2, 1)$: sublinear rate, specifically $O\big(k^{-(1-\theta)/(2\theta-1)}\big)$.
The finite-termination property for $\theta < 1/2$ is particularly notable; it is not present for $\theta \ge 1/2$. In generic frameworks, global convergence and precise local rates are guaranteed whenever a desingularizer of the given form is available. For smooth $f$, or for structured nonconvex problems (e.g., difference-of-convex (DC) programming), optimal rates can be similarly established, including superlinear regimes for higher-order methods (Qian et al., 2022).
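The linear and sublinear regimes can be observed on a toy problem. The sketch below runs gradient descent on $f(x) = |x|^p$, which has KŁ exponent $\theta = 1 - 1/p$ at $0$; the step size, starting point, and iteration counts are arbitrary illustrative choices. For $p = 2$ ($\theta = 1/2$) consecutive iterates shrink by a constant factor, while for $p = 4$ ($\theta = 3/4$) the standard sublinear prediction $k^{-(1-\theta)/(2\theta-1)} = k^{-1/2}$ appears as a log-log slope close to $-0.5$.

```python
import numpy as np

def gd_trajectory(p, x0=1.0, t=0.1, iters=10000):
    """Gradient descent on f(x) = |x|**p; returns all iterates as an array."""
    xs = np.empty(iters + 1)
    xs[0] = x0
    for k in range(iters):
        x = xs[k]
        xs[k + 1] = x - t * p * np.sign(x) * abs(x) ** (p - 1)
    return xs

# theta = 1/2 (p = 2): the contraction ratio is constant, i.e. linear convergence.
quad = gd_trajectory(2.0, iters=50)
ratio = quad[1:] / quad[:-1]

# theta = 3/4 (p = 4): estimate the decay exponent between k = 1000 and k = 10000.
quart = gd_trajectory(4.0)
slope = np.log(quart[10000] / quart[1000]) / np.log(10000 / 1000)

print("quadratic contraction ratio:", ratio[0], ratio[-1])
print("quartic log-log slope:", slope)
```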
3. Typical Exponents and Function Classes
The KŁ property is satisfied by a remarkably broad class of functions:
- Real-analytic, semialgebraic, and globally subanalytic functions always admit the property with some exponent $\theta \in [0, 1)$; for analytic functions this is the classical Łojasiewicz gradient inequality (Chill et al., 2016).
- Convex, piecewise linear, and regularized quadratic models typically have $\theta = 1/2$.
- For polynomial optimization and the largest-eigenvalue function of polynomial matrix mappings, explicit exponents can be computed in terms of degree and dimension (Dinh et al., 2015; Osińska-Ulrych et al., 2018).
- In matrix factorization, deep linear networks, and low-rank sensing, precise exponents can be deduced via composition and symmetry calculus rules (Josz et al., 26 Feb 2026).
The minimal value of $\theta$ (“KŁ sharpness”) is critically important since it controls the presence or absence of finite-time convergence and influences the attainable rates for descent algorithms.
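When no closed-form exponent is available, a rough numerical probe can suggest a candidate value. The sketch below fits the slope of $\log\|\nabla f(x)\|$ against $\log(f(x) - f(\bar{x}))$ along a single ray through $\bar{x}$, since the power-law inequality predicts a slope of $\theta$ when it is tight. This is a heuristic for smooth functions only and proves nothing: it probes one direction, and the helper name `estimate_kl_exponent` and the test functions are hypothetical.

```python
import numpy as np

def estimate_kl_exponent(f, grad, x_bar, direction, radii):
    """Least-squares slope of log ||grad f|| versus log (f - f(x_bar)) along one ray."""
    points = [x_bar + r * direction for r in radii]
    gaps = np.array([f(x) - f(x_bar) for x in points])
    grad_norms = np.array([np.linalg.norm(grad(x)) for x in points])
    slope, _intercept = np.polyfit(np.log(gaps), np.log(grad_norms), 1)
    return slope

x_bar = np.zeros(3)
direction = np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0)
radii = np.logspace(-4, -1, 20)

# f(x) = ||x||^2 has exponent 1/2 at 0; f(x) = ||x||^4 has exponent 3/4.
theta_quadratic = estimate_kl_exponent(
    lambda x: np.dot(x, x), lambda x: 2.0 * x, x_bar, direction, radii)
theta_quartic = estimate_kl_exponent(
    lambda x: np.dot(x, x) ** 2, lambda x: 4.0 * np.dot(x, x) * x, x_bar, direction, radii)

print(theta_quadratic, theta_quartic)
```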
4. Advanced Calculus of KŁ Exponents and Desingularization Moduli
The class of admissible desingularizing functions is not limited to power laws; exact moduli may be nondifferentiable, piecewise smooth, or modeled by integral constructions. Recent work has established a powerful calculus for constructing the desingularizer under composition, summation, minimization, and separable addition, bypassing classical limitations of the exponent-based approach (Wang et al., 2021; Wang et al., 2020):
- Generalized (possibly nondifferentiable) concave desingularizers permit sharper rate analysis for composite and structured functions.
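- The exact modulus of $f$ at $\bar{x}$, the smallest possible concave desingularizer, may be explicitly constructed as
$$\tilde{\varphi}(t) = \int_0^t h(s)\,ds, \qquad h(s) = \sup\big\{\operatorname{dist}^{-1}\big(0, \partial f(x)\big) : x \in U \cap \operatorname{dom} \partial f,\ s \le f(x) - f(\bar{x}) < \eta\big\},$$
and yields the tightest bound on algorithmic trajectory lengths and convergence rates (Wang et al., 2020).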
- This apparatus allows for the extension of the KŁ theory to broader models, including piecewise polynomial, log-barrier, exponential-type losses, or zero “norms” in sparse recovery.
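A short worked example may help fix ideas. It assumes the exact modulus takes the integral form $\tilde{\varphi}(t) = \int_0^t h(s)\,ds$, where $h(s)$ is the supremum of $\operatorname{dist}^{-1}(0, \partial f(x))$ over points with $s \le f(x) - f(\bar{x}) < \eta$; this reading of the construction is an assumption here. The example function $f(x) = x^2$ on $\mathbb{R}$, with $\bar{x} = 0$, $U = \mathbb{R}$ and $\eta = +\infty$, is chosen only for illustration.

```latex
% Worked example (illustrative): exact modulus of f(x) = x^2 at \bar{x} = 0.
\begin{align*}
h(s) &= \sup\Big\{ \frac{1}{|f'(x)|} : x^2 \ge s \Big\}
      = \sup\Big\{ \frac{1}{2|x|} : |x| \ge \sqrt{s} \Big\}
      = \frac{1}{2\sqrt{s}}, \\
\tilde{\varphi}(t) &= \int_0^t \frac{ds}{2\sqrt{s}} = \sqrt{t}, \\
\tilde{\varphi}'\big(f(x) - f(0)\big)\,|f'(x)| &= \frac{1}{2|x|}\cdot 2|x| = 1
      \quad \text{for all } x \neq 0 .
\end{align*}
```

In this case the exact modulus coincides with the power desingularizer $c\,t^{1-\theta}$ for $c = 1$ and $\theta = 1/2$, and the KŁ inequality holds with equality.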
5. Structural Implications and Limitations
Not all functions $f$ can satisfy a PLK inequality with $\theta < 1/2$ at local minimizers. Specifically, when $f$ admits a DC decomposition $f = g - h$, with $g$ smooth and $h$ convex, the existence of a Lipschitz continuous gradient for $g$ near a minimizer prohibits the lower-exponent regime: the PLK inequality cannot hold with $\theta < 1/2$ for such models (Bento et al., 2024). When only gradient continuity (not Lipschitz) is assumed, this obstruction vanishes, and lower-exponent properties can be established (e.g., $f(x) = |x|^{3/2}$ admits $\theta = 1/3$ at $\bar{x} = 0$).
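The mechanism behind this obstruction can be sketched in the simplest special case of a smooth objective with no convex part subtracted. The derivation below is a sketch under stated assumptions, not the argument of the cited work: $\nabla f$ is taken to be $L$-Lipschitz on a ball around a local minimizer $\bar{x}$, the PLK inequality is written as $\|\nabla f(x)\| \ge \mu\,(f(x) - f(\bar{x}))^{\theta}$, and the symbols $L$ and $\mu$ are introduced here for illustration.

```latex
% Sketch: Lipschitz gradient versus a PLK exponent theta < 1/2 (smooth case).
\begin{align*}
f(\bar{x}) \;\le\; f\Big(x - \tfrac{1}{L}\nabla f(x)\Big)
  &\;\le\; f(x) - \frac{1}{2L}\,\|\nabla f(x)\|^2
  && \text{(descent lemma, $x$ near $\bar{x}$)} \\
\Longrightarrow\quad \|\nabla f(x)\|^2 &\;\le\; 2L\,\big(f(x) - f(\bar{x})\big), \\
\mu^2\,\big(f(x) - f(\bar{x})\big)^{2\theta} &\;\le\; \|\nabla f(x)\|^2
  && \text{(PLK inequality)} \\
\Longrightarrow\quad \big(f(x) - f(\bar{x})\big)^{1 - 2\theta} &\;\ge\; \frac{\mu^2}{2L}
  && \text{whenever } f(x) > f(\bar{x}).
\end{align*}
```

If $\theta < 1/2$, the left-hand side of the last line tends to $0$ as $x \to \bar{x}$ by continuity of $f$, which contradicts the positive lower bound unless $f$ is constant near $\bar{x}$. The function $|x|^{3/2}$ escapes this argument because its derivative is continuous but not Lipschitz at $0$.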
In invariant and nonisolated minimization landscapes, the KŁ exponent transfers via composition (e.g., submanifold parameterizations) and symmetry group actions: the local exponent on a normal slice extends to the ambient function, facilitating rate analysis in matrix factorization, neural network training, and low-rank signal recovery (Josz et al., 26 Feb 2026).
6. Infinite-Dimensional, Variational, and Topological Perspectives
In variational and infinite-dimensional Hilbert settings, the KŁ property (specifically, the Łojasiewicz–Simon inequality) extends to nonsmooth energy functionals—e.g., in PDEs, calculus of variations, and mean-field models. It ensures stabilization of all subgradient flows toward equilibrium, with explicit decay/damping rates governed by the exponent (Chill et al., 2016). Moreover, the topological structure of the zero locus of a KŁ function is highly regular: the set always admits a mapping cylinder neighborhood, precluding topological pathologies ("wild" embeddings), and ensuring well-posedness of gradient trajectories (Cibotaru et al., 2021).
7. Applications in Algorithmic Rate Theory and Error Bounds
A KŁ-type inequality functions as a master regularity condition, subsuming diverse classical assumptions:
- In Tikhonov regularization, the KŁ property is equivalent to standard source/variational conditions and yields direct derivations of convergence rates for Bregman and metric distances in both Banach and Hilbert settings (Gerth et al., 2019).
- For D-gap functions and error bounds in nonsmooth variational inequalities, verifying a KŁ property of exponent $\theta = 1/2$ yields global linear rate guarantees for first-order algorithms, even in the absence of monotonicity or smoothness (Li et al., 2022).
- Decentralized and nonmonotone algorithms, including boosted proximal point, GLL-type, and reweighted manifold methods, achieve full-sequence, often linear, convergence when the objective or a merit function is KŁ with exponent $\theta \in (0, 1/2]$; if $\theta = 0$, finite termination is automatic (Wu et al., 24 Nov 2025; Qian et al., 15 Apr 2025; Yu et al., 10 Feb 2025).
- For high-order and boosted algorithms, the precise interaction between the KŁ exponent and the order of the update yields either superlinear or linear complexity; design of regularizations or reparameterizations is tightly linked to the underlying exponents (Qian et al., 2022; Ahookhosh et al., 13 Nov 2025; Ouyang, 11 Jun 2025; Ouyang et al., 2024).
Table: Exponent/Regime Implications for Descent Methods
| KŁ Exponent $\theta$ | Convergence Regime | Example Classes |
|---|---|---|
| $\theta = 0$ | Finite termination | Weak sharp minima, active set models |
| $\theta \in (0, 1/2)$ | Finite-time convergence | Nonsmooth, nonconvex, non-Lipschitz |
| $\theta = 1/2$ | Linear rate | Real-analytic, semialgebraic, convex |
| $\theta \in (1/2, 1)$ | Sublinear: $O\big(k^{-(1-\theta)/(2\theta-1)}\big)$ | General nonconvex, composite |
Local rates are algorithm-independent within the class of descent methods respecting sufficient decrease and subdifferential control (Bento et al., 2024; Yu et al., 10 Feb 2025).
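Finite termination in the sharpest regime can be seen on a one-line example. The function $f(x) = |x|$ satisfies $\operatorname{dist}(0, \partial f(x)) = 1$ for every $x \neq 0$, so the power-law inequality holds at $0$ with exponent $0$. The sketch below runs the proximal point method on it; the proximal map of $\lambda|x|$ is soft-thresholding, and the step $\lambda = 0.3$ and starting point are arbitrary illustrative choices.

```python
def soft_threshold(x, lam):
    """Proximal map of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def proximal_point_abs(x0, lam=0.3, max_iters=100):
    """Proximal point method on f(x) = |x|; stops once the iterate no longer moves."""
    xs = [x0]
    for _ in range(max_iters):
        x_next = soft_threshold(xs[-1], lam)
        if x_next == xs[-1]:
            break
        xs.append(x_next)
    return xs

traj = proximal_point_abs(1.0)
print("steps taken:", len(traj) - 1)
print("final iterate:", traj[-1])
```

The iterates move by exactly $\lambda$ per step until they fall within $\lambda$ of the minimizer, at which point the next step lands on $0$ exactly; this contrasts with the geometric and polynomial decay seen for larger exponents.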
References
- For the convergence theory under PLK/KŁ conditions with descent schemes: (Bento et al., 2024, Qian et al., 2022, Ahookhosh et al., 13 Nov 2025, Qian et al., 15 Apr 2025, Jia et al., 2023, Yu et al., 10 Feb 2025).
- For calculus and transfer rules, composition, and symmetry in exponents: (Josz et al., 26 Feb 2026, Wang et al., 2021, Wang et al., 2020).
- For infinite-dimensional and variational extensions: (Chill et al., 2016, Gerth et al., 2019, Cibotaru et al., 2021).
- For explicit exponents, error-bounds, and sharpness: (Dinh et al., 2015, Osińska-Ulrych et al., 2018, Li et al., 2023).
- For structural compatibility and limitations of PLK exponents: (Bento et al., 2024, Ouyang, 11 Jun 2025).
- For manifold, decentralized, and high-order algorithmic settings: (Yu et al., 10 Feb 2025, Wu et al., 24 Nov 2025, Ouyang et al., 2024).
The Kurdyka–Łojasiewicz property operates as a unifying analytic and geometric principle in modern nonconvex optimization theory, enabling precise control over algorithmic convergence, stability of gradient flows, and the regularity of solution sets.