Weakly Convex Functions: Theory & Applications
- Weakly convex functions are proper, lower-semicontinuous functions that become convex when a quadratic term is added, generalizing both convex and smooth functions.
- They exhibit stability under sums and compositions, possess strong proximal properties, and underlie effective algorithms in nonsmooth, nonconvex optimization.
- Their applications span robust estimation, sparse regression, and machine learning, with convergence guarantees provided by proximal and stochastic methods.
A weakly convex function is a proper, lower-semicontinuous function $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ whose negative curvature is bounded in a precise sense: $f$ is called $\rho$-weakly convex if there exists $\rho \ge 0$ such that $x \mapsto f(x) + \frac{\rho}{2}\|x\|^2$ is convex. This class strictly generalizes convex functions ($\rho = 0$), subsumes smooth functions with Lipschitz gradient ($\rho$ can be taken as the gradient Lipschitz constant), and is central in modern nonsmooth and nonconvex optimization theory.
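As a concrete one-dimensional illustration (not drawn from the cited references), consider $f(x) = |x^2 - 1|$, which is $2$-weakly convex:

```latex
% Hypothetical worked example: f(x) = |x^2 - 1| is 2-weakly convex.
% Adding the quadratic x^2 (i.e., rho = 2) removes the nonconvexity:
\[
  f(x) + x^2 \;=\; |x^2 - 1| + x^2 \;=\; \max\{\, 2x^2 - 1,\; 1 \,\},
\]
% a pointwise maximum of convex functions, hence convex.
% f itself is nonconvex: it equals 1 - x^2, a concave function, on (-1, 1).
```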
1. Fundamental Definitions and Characterizations
Weak convexity can be formulated via several equivalent but operationally distinct conditions, all appearing widely in the literature:
- Quadratic perturbation: $f$ is $\rho$-weakly convex if $f + \frac{\rho}{2}\|\cdot\|^2$ is convex.
- Subgradient lower bound: For all $x, y$ and any $v \in \partial f(x)$,
  $f(y) \ge f(x) + \langle v, y - x \rangle - \frac{\rho}{2}\|y - x\|^2$.
- Secant inequality: For all $x, y$ and $\lambda \in [0, 1]$,
  $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) + \frac{\rho}{2}\lambda(1-\lambda)\|x - y\|^2$.
- Differentiable case: If $f$ is $C^1$, then $\rho$-weak convexity is equivalent to
  $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle - \frac{\rho}{2}\|y - x\|^2$ for all $x, y$.
- Second-order conditions: For $C^2$ functions, $\nabla^2 f(x) \succeq -\rho I$ for all $x$.
- Subdifferential hypomonotonicity: $\langle v - w, x - y \rangle \ge -\rho\|x - y\|^2$ for all $x, y$ and all $v \in \partial f(x)$, $w \in \partial f(y)$ (Davis et al., 2018, Renaud et al., 17 Sep 2025).
These conditions admit natural generalization to Banach and Hilbert spaces and underpin the analytic and algorithmic properties of weakly convex functions.
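A minimal numerical sanity check of the quadratic-perturbation characterization on the one-dimensional example above; the grid and tolerance are illustrative choices, not taken from the cited works:

```python
import numpy as np

# Illustrative example: f(x) = |x^2 - 1| should be 2-weakly convex,
# i.e., g(x) = f(x) + (rho/2) x^2 should be convex for rho = 2.
f = lambda x: np.abs(x**2 - 1.0)
rho = 2.0
g = lambda x: f(x) + 0.5 * rho * x**2

# Discrete convexity check: second differences of g on a fine grid
# must be nonnegative (up to floating-point error).
x = np.linspace(-3.0, 3.0, 20001)
second_diff_g = g(x[:-2]) - 2.0 * g(x[1:-1]) + g(x[2:])
print("min second difference of f + (rho/2) x^2:", second_diff_g.min())  # ~ 0, nonnegative

# The same check on f alone fails, exposing the concave piece on (-1, 1).
second_diff_f = f(x[:-2]) - 2.0 * f(x[1:-1]) + f(x[2:])
print("min second difference of f:", second_diff_f.min())  # strictly negative
```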
2. Key Properties and Examples
Weakly convex functions preserve many of the powerful properties of convex analysis while accommodating mild nonconvexities:
- Stability under sums and composition: The sum of a $\rho_1$- and a $\rho_2$-weakly convex function is $(\rho_1 + \rho_2)$-weakly convex. Compositions $h \circ c$, with $h$ convex and Lipschitz and $c$ smooth with Lipschitz Jacobian, are $L_h L_{\nabla c}$-weakly convex, where $L_h$ and $L_{\nabla c}$ are the respective Lipschitz constants (Davis et al., 2018, Ma et al., 2019).
- Closure under supremum: The supremum of a family of uniformly $\rho$-weakly convex functions is also $\rho$-weakly convex (López-Rivera et al., 1 Feb 2025).
- Proximal properties: For any $\lambda \in (0, 1/\rho)$, the map $y \mapsto f(y) + \frac{1}{2\lambda}\|y - x\|^2$ is strongly convex in $y$, so the proximal operator $\mathrm{prox}_{\lambda f}$ is single-valued and Lipschitz (Renaud et al., 17 Sep 2025); a numerical sketch appears after this list.
- Active examples:
- Robust loss functions (MCP, SCAD, Tukey, capped-ℓ1),
- Robust phase retrieval: $x \mapsto \frac{1}{m}\sum_{i=1}^{m} \left| \langle a_i, x \rangle^2 - b_i \right|$,
- Conditional Value-at-Risk (CVaR),
- Nonconvex penalization in sparse logistic regression (Shen et al., 2017, López-Rivera et al., 1 Feb 2025).
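The following sketch illustrates the proximal property numerically for the running example $f(x) = |x^2 - 1|$ ($\rho = 2$); the use of `scipy.optimize.minimize_scalar`, the choice $\lambda = 0.25 < 1/\rho$, and the search bracket are assumptions made for the demonstration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# f(x) = |x^2 - 1| is 2-weakly convex (rho = 2).
f = lambda x: abs(x**2 - 1.0)
rho = 2.0
lam = 0.25  # any lam < 1/rho makes the prox subproblem strongly convex

def prox(x, lam):
    """Numerically solve min_y f(y) + (1/(2*lam)) * (y - x)^2."""
    obj = lambda y: f(y) + (y - x)**2 / (2.0 * lam)
    return minimize_scalar(obj, bounds=(x - 5.0, x + 5.0), method="bounded").x

# Single-valuedness / Lipschitz behaviour: the prox map varies without jumps in x.
xs = np.linspace(-2.0, 2.0, 401)
ps = np.array([prox(x, lam) for x in xs])
lipschitz_est = np.max(np.abs(np.diff(ps)) / np.diff(xs))
print("empirical Lipschitz constant of prox:", lipschitz_est)
```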
3. Moreau Envelope and Proximal Calculus
A central tool for weakly convex analysis is the Moreau envelope $f_\lambda(x) = \min_{y} \{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \}$, with associated proximal map $\mathrm{prox}_{\lambda f}(x) = \operatorname{argmin}_{y} \{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \}$. For $\rho$-weakly convex $f$ and $\lambda \in (0, 1/\rho)$:
- $f_\lambda$ is everywhere finite and continuously differentiable; $\nabla f_\lambda(x) = \lambda^{-1}\left(x - \mathrm{prox}_{\lambda f}(x)\right)$.
- $\nabla f_\lambda$ is Lipschitz continuous, with constant depending only on $\lambda$ and $\rho$ (Renaud et al., 17 Sep 2025).
- $f_\lambda \to f$ pointwise as $\lambda \downarrow 0$.
- The Moreau envelope preserves minimizers and critical points (Renaud et al., 17 Sep 2025).
- The gradient norm $\|\nabla f_\lambda(x)\|$ serves as a natural stationarity measure for nonconvex, nonsmooth problems, and underpins optimality guarantees in algorithmic schemes (Davis et al., 2018, Davis et al., 2018).
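A minimal numerical illustration of these envelope identities, continuing the $f(x) = |x^2 - 1|$ example; the grid-free prox solve and the finite-difference check below are ad hoc choices for the demonstration, not a procedure from the cited papers:

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda y: abs(y**2 - 1.0)   # 2-weakly convex example
lam = 0.25                      # lam < 1/rho = 0.5

def prox(x):
    obj = lambda y: f(y) + (y - x)**2 / (2.0 * lam)
    return minimize_scalar(obj, bounds=(x - 5.0, x + 5.0), method="bounded").x

def envelope(x):
    p = prox(x)
    return f(p) + (p - x)**2 / (2.0 * lam)

# Check the identity grad f_lam(x) = (x - prox(x)) / lam by finite differences.
x0, h = 0.7, 1e-5
fd_grad = (envelope(x0 + h) - envelope(x0 - h)) / (2.0 * h)
formula_grad = (x0 - prox(x0)) / lam
print(fd_grad, formula_grad)    # the two values should agree closely
```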
For inexact proximal computations, detailed calculus using ε-subdifferentials is available, establishing rigorous inexact stationarity conditions and sum rules for composite functions (Bednarczuk et al., 2022).
4. Optimization Algorithms, Complexity, and Regularity
Optimization of weakly convex objectives leverages the structure through proximal, subgradient, and first-order splitting algorithms:
Algorithmic Frameworks
- Proximal Point and Proximal Gradient Methods: These methods operate directly or with inexact solves on the Moreau envelope and enjoy convergence guarantees when suitable regularity (e.g., Kurdyka–Łojasiewicz (KL) property) is present (Khanh et al., 2023, Liao et al., 2 Sep 2025).
- Stochastic Subgradient Methods: For composite problems of the form $\min_x \; \mathbb{E}_\xi[f(x,\xi)] + r(x)$, with each $f(\cdot,\xi)$ weakly convex and $r$ closed convex, the stochastic subgradient and proximity-based (model-based) variants drive the Moreau envelope gradient norm to zero at rate $O(k^{-1/4})$, settling the rate for nonconvex, nonsmooth composite stochastic optimization (Davis et al., 2018, Davis et al., 2018); a sketch of a plain subgradient variant appears after this list.
- Variable Smoothing Schemes: By decreasing the smoothing parameter along the iterations, these algorithms interpolate between smooth and nonsmooth rates, obtaining dimension-independent complexity guarantees for composite structured problems (Böhm et al., 2020, López-Rivera et al., 1 Feb 2025).
- Quadratically Regularized Subgradient for Constrained Optimization: By regularizing both objective and constraints, provable complexity guarantees are established for finding nearly stationary points under uniform Slater conditions (Ma et al., 2019).
- Primal-Dual and Forward–Backward Algorithms: When sharpness holds, linear convergence rates can be attained globally or locally for primal-dual and splitting schemes targeting weakly convex (and possibly nonconvex) objectives (Bednarczuk et al., 2023, Bednarczuk et al., 2024).
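As a concrete sketch of the stochastic subgradient scheme referenced above, applied to the robust phase retrieval loss from Section 2; the synthetic data, the constant $T^{-1/2}$-scaled step size, and the iteration budget are illustrative assumptions, not the settings analyzed in the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Robust phase retrieval: F(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|,
# a weakly convex composition of |.| (convex, Lipschitz) with a smooth map.
d, m = 20, 400
x_true = rng.normal(size=d)
A = rng.normal(size=(m, d))
b = (A @ x_true) ** 2

def objective(x):
    return np.mean(np.abs((A @ x) ** 2 - b))

def subgrad_sample(x, i):
    # A subgradient of x -> |<a_i, x>^2 - b_i|.
    r = A[i] @ x
    return np.sign(r**2 - b[i]) * 2.0 * r * A[i]

# Plain stochastic subgradient method with a step size proportional to 1/sqrt(T).
T = 20000
step = 0.25 / np.sqrt(T)
x = rng.normal(size=d)           # random initialization
f0 = objective(x)
best = f0
for t in range(T):
    i = rng.integers(m)
    x = x - step * subgrad_sample(x, i)
    if (t + 1) % 100 == 0:
        best = min(best, objective(x))

print("initial objective:", f0, "best objective seen:", best)
```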
Regularity Conditions, Error Bounds, and Linear Convergence
Regularity conditions for weakly convex functions mirror, but generalize, those in the convex setting. On any sublevel set, error bound, Polyak–Łojasiewicz-type, and quadratic growth conditions are linked by a chain of implications (Liao et al., 2023). Under quadratic growth, linear convergence of (inexact) proximal point and forward–backward algorithms is established.
5. Saddle Points, Sharpness, and Generic Avoidance
A notable structural property of weakly convex objectives is the landscape organization: generic weakly convex, o-minimal (definable) functions possess only local minimizers and "active strict saddles." Proximal-point, subgradient, and stochastic algorithms provably avoid strict saddles almost surely, converging instead to minimizers (Davis et al., 2019, Bianchi et al., 2021, Huang, 2021). The geometric mechanism is the instability of strict saddle fixed points under the proximal update and landscape sharpness away from active manifolds. Random perturbation methods accelerate escape from saddle traps even in nonsmooth cases (Huang, 2021).
Sharpness—a linear growth condition away from minimizers—enables local (and sometimes global) linear convergence for subgradient and forward–backward schemes on weakly convex objectives, conditional on initialization in a basin of attraction (1803.02461, Bednarczuk et al., 2023, Bednarczuk et al., 2024).
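Sharpness here refers to the standard linear-growth condition, stated below in generic notation (the constant $\mu$, the neighborhood $U$, and the solution set $\mathcal{X}^*$ are generic symbols, not tied to a specific cited result):

```latex
% Sharpness (linear growth) of f around its solution set X*:
% there exist mu > 0 and a neighborhood U of X* such that
\[
  f(x) \;-\; \min f \;\ge\; \mu \,\operatorname{dist}\!\left(x, \mathcal{X}^*\right)
  \qquad \text{for all } x \in U .
\]
```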
6. Second-order Calculus and Convexity Characterization
Recent work leverages generalized second-order subderivatives and coderivatives to precisely demarcate the convexity of weakly convex functions (Phat, 26 Mar 2026):
- Graphical derivatives of the subgradient mapping: Convexity is equivalent to the positive semi-definiteness of the graphical derivative in each direction.
- Second subderivatives: Convexity is equivalent to non-negativity of the second subderivative for all directions at each subgradient pair.
- Second-order subdifferential: Convexity holds if $\langle z, w \rangle \ge 0$ for all $(x, v) \in \operatorname{gph} \partial f$, all directions $w$, and all $z \in \partial^2 f(x, v)(w)$.
These characterizations unify various fragments of second-order analysis across generalized convexity and inform Newton-type methods.
7. Applications and Practical Implications
Weakly convex models are prevalent in contemporary statistical and machine learning models, including high-dimensional robust estimation, sparse regression, dictionary learning, phase retrieval, robust PCA, and distributionally robust optimization (Davis et al., 2018, Shen et al., 2017, López-Rivera et al., 1 Feb 2025).
- Sparsity-inducing regularization: Weakly convex penalties (like MCP, SCAD, firm-threshold) encode $\ell_0$-like sparsity properties while maintaining algorithmic tractability and provable stationarity when used with proximal-gradient descent (Shen et al., 2017); a proximal-gradient sketch follows this list.
- Nonconvex regularization in deep learning: Nonsmooth yet weakly convex loss surfaces, as in ReLU networks and robust estimators, are amenable to first-order and splitting methods.
- Supremum of weakly convex functions: Moreau envelope calculus extends to pointwise maxima and supremum operations, allowing envelope and proximity operator computation for classes of DRO and min-max problems (López-Rivera et al., 1 Feb 2025).
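A minimal proximal-gradient sketch for sparse least squares with an MCP penalty; the closed-form threshold in `prox_mcp` is the standard firm-thresholding rule (valid when the step size is smaller than the MCP parameter $\gamma$), and the synthetic data, penalty parameters, and step size are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse regression: min_x 0.5*||Ax - b||^2 + sum_j MCP(x_j; lam, gamma).
n, d, k = 100, 50, 5
A = rng.normal(size=(n, d)) / np.sqrt(n)
x_true = np.zeros(d)
x_true[:k] = rng.normal(size=k) * 3.0
b = A @ x_true + 0.01 * rng.normal(size=n)

lam, gamma = 0.05, 3.0   # MCP parameters; the penalty is (1/gamma)-weakly convex

def prox_mcp(t, eta):
    """Firm-thresholding prox of the MCP penalty with step eta (requires eta < gamma)."""
    out = np.where(np.abs(t) <= eta * lam, 0.0,
                   np.sign(t) * (np.abs(t) - eta * lam) / (1.0 - eta / gamma))
    return np.where(np.abs(t) > gamma * lam, t, out)

eta = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L for the smooth part
x = np.zeros(d)
for _ in range(500):
    grad = A.T @ (A @ x - b)
    x = prox_mcp(x - eta * grad, eta)

# With well-separated coefficients the two supports typically coincide.
print("nonzeros recovered:", np.flatnonzero(x))
print("true support:      ", np.flatnonzero(x_true))
```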
Algorithmically, the Moreau envelope enables efficient smoothing, splitting, and stochastic optimization frameworks in large-scale settings without reliance on variance reduction, mini-batching, or strong convexity (Davis et al., 2018, Renaud et al., 17 Sep 2025).
References:
(Davis et al., 2018, Davis et al., 2018, Ma et al., 2019, Böhm et al., 2020, Renaud et al., 17 Sep 2025, Bednarczuk et al., 2022, Bednarczuk et al., 2023, Huang, 2021, Bednarczuk et al., 2024, Shen et al., 2017, Davis et al., 2019, Bianchi et al., 2021, López-Rivera et al., 1 Feb 2025, Liao et al., 2023, Phat, 26 Mar 2026, Liao et al., 2 Sep 2025, Khanh et al., 2023, 1803.02461).