Papers
Topics
Authors
Recent
Search
2000 character limit reached

Moreau Envelope for Weakly Convex Functions

Updated 19 September 2025
  • The Moreau envelope is defined via infimal convolution, transforming a weakly convex function into a smooth, differentiable surrogate by using a quadratic regularizer.
  • Its gradient, expressed through the proximal mapping, ensures monotonicity and preserves critical points, thereby aiding convergence in various nonconvex optimization algorithms.
  • The concept extends to non-Euclidean settings and supports a range of applications including bilevel programming, stochastic optimization, and advanced variational methods.

The Moreau envelope is a fundamental tool for smoothing, regularization, and convergence analysis of weakly convex functions, providing a bridge between nonsmooth/nonconvex and smooth optimization paradigms. Weakly convex functions—those for which g+(ρ/2)2g + (\rho/2)\|\cdot\|^2 is convex for some ρ0\rho \geq 0—appear broadly in structured nonconvex, nonsmooth optimization, including statistical learning, signal processing, bilevel programming, and variational analysis. The core utility of the Moreau envelope lies in its capacity to induce differentiable surrogates with tractable gradients and to facilitate algorithmic methods that rigorously connect iterates of smoothed problems back to solutions (or stationary points) of the original, possibly weakly convex or highly nonconvex, objective.

1. Definition and Regularization Mechanism

For a proper, lower semicontinuous function g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\} and for γ>0\gamma > 0, the Moreau envelope is defined as

gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.

The associated proximal mapping is

Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.

For gg being ρ\rho–weakly convex and γ<1/ρ\gamma < 1/\rho, the Moreau envelope gγg^\gamma is well-defined and differentiable. This regularization smooths ρ0\rho \geq 00 by “infimal convolution” with the quadratic ρ0\rho \geq 01, and is especially powerful because the gradient of ρ0\rho \geq 02 is given by

ρ0\rho \geq 03

This differentiability holds even if ρ0\rho \geq 04 is merely weakly convex and nonsmooth (as opposed to convex) (Renaud et al., 17 Sep 2025). The regularization mechanism thereby transforms a potentially intractable nonsmooth problem into a smooth one whose critical points can be interpreted in terms of the original weakly convex function.

2. Fundamental Properties of the Moreau Envelope on Weakly Convex Functions

Multiple key properties extend classical Moreau theory for convex functions to the weakly convex setting:

  • Monotonicity and Pointwise Convergence: For fixed ρ0\rho \geq 05, ρ0\rho \geq 06 is nonincreasing as ρ0\rho \geq 07 and ρ0\rho \geq 08. This provides a path to approximate ρ0\rho \geq 09 using increasingly less regularized surrogates (Renaud et al., 17 Sep 2025).
  • Gradient and Smoothness: For g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}0, g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}1 is g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}2; its gradient formula directly exposes the proximal mapping, and under additional regularity (e.g., local Lipschitz continuity or twice differentiability on the image of the prox), g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}3 is Lipschitz with constant at most g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}4 when g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}5 (Renaud et al., 17 Sep 2025).
  • Preservation of Minima and Critical Points: If g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}6, then g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}7, and more generally, g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}8 is a stationary point of g:RdR{+}g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}9 if and only if γ>0\gamma > 00 (i.e., γ>0\gamma > 01). Multiple equivalent characterizations are established: vanishing envelope gradient, being a proximal fixed point, and preservation of function values at minimizers.
  • Modification of Convexity Constants: If γ>0\gamma > 02 is γ>0\gamma > 03–weakly convex, then γ>0\gamma > 04 is γ>0\gamma > 05–weakly convex, and similarly for strong convexity constants.
  • Duality and Convex Conjugacy: There is a direct relation between the Moreau envelope, the proximal mapping, and the convex conjugate. For convex γ>0\gamma > 06, if γ>0\gamma > 07 and γ>0\gamma > 08, then γ>0\gamma > 09 and gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.0 (Renaud et al., 17 Sep 2025).
  • Regularization and Hamilton–Jacobi Equation: The envelope solves the Hamilton–Jacobi PDE: gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.1 providing a fundamental variational structure to the regularization scheme.

3. Algorithmic Implications and Stationarity Measures

The differentiability, tractable gradient, and smoothing properties of the Moreau envelope play a pivotal role in modern first-order methods for weakly convex optimization:

  • Stationarity Measure: The envelope’s gradient provides a continuous, computable measure of near-stationarity even when the original function is nonsmooth. For weakly convex gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.2, gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.3 quantifies the proximity of gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.4 to gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.5–stationary points of the original objective (Davis et al., 2018, Davis et al., 2018).
  • Convergence Guarantees: Many stochastic and deterministic algorithms (proximal point, proximal gradient, forward–backward splitting, ADMM variants) design their update rules around the proximal mapping and analyze convergence in terms of the Moreau envelope’s gradients. Typical results show an gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.6 decay rate for gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.7 for stochastic subgradient or model-based minimization methods applied to weakly convex problems, with the Moreau envelope smoothing ensuring analytical tractability and robust performance guarantees (Davis et al., 2018, Davis et al., 2018).
  • Complexity Interpolation: The use of Moreau smoothing allows complexity rates that interpolate between gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.8 for smooth nonconvex problems and gγ(x)=infyRd{g(y)+12γxy2}.g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.9 for subgradient methods—e.g., variable smoothing strategies for structured composite problems achieve Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.0 (Böhm et al., 2020).
  • Extensions to Inexact Algorithms: The Moreau envelope supports inexact analysis where proximal mappings are computed with controlled error, and the KL property of the envelope enables global convergence and explicit rates even for approximate iterates (Khanh et al., 2023).

4. Structural Results in Geometric and Non-Euclidean Contexts

The Moreau envelope’s role transcends Euclidean spaces, generalizing to geodesic spaces such as Hadamard spaces and CAT(0) spaces:

  • Hadamard Spaces: The Moreau envelope in Hadamard spaces is defined analogously, using the geodesic metric Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.1 instead of the norm. Fundamental equivalence results are established: pointwise convergence of Moreau envelopes of convex lower semicontinuous functions is equivalent to Mosco convergence (defined via the asymptotic center for weak convergence), even in nonlinear geodesic settings (Bačák et al., 2016).
  • Connections to Set Convergence: For indicator functions of convex sets, the envelope reduces to the squared distance function. Hence, pointwise convergence of Moreau envelopes is equivalent to convergence in the Frolík–Wijsman topology for convex sets (Bačák et al., 2016).
  • Topological and Metric Structures: In separable Hadamard spaces, the space of proper convex lower semicontinuous functions is metrizable using a complete metric for which convergence is equivalent to Mosco convergence (Bačák et al., 2016). This supports rigorous variational analysis and convergence theorems in non-Euclidean contexts.

5. Extensions, Applications, and Examples

The Moreau envelope’s utility encompasses a variety of theoretical and practical optimization contexts:

  • Supremum and Composition Structures: The envelope commutes with taking suprema over uniformly weakly convex families, facilitating smoothing and explicit computations for nonconvex functions defined via maximum operations (López-Rivera et al., 1 Feb 2025).
  • Regularization in Bilevel Programming: Weakly convex value functions arising as lower-level objectives in bilevel programming can be regularized by the Moreau envelope, transforming highly nonsmooth constraints into tractable, weakly convex surrogates and enabling difference-of--weakly convex methods (Gao et al., 2023).
  • Splitting and Variational Methods: The Moreau envelope is fundamental in designing merit functions such as the forward–backward envelope (FBE) and the Douglas–Rachford envelope (DRE), used in splitting methods for weakly convex composite optimization, with strong descent properties and epi-convergence guarantees (Atenas, 2023).
  • First-Order Meta-Learning: Meta-learning formulations using sums of Moreau envelopes yield objectives that are differentiable/smooth (even if task losses are only weakly convex), enabling analysis and improvement of algorithms such as FO-MAML without reliance on Hessian information (Mishchenko et al., 2023).
  • Stochastic and Distributed Optimization: Smoothed surrogates produced by Moreau envelopes facilitate distributed algorithms (e.g., MADM) and single-loop stochastic algorithms for min-max and difference-of-weakly-convex models by enabling efficiently computable gradients and robust convergence (Mirzaeifard et al., 2023, Hu et al., 2024).
  • Algorithmic Acceleration: Recent research extends the classical smoothing regularization to high-order Moreau envelopes and boosted proximal-point frameworks, which under the Kurdyka-Łojasiewicz property yield improved local convergence rates, including linear convergence under adaptive regularization order selection (Kabgani et al., 2024).

6. Analytical Proofs and Duality

Key proofs underpinning the Moreau envelope’s properties for weakly convex functions leverage classical convex analysis:

  • Monotonicity and Pointwise Limit: By constructing technical inequalities linking values of Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.2 at proximal points for different smoothing parameters, strict monotonicity and convergence as Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.3 are established via sandwich arguments and lower semicontinuity (Renaud et al., 17 Sep 2025).
  • Gradient Formula and Stationarity: Construction of quadratic approximations to Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.4 and application of optimality conditions for the proximal mapping yield explicit gradient expressions, demonstrating smoothness even in nonconvex settings.
  • Preservation of Minima:

Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.5

and critical points coincide as Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.6. The dual characterization via convex conjugates provides further analytical structure.

  • Degradation of (Weak) Convexity: Reparameterization allows the convex-concave decomposition of Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.7 to be transported through the envelope, demonstrating explicit changes to weak or strong convexity moduli.

7. Impact and Broader Theoretical Significance

The theoretical development of the Moreau envelope for weakly convex functions thus underpins modern first-order algorithms in nonconvex, nonsmooth, and distributed optimization, and forms the basis of rigorous complexity and convergence guarantees (e.g., Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.8 or Proxγg(x)=argminyRd{g(y)+12γxy2}.\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.9 rates for stationarity). Its ability to preserve minimizers and critical points, together with explicit and tractable gradient formulas, makes it an essential analytic and algorithmic construct for bridging classical convex regularization with the growing domain of weakly convex models in data science, signal processing, and machine learning. Results extending these properties to geodesic and metric spaces, and to more general envelope or smoothing constructs (e.g., via Bregman-Moreau or polar convolutions), highlight the generality and robustness of the Moreau envelope as a tool for variational regularization and nonsmooth analysis (Renaud et al., 17 Sep 2025, Bačák et al., 2016, Bauschke et al., 2017, Friedlander et al., 2018).

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Moreau Envelope of Weakly Convex Functions.