Moreau Envelope for Weakly Convex Functions

Updated 19 September 2025

The Moreau envelope is defined via infimal convolution, transforming a weakly convex function into a smooth, differentiable surrogate by using a quadratic regularizer.
Its gradient is expressed through the proximal mapping; the envelope is monotone in the smoothing parameter and preserves critical points, thereby aiding convergence analysis in various nonconvex optimization algorithms.
The concept extends to non-Euclidean settings and supports a range of applications including bilevel programming, stochastic optimization, and advanced variational methods.

The Moreau envelope is a fundamental tool for smoothing, regularization, and convergence analysis of weakly convex functions, providing a bridge between nonsmooth/nonconvex and smooth optimization paradigms. Weakly convex functions, those for which $g + (\rho/2)\|\cdot\|^2$ is convex for some $\rho \geq 0$, appear broadly in structured nonconvex, nonsmooth optimization, including statistical learning, signal processing, bilevel programming, and variational analysis. The core utility of the Moreau envelope lies in its capacity to induce differentiable surrogates with tractable gradients and to facilitate algorithmic methods that rigorously connect iterates of smoothed problems back to solutions (or stationary points) of the original, possibly weakly convex or highly nonconvex, objective.

1. Definition and Regularization Mechanism

For a proper, lower semicontinuous function $g : \mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}$ and for $\gamma > 0$, the Moreau envelope is defined as

$g^\gamma(x) = \inf_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.$

The associated proximal mapping is

$\operatorname{Prox}_{\gamma g}(x) = \arg\min_{y \in \mathbb{R}^d} \left\{ g(y) + \frac{1}{2\gamma} \|x-y\|^2 \right\}.$

If $g$ is $\rho$-weakly convex and $\gamma < 1/\rho$, the objective inside the infimum is strongly convex in $y$, so the proximal mapping is single-valued and the Moreau envelope $g^\gamma$ is well-defined and differentiable. This regularization smooths $g$ by “infimal convolution” with the quadratic $\frac{1}{2\gamma}\|\cdot\|^2$, and is especially powerful because the gradient of $g^\gamma$ is given by

$\nabla g^\gamma(x) = \frac{1}{\gamma} (x - \operatorname{Prox}_{\gamma g}(x)).$

This differentiability holds even if $g$ is merely weakly convex and nonsmooth (as opposed to convex) (Renaud et al., 17 Sep 2025). The regularization mechanism thereby transforms a potentially intractable nonsmooth problem into a smooth one whose critical points can be interpreted in terms of the original weakly convex function.
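The definitions above can be made concrete on a one-dimensional example. The following is a minimal sketch, not taken from the cited works: the test function $g(x) = |x| - (\rho/2)x^2$, the value of `RHO`, and all function names are illustrative assumptions. For this particular $g$ the proximal mapping has a closed form (soft-thresholding rescaled by $1/(1-\gamma\rho)$), which the sketch compares against a brute-force evaluation of the infimum on a grid.

```python
import numpy as np

RHO = 0.5  # weak-convexity modulus of the illustrative g below (assumption)


def g(x):
    """Illustrative rho-weakly convex function: |x| - (rho/2) x^2."""
    return np.abs(x) - 0.5 * RHO * x**2


def prox_g(x, gamma):
    """Closed-form Prox_{gamma g}(x) for this g; valid only for gamma < 1/RHO."""
    assert gamma * RHO < 1.0, "need gamma < 1/rho for a unique minimizer"
    soft = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)  # soft-thresholding
    return soft / (1.0 - gamma * RHO)


def moreau_envelope(x, gamma):
    """g^gamma(x), obtained by evaluating the objective at the proximal point."""
    p = prox_g(x, gamma)
    return g(p) + (x - p) ** 2 / (2.0 * gamma)


def grad_envelope(x, gamma):
    """Gradient formula: (x - Prox_{gamma g}(x)) / gamma."""
    return (x - prox_g(x, gamma)) / gamma


def envelope_brute_force(x, gamma, grid):
    """Direct evaluation of the infimum over a grid of candidate y values."""
    return np.min(g(grid) + (x - grid) ** 2 / (2.0 * gamma))


if __name__ == "__main__":
    gamma = 0.8
    grid = np.linspace(-10.0, 10.0, 200001)
    for x in (-2.0, -0.3, 0.0, 0.5, 1.7):
        print(x, moreau_envelope(x, gamma), envelope_brute_force(x, gamma, grid),
              grad_envelope(x, gamma))
```

The closed form is specific to this choice of $g$; for a general weakly convex function the proximal point would have to be computed by an inner solver.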

2. Fundamental Properties of the Moreau Envelope on Weakly Convex Functions

Multiple key properties extend classical Moreau theory for convex functions to the weakly convex setting:

Monotonicity and Pointwise Convergence: For fixed $x$, $g^\gamma(x)$ is nonincreasing in $\gamma$, so it increases monotonically as $\gamma \downarrow 0$, with $\lim_{\gamma\to 0} g^\gamma(x) = g(x)$. This provides a path to approximate $g$ from below using increasingly less regularized surrogates (Renaud et al., 17 Sep 2025).
Gradient and Smoothness: For $\gamma < 1/\rho$, $g^\gamma$ is $C^1$; its gradient formula directly exposes the proximal mapping, and under additional regularity (e.g., local Lipschitz continuity or twice differentiability on the image of the prox), $\nabla g^\gamma$ is Lipschitz with constant at most $1/\gamma$ when $\gamma\rho \leq 1/2$ (Renaud et al., 17 Sep 2025).
Preservation of Minima and Critical Points: If $x\in\arg\min g$, then $x\in\arg\min g^\gamma$, and more generally, $x$ is a stationary point of $g$ if and only if $\nabla g^\gamma(x)=0$ (i.e., $x=\operatorname{Prox}_{\gamma g}(x)$). Multiple equivalent characterizations are established: vanishing envelope gradient, being a proximal fixed point, and preservation of function values at minimizers.
Modification of Convexity Constants: If $g$ is $\rho$-weakly convex, then $g^\gamma$ is $(\rho/(1-\gamma\rho))$-weakly convex; analogously, if $g$ is $\mu$-strongly convex, then $g^\gamma$ is $(\mu/(1+\gamma\mu))$-strongly convex.
Duality and Convex Conjugacy: There is a direct relation between the Moreau envelope, the proximal mapping, and the convex conjugate. For convex $f$, if $z = x+y$ and $f(x) + f^*(y) = \langle x, y\rangle$, then $x = \operatorname{Prox}_f(z)$ and $y = \operatorname{Prox}_{f^*}(z)$ (Renaud et al., 17 Sep 2025).
Regularization and Hamilton–Jacobi Equation: The envelope solves the Hamilton–Jacobi PDE: $\frac{\partial g^\gamma}{\partial\gamma}(x) + \frac{1}{2}\|\nabla g^\gamma(x)\|^2 = 0,$ providing a fundamental variational structure to the regularization scheme.
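Several of the properties listed above can be checked numerically on a simple example. The sketch below is illustrative only: it assumes the test function $g(x) = |x| - (\rho/2)x^2$ with $\rho = 0.5$ and its closed-form proximal mapping, and the helper names are hypothetical. With $\gamma = 0.8$ (so $\gamma\rho = 0.4 \leq 1/2$) it checks monotonicity in $\gamma$, the $(\rho/(1-\gamma\rho))$-weak convexity of the envelope, the $1/\gamma$ Lipschitz bound on the gradient, and the Hamilton–Jacobi residual by finite differences.

```python
import numpy as np

RHO = 0.5  # weak-convexity modulus of the illustrative g (assumption)


def g(x):
    return np.abs(x) - 0.5 * RHO * x**2


def prox_g(x, gamma):
    # Closed form for this particular g; requires gamma < 1/RHO.
    soft = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
    return soft / (1.0 - gamma * RHO)


def envelope(x, gamma):
    p = prox_g(x, gamma)
    return g(p) + (x - p) ** 2 / (2.0 * gamma)


def grad_envelope(x, gamma):
    return (x - prox_g(x, gamma)) / gamma


def hamilton_jacobi_residual(x, gamma, h=1e-6):
    # Central difference in gamma plus half the squared gradient norm.
    d_gamma = (envelope(x, gamma + h) - envelope(x, gamma - h)) / (2.0 * h)
    return d_gamma + 0.5 * grad_envelope(x, gamma) ** 2


def check_properties(gamma=0.8):
    xs = np.linspace(-3.0, 3.0, 601)
    rho_env = RHO / (1.0 - gamma * RHO)   # weak-convexity modulus of the envelope
    shifted = envelope(xs, gamma) + 0.5 * rho_env * xs**2
    grads = grad_envelope(xs, gamma)
    return {
        # the envelope lies below g, and a smaller gamma gives a larger envelope
        "below_g": bool(np.all(envelope(xs, gamma) <= g(xs) + 1e-12)),
        "monotone": bool(np.all(envelope(xs, gamma / 2) >= envelope(xs, gamma) - 1e-12)),
        # nonnegative second differences indicate convexity of the shifted envelope on the grid
        "min_second_diff": float(np.min(np.diff(shifted, 2))),
        # largest gradient difference quotient, relative to the bound 1/gamma
        "max_slope_ratio": float(np.max(np.abs(np.diff(grads)) / np.diff(xs)) * gamma),
        "hj_residual": max(abs(hamilton_jacobi_residual(x, gamma)) for x in (-2.0, 0.3, 1.5)),
    }


if __name__ == "__main__":
    for name, value in check_properties().items():
        print(name, value)
```

A grid check of this kind only illustrates the statements on one function; it is not a substitute for the proofs in the cited reference.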

3. Algorithmic Implications and Stationarity Measures

The differentiability, tractable gradient, and smoothing properties of the Moreau envelope play a pivotal role in modern first-order methods for weakly convex optimization:

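Stationarity Measure: The envelope’s gradient provides a continuous, computable measure of near-stationarity even when the original function is nonsmooth. For weakly convex $\varphi$ with Moreau envelope $\varphi_\gamma$ and $\hat{x} = \operatorname{Prox}_{\gamma\varphi}(x)$, one has $\|x - \hat{x}\| = \gamma\|\nabla\varphi_\gamma(x)\|$, $\varphi(\hat{x}) \leq \varphi(x)$, and $\operatorname{dist}(0, \partial\varphi(\hat{x})) \leq \|\nabla\varphi_\gamma(x)\|$, so a small value of $\|\nabla\varphi_\gamma(x)\|$ certifies that $x$ lies close to a nearly stationary point of the original objective (Davis et al., 2018, Davis et al., 2018).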
Convergence Guarantees: Many stochastic and deterministic algorithms (proximal point, proximal gradient, forward–backward splitting, ADMM variants) design their update rules around the proximal mapping and analyze convergence in terms of the Moreau envelope’s gradients. Typical results show an $O(k^{-1/4})$ decay rate for $\|\nabla\varphi_\gamma(x_k)\|$ for stochastic subgradient or model-based minimization methods applied to weakly convex problems, with the Moreau envelope smoothing ensuring analytical tractability and robust performance guarantees (Davis et al., 2018, Davis et al., 2018).
Complexity Interpolation: Moreau smoothing yields complexity rates that interpolate between the $O(\epsilon^{-2})$ rate for smooth nonconvex problems and the $O(\epsilon^{-4})$ rate for subgradient methods; for example, variable smoothing strategies for structured composite problems achieve $O(\epsilon^{-3})$ (Böhm et al., 2020).
Extensions to Inexact Algorithms: The Moreau envelope supports inexact analysis where proximal mappings are computed with controlled error, and the KL property of the envelope enables global convergence and explicit rates even for approximate iterates (Khanh et al., 2023).
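To illustrate how the envelope gradient serves as a stationarity measure, the sketch below runs a noisy subgradient method on the one-dimensional function $\varphi(x) = |x^2 - 1|$, which is $2$-weakly convex, and records $\|\nabla\varphi_\gamma(x_k)\|$ along the iterates. Everything here is an illustrative assumption rather than the setup of the cited papers: the test function, the closed-form proximal mapping derived for it, the step-size rule $c/\sqrt{k+1}$, the Gaussian noise model, and the function names.

```python
import numpy as np

RHO = 2.0  # phi(x) = |x^2 - 1| is 2-weakly convex


def phi(x):
    return abs(x * x - 1.0)


def subgradient(x):
    """One element of the subdifferential of phi at x (0 is chosen at the kinks)."""
    return np.sign(x * x - 1.0) * 2.0 * x


def prox_phi(x, gamma):
    """Closed-form Prox_{gamma phi}(x) for gamma < 1/RHO, by a three-branch case analysis."""
    assert gamma * RHO < 1.0, "need gamma < 1/rho"
    s, a = (1.0 if x >= 0 else -1.0), abs(x)
    if a > 1.0 + 2.0 * gamma:   # minimizer lies in the region y^2 > 1
        return s * a / (1.0 + 2.0 * gamma)
    if a < 1.0 - 2.0 * gamma:   # minimizer lies in the region y^2 < 1
        return s * a / (1.0 - 2.0 * gamma)
    return s                    # minimizer sits at the kink |y| = 1


def envelope_grad_norm(x, gamma):
    """|grad phi_gamma(x)| = |x - Prox_{gamma phi}(x)| / gamma."""
    return abs(x - prox_phi(x, gamma)) / gamma


def stochastic_subgradient(x0, steps, gamma, noise=0.1, c=0.1, seed=0):
    """Noisy subgradient method; returns the last iterate and the stationarity history."""
    rng = np.random.default_rng(seed)
    x, history = float(x0), []
    for k in range(steps):
        history.append(envelope_grad_norm(x, gamma))
        v = subgradient(x) + noise * rng.standard_normal()  # noisy subgradient
        x = x - c / np.sqrt(k + 1.0) * v                    # diminishing step size
    return x, np.array(history)


if __name__ == "__main__":
    x_last, hist = stochastic_subgradient(x0=3.0, steps=2000, gamma=0.25)
    print("last iterate:", x_last)
    print("initial / best envelope gradient norm:", hist[0], hist.min())
```

The envelope gradient is used here purely as a diagnostic: the iteration itself only needs subgradients, while evaluating the measure requires the proximal point, which happens to be available in closed form for this toy function.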

4. Structural Results in Geometric and Non-Euclidean Contexts

The Moreau envelope’s role transcends Euclidean spaces, generalizing to geodesic spaces such as Hadamard spaces and CAT(0) spaces:

Hadamard Spaces: The Moreau envelope in Hadamard spaces is defined analogously, using the geodesic metric $d(x, y)$ instead of the norm. Fundamental equivalence results are established: pointwise convergence of Moreau envelopes of convex lower semicontinuous functions is equivalent to Mosco convergence (defined via the asymptotic center for weak convergence), even in nonlinear geodesic settings (Bačák et al., 2016).
Connections to Set Convergence: For the indicator function of a convex set $C$, the envelope reduces to a scaled squared distance function, $\frac{1}{2\gamma} d(x, C)^2$ in the parametrization used here. Hence, pointwise convergence of Moreau envelopes is equivalent to convergence in the Frolík–Wijsman topology for convex sets (Bačák et al., 2016).
Topological and Metric Structures: In separable Hadamard spaces, the space of proper convex lower semicontinuous functions is metrizable using a complete metric for which convergence is equivalent to Mosco convergence (Bačák et al., 2016). This supports rigorous variational analysis and convergence theorems in non-Euclidean contexts.
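The link between indicator functions and distance functions can be illustrated in the Euclidean special case; the Hadamard-space setting itself is not reproduced here. In the sketch below, the choice of a box as the convex set and the function names are assumptions for illustration. The proximal mapping of the indicator is the projection onto the set, and the envelope equals the squared distance scaled by $1/(2\gamma)$.

```python
import numpy as np


def project_box(x, lo, hi):
    """Projection onto the box [lo, hi]^d, i.e. the prox of its indicator function."""
    return np.clip(x, lo, hi)


def envelope_indicator_box(x, gamma, lo, hi):
    """Moreau envelope of the box indicator: dist(x, box)^2 / (2 gamma)."""
    p = project_box(x, lo, hi)
    return np.sum((x - p) ** 2) / (2.0 * gamma)


def grad_envelope_indicator_box(x, gamma, lo, hi):
    """Gradient formula (x - Prox(x)) / gamma, with Prox the projection."""
    return (x - project_box(x, lo, hi)) / gamma


if __name__ == "__main__":
    x = np.array([2.0, -3.0, 0.5])
    print(envelope_indicator_box(x, 0.5, -1.0, 1.0))       # squared distance 5, scaled by 1/(2*0.5)
    print(grad_envelope_indicator_box(x, 0.5, -1.0, 1.0))  # zero in the coordinate already inside
```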

5. Extensions, Applications, and Examples

The Moreau envelope’s utility encompasses a variety of theoretical and practical optimization contexts:

Supremum and Composition Structures: The envelope commutes with taking suprema over uniformly weakly convex families, facilitating smoothing and explicit computations for nonconvex functions defined via maximum operations (López-Rivera et al., 1 Feb 2025).
Regularization in Bilevel Programming: Weakly convex value functions arising as lower-level objectives in bilevel programming can be regularized by the Moreau envelope, transforming highly nonsmooth constraints into tractable, weakly convex surrogates and enabling difference-of-weakly-convex methods (Gao et al., 2023).
Splitting and Variational Methods: The Moreau envelope is fundamental in designing merit functions such as the forward–backward envelope (FBE) and the Douglas–Rachford envelope (DRE), used in splitting methods for weakly convex composite optimization, with strong descent properties and epi-convergence guarantees (Atenas, 2023).
First-Order Meta-Learning: Meta-learning formulations using sums of Moreau envelopes yield objectives that are differentiable/smooth (even if task losses are only weakly convex), enabling analysis and improvement of algorithms such as FO-MAML without reliance on Hessian information (Mishchenko et al., 2023).
Stochastic and Distributed Optimization: Smoothed surrogates produced by Moreau envelopes facilitate distributed algorithms (e.g., MADM) and single-loop stochastic algorithms for min-max and difference-of-weakly-convex models by enabling efficiently computable gradients and robust convergence (Mirzaeifard et al., 2023, Hu et al., 28 May 2024).
Algorithmic Acceleration: Recent research extends the classical smoothing regularization to high-order Moreau envelopes and boosted proximal-point frameworks, which under the Kurdyka-Łojasiewicz property yield improved local convergence rates, including linear convergence under adaptive regularization order selection (Kabgani et al., 25 Oct 2024).
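As one concrete instance of the envelope-based merit functions mentioned above, the forward–backward envelope of a composite objective $f + g$ can be written, in its standard form, as $f(x) - \frac{\gamma}{2}\|\nabla f(x)\|^2 + g^\gamma(x - \gamma\nabla f(x))$. The sketch below is a hedged illustration and not the construction of any specific cited paper: the least-squares $f$, the separable weakly convex penalty $g(x) = \lambda\sum_i (|x_i| - (\rho/2)x_i^2)$, the constants `LAM` and `RHO`, and the function names are all assumptions. It evaluates the envelope and lets one observe the sandwich $(f+g)(T_\gamma(x)) \leq \mathrm{FBE}(x) \leq (f+g)(x)$ for $\gamma \leq 1/L_f$, where $T_\gamma$ is the forward–backward step.

```python
import numpy as np

LAM, RHO = 0.5, 0.2  # g below is (LAM * RHO)-weakly convex (illustrative constants)


def f(x, A, b):
    """Smooth part: least-squares loss 0.5 * ||Ax - b||^2."""
    r = A @ x - b
    return 0.5 * r @ r


def grad_f(x, A, b):
    return A.T @ (A @ x - b)


def g(x):
    """Nonsmooth weakly convex part: LAM * sum(|x_i| - (RHO/2) x_i^2)."""
    return LAM * np.sum(np.abs(x) - 0.5 * RHO * x**2)


def prox_g(v, gamma):
    """Closed-form Prox_{gamma g}(v); requires gamma * LAM * RHO < 1."""
    assert gamma * LAM * RHO < 1.0
    soft = np.sign(v) * np.maximum(np.abs(v) - gamma * LAM, 0.0)
    return soft / (1.0 - gamma * LAM * RHO)


def forward_backward_step(x, gamma, A, b):
    """T_gamma(x) = Prox_{gamma g}(x - gamma * grad f(x))."""
    return prox_g(x - gamma * grad_f(x, A, b), gamma)


def forward_backward_envelope(x, gamma, A, b):
    """FBE(x) = f(x) - (gamma/2) ||grad f(x)||^2 + g^gamma(x - gamma * grad f(x))."""
    grad = grad_f(x, A, b)
    v = x - gamma * grad
    p = prox_g(v, gamma)
    env_g = g(p) + np.sum((v - p) ** 2) / (2.0 * gamma)  # Moreau envelope of g at v
    return f(x, A, b) - 0.5 * gamma * (grad @ grad) + env_g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((8, 5)), rng.standard_normal(8)
    gamma = 0.9 / np.linalg.norm(A, 2) ** 2  # step below 1/L_f, with L_f = ||A||_2^2
    x = rng.standard_normal(5)
    x_plus = forward_backward_step(x, gamma, A, b)
    print(f(x_plus, A, b) + g(x_plus), forward_backward_envelope(x, gamma, A, b),
          f(x, A, b) + g(x))
```

The lower inequality relies on the descent lemma for $f$, which is why the step size is tied to the spectral norm of `A` in the usage example.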

6. Analytical Proofs and Duality

Key proofs underpinning the Moreau envelope’s properties for weakly convex functions leverage classical convex analysis:

Monotonicity and Pointwise Limit: By constructing technical inequalities linking values of $g$ at proximal points for different smoothing parameters, monotonicity in $\gamma$ and convergence as $\gamma \to 0$ are established via sandwich arguments and lower semicontinuity (Renaud et al., 17 Sep 2025).
Gradient Formula and Stationarity: Construction of quadratic approximations to $g^\gamma$ and application of optimality conditions for the proximal mapping yield explicit gradient expressions, demonstrating smoothness even in nonconvex settings.
Preservation of Minima:

$x \in \arg\min g \iff x \in \arg\min g^\gamma,$

and critical points coincide as $x = \operatorname{Prox}_{\gamma g}(x) \iff \nabla g^\gamma(x)=0$. The dual characterization via convex conjugates provides further analytical structure.

Degradation of (Weak) Convexity: Reparameterization allows the convex-concave decomposition of $g$ to be transported through the envelope, demonstrating explicit changes to weak or strong convexity moduli.
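
The gradient and stationarity arguments in this section can be sketched in a few lines. This is an informal outline under the assumption $\gamma < 1/\rho$, writing $\partial g$ for the subdifferential of the weakly convex $g$ and $p = \operatorname{Prox}_{\gamma g}(x)$ (notation introduced here for illustration). The first-order optimality condition of the strongly convex proximal subproblem, combined with the gradient formula, reads

$0 \in \partial g(p) + \frac{1}{\gamma}(p - x) \quad\Longleftrightarrow\quad \nabla g^\gamma(x) = \frac{1}{\gamma}(x - p) \in \partial g(p).$

Hence $\nabla g^\gamma(x) = 0$ forces $p = x$ and $0 \in \partial g(x)$; conversely, if $0 \in \partial g(x)$, then $y = x$ satisfies the optimality condition, and uniqueness of the minimizer gives $p = x$. The Hamilton–Jacobi identity follows formally along the same lines: differentiating $g^\gamma(x) = g(p) + \frac{1}{2\gamma}\|x - p\|^2$ in $\gamma$ with $p$ held at the minimizer (an envelope-theorem step that is taken for granted in this sketch) gives

$\frac{\partial g^\gamma}{\partial \gamma}(x) = -\frac{1}{2\gamma^2}\|x - p\|^2 = -\frac{1}{2}\|\nabla g^\gamma(x)\|^2.$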

7. Impact and Broader Theoretical Significance

The theoretical development of the Moreau envelope for weakly convex functions thus underpins modern first-order algorithms in nonconvex, nonsmooth, and distributed optimization, and forms the basis of rigorous complexity and convergence guarantees (e.g., $O(\epsilon^{-4})$ or $O(\epsilon^{-3})$ rates for stationarity). Its ability to preserve minimizers and critical points, together with explicit and tractable gradient formulas, makes it an essential analytic and algorithmic construct for bridging classical convex regularization with the growing domain of weakly convex models in data science, signal processing, and machine learning. Results extending these properties to geodesic and metric spaces, and to more general envelope or smoothing constructs (e.g., via Bregman-Moreau or polar convolutions), highlight the generality and robustness of the Moreau envelope as a tool for variational regularization and nonsmooth analysis (Renaud et al., 17 Sep 2025, Bačák et al., 2016, Bauschke et al., 2017, Friedlander et al., 2018).