Moreau Envelope in Optimization

Updated 9 November 2025

The Moreau envelope is a regularization technique that smooths nonsmooth objective functions by infimal convolution with a quadratic term, preserving key minimization structures.
It provides a differentiable approximation with Lipschitz continuous gradient for both convex and weakly convex functions (for the latter, when the parameter is small enough), with the gradient expressed explicitly through the proximal mapping.
Its framework links proximal mapping, Fenchel duality, and variational convergence, enabling robust first- and second-order algorithm design in nonsmooth optimization.

The Moreau envelope is a fundamental regularization construct in variational analysis and optimization, which enables the smoothing of nonsmooth objective functions while preserving key minimization structures. Originally developed in the context of convex functionals, the Moreau envelope admits precise generalizations and retains substantial regularization power for the broader class of weakly convex objectives (functions whose deviation from convexity is controlled quadratically). Its links with the proximal mapping, infimal convolution, and Fenchel duality yield a toolkit central to contemporary nonsmooth optimization and variational convergence analysis. Below, the essential theory and properties of the Moreau envelope for both convex and weakly convex functions are outlined, following the framework in (Renaud et al., 17 Sep 2025).

1. Definition and Variational Representations

For a proper, lower semicontinuous function $f:\mathbb{R}^n\to(-\infty,+\infty]$ and regularization parameter $\lambda>0$ , the Moreau envelope $E_\lambda f$ and its associated proximal mapping are given by: $E_\lambda f(x) := \inf_{y\in\mathbb{R}^n} \left\{ f(y) + \frac{1}{2\lambda}\|y-x\|^2 \right\}, \quad \operatorname{prox}_{\lambda f}(x) := \arg\min_{y\in\mathbb{R}^n} \left\{ f(y) + \frac{1}{2\lambda}\|y-x\|^2 \right\}.$ In general $\operatorname{prox}_{\lambda f}(x)$ is a set, but it is a single point whenever $y\mapsto f(y)+\frac{1}{2\lambda}\|y-x\|^2$ is strongly convex, which is the case for convex $f$ and for $\rho$-weakly convex $f$ with $\lambda\rho<1$. The envelope is equivalently formulated as the infimal convolution: $E_\lambda f = f \, \square \, \left( \frac{\|\cdot\|^2}{2\lambda} \right),$ where $(f \, \square \, g)(x) = \inf_y [ f(y) + g(x-y) ]$ .
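As a numerical sanity check of the definition, the sketch below approximates $E_\lambda f$ and $\operatorname{prox}_{\lambda f}$ for the illustrative choice $f(y)=|y|$ (an assumption, not taken from the source) by brute-force minimization over a fine grid, and compares the result with the known closed form, the Huber function. The helper names `moreau_envelope_grid` and `huber` are hypothetical, and grid search is only a stand-in for a real proximal solver.

```python
import numpy as np

def moreau_envelope_grid(f, x, lam, grid):
    """Approximate (E_lam f(x), prox_{lam f}(x)) by minimizing over a 1-D grid.

    f must accept NumPy arrays; accuracy is limited by the grid spacing and range.
    """
    values = f(grid) + (grid - x) ** 2 / (2.0 * lam)
    i = np.argmin(values)
    return values[i], grid[i]

def huber(x, lam):
    # Closed-form envelope of |.|: quadratic on [-lam, lam], linear outside
    return np.where(np.abs(x) <= lam, x ** 2 / (2 * lam), np.abs(x) - lam / 2)

grid = np.linspace(-5.0, 5.0, 200001)   # spacing 5e-5
lam = 0.5
for x in [-2.0, -0.3, 0.0, 0.2, 1.7]:
    env, p = moreau_envelope_grid(np.abs, x, lam, grid)
    print(f"x={x:+.1f}  grid E={env:.6f}  Huber={float(huber(x, lam)):.6f}  prox={p:+.4f}")
```

The grid minimizer reproduces soft-thresholding, $\operatorname{prox}_{\lambda|\cdot|}(x)=\operatorname{sign}(x)\max\{|x|-\lambda,0\}$; in higher dimensions brute force is impractical, and a closed-form or iterative prox would be used instead.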

For convex $f$ , $E_\lambda f$ admits a dual representation in terms of the Fenchel conjugate $f^*$ : $E_\lambda f(x) = \sup_{u\in\mathbb{R}^n} \left\{ \langle u,x \rangle - f^*(u) - \frac{\lambda}{2}\|u\|^2 \right\} = \left[ f^* + \frac{\lambda}{2}\|\cdot\|^2 \right]^*(x).$ This structure links the Moreau envelope directly to convex analysis, duality, and infimal convolution machinery.
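The dual formula can be checked on a simple case. Assuming the illustrative choice $f=|\cdot|$ on $\mathbb{R}$ (not from the source), the conjugate $f^*$ is $0$ on $[-1,1]$ and $+\infty$ elsewhere, so the supremum runs over $u\in[-1,1]$ only. The sketch evaluates it on a grid and compares it with the primal closed form (the Huber function); the function names are hypothetical.

```python
import numpy as np

lam = 0.5
u = np.linspace(-1.0, 1.0, 20001)   # dom f* = [-1, 1] for f = |.|, with f* = 0 there

def envelope_dual(x):
    # E_lam f(x) = sup_u { u*x - f*(u) - (lam/2) u^2 }, restricted to dom f*
    return np.max(u * x - 0.5 * lam * u ** 2)

def envelope_primal(x):
    # Huber function: closed-form Moreau envelope of |.|
    return x ** 2 / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2

for x in [-3.0, -0.4, 0.0, 0.25, 2.0]:
    print(x, envelope_dual(x), envelope_primal(x))
```

The maximizing $u$ is $x/\lambda$ clipped to $[-1,1]$, which is exactly $\nabla E_\lambda f(x)$ for this $f$.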

2. Classical Properties: The Convex Case

Suppose $f$ is convex, proper, and lower semicontinuous, with $\lambda>0$ .

Convexity and Differentiability:

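The function $E_\lambda f$ is convex, being the infimal convolution of two convex functions (equivalently, the partial minimization over $y$ of $f(y)+\frac{1}{2\lambda}\|y-x\|^2$, which is jointly convex in $(x,y)$); it is also everywhere finite and continuously differentiable on $\mathbb{R}^n$ .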

Gradient Formula:

$\nabla E_\lambda f(x) = \frac{1}{\lambda} (x - \operatorname{prox}_{\lambda f}(x)).$
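A minimal sketch, assuming $f=\|\cdot\|_1$ on $\mathbb{R}^5$ (an illustrative choice whose proximal map is soft-thresholding), that compares the gradient formula with a central finite-difference gradient of $E_\lambda f$. The envelope is evaluated as $f(p)+\frac{1}{2\lambda}\|p-x\|^2$ at $p=\operatorname{prox}_{\lambda f}(x)$; all function names are hypothetical.

```python
import numpy as np

def prox_l1(x, lam):
    # Soft-thresholding: prox of lam * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def envelope_l1(x, lam):
    # E_lam f(x) = f(p) + ||p - x||^2 / (2 lam) evaluated at p = prox(x)
    p = prox_l1(x, lam)
    return np.sum(np.abs(p)) + np.sum((p - x) ** 2) / (2 * lam)

def grad_envelope(x, lam):
    # Gradient formula: (x - prox_{lam f}(x)) / lam
    return (x - prox_l1(x, lam)) / lam

rng = np.random.default_rng(0)
x, lam, h = rng.normal(size=5), 0.3, 1e-6
fd = np.array([(envelope_l1(x + h * e, lam) - envelope_l1(x - h * e, lam)) / (2 * h)
               for e in np.eye(5)])
print(np.max(np.abs(fd - grad_envelope(x, lam))))  # only finite-difference error remains
```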

Lipschitz Gradient (Smoothness):

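The proximal mapping $\operatorname{prox}_{\lambda f}$ is firmly nonexpansive, and hence so is $I-\operatorname{prox}_{\lambda f}$; in particular $I-\operatorname{prox}_{\lambda f}$ is 1-Lipschitz, so by the gradient formula $\nabla E_\lambda f$ is $(1/\lambda)$-Lipschitz: $\|\nabla E_\lambda f(x) - \nabla E_\lambda f(y)\| \le \frac{1}{\lambda} \|x - y\|.$ Thus, $E_\lambda f$ is $(1/\lambda)$-smooth. (Nonexpansiveness of $\operatorname{prox}_{\lambda f}$ alone would only give the constant $2/\lambda$.)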
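The $(1/\lambda)$ bound can be probed empirically. The sketch below assumes the illustrative $f=\|\cdot\|_1$ on $\mathbb{R}^3$, for which $\nabla E_\lambda f(x)$ is $x/\lambda$ clipped componentwise to $[-1,1]$; it checks firm nonexpansiveness of the prox, $\langle p(x)-p(y),x-y\rangle\ge\|p(x)-p(y)\|^2$, and records the largest difference quotient of the gradient over random pairs.

```python
import numpy as np

def prox_l1(x, lam):
    # Soft-thresholding: prox of lam * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def grad_envelope(x, lam):
    # Equals clip(x / lam, -1, 1) componentwise
    return (x - prox_l1(x, lam)) / lam

rng = np.random.default_rng(1)
lam = 0.25
ratios = []
for _ in range(10000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    dp = prox_l1(x, lam) - prox_l1(y, lam)
    # firm nonexpansiveness: <p(x) - p(y), x - y> >= ||p(x) - p(y)||^2
    assert dp @ (x - y) >= dp @ dp - 1e-12
    ratios.append(np.linalg.norm(grad_envelope(x, lam) - grad_envelope(y, lam))
                  / np.linalg.norm(x - y))
print(max(ratios), 1 / lam)  # observed ratio stays at or below 1/lam
```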

Minimizer Preservation:

The Moreau envelope preserves both minima and minimizers: $\min f = \min E_\lambda f, \qquad \arg\min f = \arg\min E_\lambda f.$ If $x^*\in\arg\min f$ , then $E_\lambda f(x^*)=f(x^*)$ and $\nabla E_\lambda f(x^*)=0$ .
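Since $x-\lambda\nabla E_\lambda f(x)=\operatorname{prox}_{\lambda f}(x)$, gradient descent on $E_\lambda f$ with step $\lambda$ is exactly the proximal point method. The sketch below illustrates this, together with the preservation of minimizers, for the hypothetical objective $f(x)=\|x-c\|_1$ with an arbitrarily chosen vector `c`, whose unique minimizer is $c$.

```python
import numpy as np

c = np.array([3.0, -1.0, 0.5])   # illustrative data: f(x) = ||x - c||_1, argmin f = {c}
lam = 0.4

def prox_f(x):
    # prox of lam * ||. - c||_1: soft-thresholding shifted to c
    d = x - c
    return c + np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)

def grad_E(x):
    return (x - prox_f(x)) / lam

x = np.zeros(3)
for _ in range(50):
    x = x - lam * grad_E(x)       # gradient step of size lam on E_lam f, i.e. x <- prox_f(x)
print(x, grad_E(x))               # x reaches c, where the envelope gradient vanishes
```

Each step shrinks every coordinate's distance to $c$ by $\lambda$ until it reaches zero, so the iterates land on $\arg\min f=\arg\min E_\lambda f$ after finitely many steps here.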

3. Weakly Convex Case: Extensions and Delicate Properties

Let $f$ be proper, lower semicontinuous and $\rho$-weakly convex, i.e., $f + (\rho/2)\|\cdot\|^2$ is convex, and fix $\lambda>0$ with $\lambda\rho < 1$. Then:

Well-Posedness, Continuity, Monotonicity:

For each $x$ , $\lambda \mapsto E_\lambda f(x)$ is nonincreasing on $(0,1/\rho)$ .
$\lim_{\lambda\to 0^+} E_\lambda f(x) = f(x)$ and $\lim_{\lambda\to 0^+} \operatorname{prox}_{\lambda f}(x) = x$ .
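To see both statements on a concrete weakly convex function, the sketch below assumes $f(y)=|y|-\frac{\rho}{2}y^2$ (an illustrative choice, not from the source), which is $\rho$-weakly convex because $f+\frac{\rho}{2}y^2=|y|$. For $\lambda\rho<1$, the first-order optimality condition gives the scaled soft-thresholding $\operatorname{prox}_{\lambda f}(x)=\operatorname{sign}(x)\max\{|x|-\lambda,0\}/(1-\lambda\rho)$.

```python
import numpy as np

rho = 1.0  # f(y) = |y| - (rho/2) y^2 is rho-weakly convex: f + (rho/2) y^2 = |y|

def f(y):
    return np.abs(y) - 0.5 * rho * y ** 2

def prox(x, lam):
    # Closed form for this f when lam * rho < 1: scaled soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0) / (1.0 - lam * rho)

def envelope(x, lam):
    p = prox(x, lam)
    return f(p) + (p - x) ** 2 / (2.0 * lam)

x = 0.8
lams = [0.9, 0.5, 0.1, 1e-2, 1e-3, 1e-4]   # decreasing lambda, all with lam * rho < 1
vals = [envelope(x, lam) for lam in lams]
print(vals, f(x))        # values grow as lambda shrinks and approach f(x)
print(prox(x, 1e-4))     # close to x
```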

Differentiability and Gradient Formula:

$E_\lambda f$ is differentiable and the gradient formula from the convex case extends: $\nabla E_\lambda f(x) = \frac{1}{\lambda} (x - \operatorname{prox}_{\lambda f}(x)).$

The proof requires careful handling of upper and lower quadratic bounds and uses the one-sided directional envelope properties specific to weak convexity.
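The extended gradient formula can be checked numerically on the illustrative $\rho$-weakly convex function $f(y)=|y|-\frac{\rho}{2}y^2$ (an assumption, not from the source), whose prox for $\lambda\rho<1$ is $\operatorname{sign}(x)\max\{|x|-\lambda,0\}/(1-\lambda\rho)$. The sketch first confirms this closed form against a brute-force grid minimizer, then compares a finite-difference derivative of $E_\lambda f$ with $(x-\operatorname{prox}_{\lambda f}(x))/\lambda$; parameter values are arbitrary.

```python
import numpy as np

rho, lam = 1.0, 0.6   # requires lam * rho < 1

def f(y):
    return np.abs(y) - 0.5 * rho * y ** 2   # rho-weakly convex, not convex

def prox(x):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0) / (1.0 - lam * rho)

def envelope(x):
    p = prox(x)
    return f(p) + (p - x) ** 2 / (2.0 * lam)

grid = np.linspace(-10.0, 10.0, 400001)
h = 1e-6
for x in [-1.5, -0.2, 0.4, 2.3]:
    p_grid = grid[np.argmin(f(grid) + (grid - x) ** 2 / (2.0 * lam))]  # brute-force prox
    fd = (envelope(x + h) - envelope(x - h)) / (2 * h)                  # numerical gradient
    print(x, prox(x), p_grid, fd, (x - prox(x)) / lam)
```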

Proximal Inverse and Open Image:

If $f$ is differentiable at $x$ ,

$\operatorname{prox}_{\lambda f}(x + \lambda \nabla f(x)) = x.$

If $\nabla f$ is defined everywhere, then $\operatorname{prox}_{\lambda f}^{-1} = I + \lambda\nabla f$ , so $\operatorname{prox}_{\lambda f}$ is bijective with open image.
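A small illustration, assuming $f=\cos$, which is $1$-weakly convex because $f''=-\cos\ge-1$. With $\lambda=0.5$ the prox objective is strongly convex, so $\operatorname{prox}_{\lambda f}(z)$ is the unique root of the increasing function $g(y)=-\sin y+(y-z)/\lambda$, located here by bisection (the helper `prox_cos` is hypothetical). Applying it to $z=x+\lambda f'(x)$ should return $x$.

```python
import math

lam = 0.5  # f = cos is 1-weakly convex, so lam * rho = 0.5 < 1

def prox_cos(z, iters=200):
    # Root of g(y) = -sin(y) + (y - z)/lam, which is strictly increasing;
    # g(z - lam) <= 0 <= g(z + lam), so bisection on [z - lam, z + lam] finds it
    lo, hi = z - lam, z + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if -math.sin(mid) + (mid - z) / lam < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in [-2.0, 0.3, 1.0, 4.0]:
    z = x + lam * (-math.sin(x))   # (I + lam * f')(x) with f' = -sin
    print(x, prox_cos(z))          # recovers x
```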

Cocoercivity and Nonexpansivity:

Nonexpansivity generalizes to the cocoercivity estimate $\|\operatorname{prox}_{\lambda f}(x) - \operatorname{prox}_{\lambda f}(y)\|^2 \le \frac{1}{1-\lambda\rho} \langle x-y, \operatorname{prox}_{\lambda f}(x)-\operatorname{prox}_{\lambda f}(y) \rangle,$ which by the Cauchy–Schwarz inequality yields $1/(1-\lambda\rho)$-Lipschitz continuity of $\operatorname{prox}_{\lambda f}$ .
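The cocoercivity estimate can be tested on the illustrative separable function $f(y)=\|y\|_1-\frac{\rho}{2}\|y\|^2$ on $\mathbb{R}^4$ (an assumption, not from the source), whose prox is coordinatewise scaled soft-thresholding. The sketch asserts the inequality on random pairs and tracks the observed Lipschitz ratio; the random test setup is an arbitrary choice.

```python
import numpy as np

rho, lam = 1.0, 0.6
c = 1.0 / (1.0 - lam * rho)   # cocoercivity / Lipschitz constant 1/(1 - lam*rho) = 2.5

def prox(x):
    # prox of lam*f for f(y) = ||y||_1 - (rho/2)||y||^2 (separable, rho-weakly convex)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0) / (1.0 - lam * rho)

rng = np.random.default_rng(2)
worst = 0.0
for _ in range(10000):
    x, y = rng.normal(scale=3.0, size=4), rng.normal(scale=3.0, size=4)
    d = prox(x) - prox(y)
    # cocoercivity: ||d||^2 <= c * <x - y, d>  (small slack for rounding)
    assert d @ d <= c * ((x - y) @ d) + 1e-9
    worst = max(worst, np.linalg.norm(d) / np.linalg.norm(x - y))
print(worst, c)   # Lipschitz ratio stays at or below 1/(1 - lam*rho)
```

Outside $[-\lambda,\lambda]$ each coordinate of this prox has slope exactly $1/(1-\lambda\rho)$, so the constant cannot be improved for this $f$.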

Convexity/Strong Convexity of the Envelope:

$E_\lambda f$ is $(\rho/(1 - \lambda\rho))$ -weakly convex.
If $f$ is $\mu$ -strongly convex, $E_\lambda f$ is $(\mu/(1+\lambda\mu))$ -strongly convex.
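A one-dimensional quadratic makes the modulus concrete (an illustrative computation, not from the source). For $f(y)=\frac{\mu}{2}y^2$ with $1+\lambda\mu>0$:

```latex
\begin{aligned}
\operatorname{prox}_{\lambda f}(x)
  &= \arg\min_{y}\Big\{\tfrac{\mu}{2}y^2 + \tfrac{1}{2\lambda}(y-x)^2\Big\}
   = \frac{x}{1+\lambda\mu},
\qquad
\operatorname{prox}_{\lambda f}(x) - x = -\frac{\lambda\mu\,x}{1+\lambda\mu},\\
E_\lambda f(x)
  &= \frac{\mu}{2}\,\frac{x^2}{(1+\lambda\mu)^2}
   + \frac{1}{2\lambda}\,\frac{\lambda^2\mu^2 x^2}{(1+\lambda\mu)^2}
   = \frac{\mu(1+\lambda\mu)}{2(1+\lambda\mu)^2}\,x^2
   = \frac{1}{2}\cdot\frac{\mu}{1+\lambda\mu}\,x^2 .
\end{aligned}
```

The envelope is again a quadratic with curvature exactly $\mu/(1+\lambda\mu)$, so the strong convexity modulus is attained. Substituting $\mu=-\rho$ (admissible while $\lambda\rho<1$) gives $E_\lambda f(x)=-\frac{\rho}{2(1-\lambda\rho)}x^2$, so the weak convexity modulus $\rho/(1-\lambda\rho)$ is attained as well.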

Smoothness Regimes:

The gradient's Lipschitz constant is $L = \max\{1/\lambda, \rho/(1-\lambda\rho)\}$ . For $\lambda\rho\le1/2$ , $L=1/\lambda$ ; for $\lambda\rho \ge 1/2$ , $L=\rho/(1-\lambda\rho)$ .
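Both regimes appear on the illustrative function $f(y)=|y|-\frac{\rho}{2}y^2$ (an assumption, not from the source): on $[-\lambda,\lambda]$ the envelope gradient has slope $1/\lambda$, and outside it has slope $-\rho/(1-\lambda\rho)$. The sketch measures the largest difference quotient of $\nabla E_\lambda f$ on a grid and compares it with $\max\{1/\lambda,\rho/(1-\lambda\rho)\}$ for three values of $\lambda$.

```python
import numpy as np

rho = 1.0

def grad_envelope(x, lam):
    # For f(y) = |y| - (rho/2) y^2: grad E = (x - prox(x)) / lam, with scaled soft-thresholding prox
    p = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0) / (1.0 - lam * rho)
    return (x - p) / lam

x = np.linspace(-5.0, 5.0, 100001)
for lam in [0.2, 0.5, 0.8]:   # lam * rho below, at, and above 1/2
    g = grad_envelope(x, lam)
    slope = np.max(np.abs(np.diff(g) / np.diff(x)))   # empirical Lipschitz constant
    print(lam, slope, max(1 / lam, rho / (1 - lam * rho)))
```

The negative slope outside $[-\lambda,\lambda]$ also shows that, for this $f$, $E_\lambda f$ is exactly $\rho/(1-\lambda\rho)$-weakly convex and no better.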

Minima and Stationarity:

$\arg\min f = \arg\min E_\lambda f$, and the following characterizations hold:

$\nabla E_\lambda f(x)=0 \iff \operatorname{prox}_{\lambda f}(x)=x \iff E_\lambda f(x)=f(x) \iff f(\operatorname{prox}_{\lambda f}(x))=f(x).$

For $f$ differentiable at $x$ , $\nabla f(x)=0\iff\nabla E_\lambda f(x)=0$ .
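The equivalences can be observed with the illustrative $f=\cos$ (1-weakly convex) and $\lambda=0.5$, both assumptions: the fixed points of $\operatorname{prox}_{\lambda f}$ are the critical points $x=k\pi$ of $f$ (including the maximizer $x=0$), where $E_\lambda f(x)=f(x)$, while at a noncritical point the prox moves and $E_\lambda f(x)<f(x)$. The bisection-based `prox_cos` helper is a hypothetical stand-in for a general solver.

```python
import math

lam = 0.5   # f = cos is 1-weakly convex; lam * rho = 0.5 < 1

def prox_cos(z, iters=200):
    # Bisection on the strictly increasing optimality map g(y) = -sin(y) + (y - z)/lam
    lo, hi = z - lam, z + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if -math.sin(mid) + (mid - z) / lam < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def envelope_cos(x):
    p = prox_cos(x)
    return math.cos(p) + (p - x) ** 2 / (2 * lam)

for x in [0.0, math.pi, 1.0, 2.5]:
    p = prox_cos(x)
    print(f"x={x:.4f}  prox-x={p - x:+.2e}  f(x)-E(x)={math.cos(x) - envelope_cos(x):.2e}  f'(x)={-math.sin(x):+.2e}")
```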

4. Associated Subdifferential, Epi-Convergence, and Second-Order Structure

Subdifferential (Clarke) Connection:

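For $\rho$-weakly convex (not necessarily convex) $f$ with $\lambda\rho<1$, the proximal point $y=\operatorname{prox}_{\lambda f}(x)$ is characterized by a Clarke-stationarity condition: $\nabla E_\lambda f(x) = \frac{1}{\lambda}(x-y) \iff 0\in\partial_C f(y) + \frac{1}{\lambda}(y-x),$ where $\partial_C$ denotes the Clarke subdifferential. In particular, the critical points of $E_\lambda f$ are exactly the Clarke-critical points of $f$.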

Epi-Convergence:

As $\lambda\to0$ , $E_\lambda f \to f$ pointwise; $E_\lambda f$ epi-converges to $f$ (in the Attouch–Wets sense), so $\min E_\lambda f\to\min f$ and $\arg\min E_\lambda f \to \arg\min f$ in the Painlevé–Kuratowski sense.

Second-Order Formula:

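If $f$ is $C^2$ near $\operatorname{prox}_{\lambda f}(x)$ and $\nabla f$ is $L_f$-Lipschitz with $\lambda L_f < 1$ (so that $I + \lambda \nabla^2 f \succeq (1-\lambda L_f) I$ is invertible), then: $J(\operatorname{prox}_{\lambda f})(x) = [I + \lambda \nabla^2f(\operatorname{prox}_{\lambda f}(x))]^{-1},$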

$\nabla^2 E_\lambda f(x) = \frac{1}{\lambda}\left[ I - (I + \lambda \nabla^2f(\operatorname{prox}_{\lambda f}(x)))^{-1} \right].$
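A one-dimensional check of the Hessian formula, assuming $f=\cos$ and $\lambda=0.5$ (illustrative choices: both $f'=-\sin$ and $f''=-\cos$ are 1-Lipschitz, and $1+\lambda f''\ge 0.5>0$). In one dimension the formula reads $(E_\lambda f)''(x)=\frac{1}{\lambda}\big(1-\frac{1}{1+\lambda f''(p)}\big)$ with $p=\operatorname{prox}_{\lambda f}(x)$; the sketch compares it with a second-order central difference of the envelope. Helper names are hypothetical.

```python
import math

lam = 0.5   # f = cos: f' and f'' are 1-Lipschitz, and lam * 1 = 0.5 < 1

def prox_cos(z, iters=200):
    # Bisection on the strictly increasing optimality map g(y) = -sin(y) + (y - z)/lam
    lo, hi = z - lam, z + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if -math.sin(mid) + (mid - z) / lam < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def envelope_cos(x):
    p = prox_cos(x)
    return math.cos(p) + (p - x) ** 2 / (2 * lam)

def hessian_formula(x):
    # 1-D case of (1/lam) [I - (I + lam * f''(prox(x)))^{-1}] with f'' = -cos
    f2 = -math.cos(prox_cos(x))
    return (1.0 - 1.0 / (1.0 + lam * f2)) / lam

h = 1e-4
for x in [-1.0, 0.0, 0.7, 2.0]:
    fd2 = (envelope_cos(x + h) - 2 * envelope_cos(x) + envelope_cos(x - h)) / h ** 2
    print(x, fd2, hessian_formula(x))
```

At $x=0$ both values are close to $-2=-\rho/(1-\lambda\rho)$, the most negative curvature allowed for a $1$-weakly convex $f$ with $\lambda=0.5$.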

Geometry of Proximal Image:

The image of $\operatorname{prox}_{\lambda f}$ is "almost convex": the Lebesgue measure of points in $\operatorname{dom} f$ not in $\operatorname{prox}_{\lambda f}(\mathbb{R}^n)$ is zero, and similarly for the convex hull.

5. Overview Table: Main Properties by Function Class

| Property | Convex $f$ | $\rho$-weakly convex $f$ ($\lambda\rho<1$) |
|---|---|---|
| $E_\lambda f$ finite, differentiable | Yes | Yes |
| Gradient formula | $\frac{1}{\lambda}(x - \operatorname{prox}_{\lambda f}(x))$ | Same |
| $\nabla E_\lambda f$ Lipschitz | $1/\lambda$-Lipschitz | $\max\{1/\lambda, \rho/(1-\lambda\rho)\}$-Lipschitz |
| Preservation of minimizers | Yes | Yes |
| $E_\lambda f$ convex | Yes | $(\rho/(1-\lambda\rho))$-weakly convex |
| $E_\lambda f$ strongly convex if $f$ is $\mu$-strongly convex | Yes, $(\mu/(1+\lambda\mu))$-strongly convex | Yes, $(\mu/(1+\lambda\mu))$-strongly convex |

This table condenses the quantitative and qualitative parallels and differences between convex and weakly convex regimes.

6. Significance, Generalization, and Applications

The Moreau envelope provides a universal smoothing mechanism:

For convex $f$, it produces a $C^1$, $(1/\lambda)$-smooth lower approximation ($E_\lambda f \le f$, with equality exactly at the fixed points of $\operatorname{prox}_{\lambda f}$, which are the minimizers of $f$), preserving minimization structure and enabling first-order (and, in strongly convex settings, higher-order) algorithm design.
For weakly convex $f$, the envelope extends this regularization: it remains differentiable with Lipschitz gradient for every $\lambda$ with $\lambda\rho<1$, and all key stationarity and minimization identifications are maintained.

This regularization is pivotal in:

Stochastic and deterministic first-order algorithms, where the Moreau envelope yields descent directions even for nonsmooth or weakly convex objectives.
Analysis of variational convergence (epi-convergence, Painlevé–Kuratowski minimizer limits).
Second-order theory, owing to the preservation or transfer of key curvature properties via explicit Hessian transformations.
Formulation and exact/approximate solution of bilevel optimization problems, composite minimization, and algorithmic frameworks reliant on the proximal point methodology.

7. Structural Limits and Advanced Topics

As $\lambda \to 0$ , the Moreau envelope recovers $f$ pointwise and in epigraphical topology, making it an essential tool in variational approximation.
For $C^2$ (twice continuously differentiable) weakly convex $f$, the Hessian of the envelope is given explicitly in terms of $\nabla^2 f$ evaluated at the proximal point, so curvature information about $f$ transfers to $E_\lambda f$ through the proximal map; second-order methods applied to the envelope rely on this.
The image of the proximal mapping covers $\operatorname{dom} f$ up to a Lebesgue-null set, a geometric guarantee that proximal points reach essentially all of the domain.

Consequently, almost every point of $\operatorname{dom} f$ arises as $\operatorname{prox}_{\lambda f}(x)$ for some $x$, which facilitates the analysis and justification of regularization or penalization procedures based on the Moreau envelope or related proximal maps.

The Moreau envelope thus constitutes a comprehensive smoothing and regularization theory, developed first for convex functions and extending robustly to the weakly convex regime, that endows nonsmooth and nonconvex optimization with differentiability, smoothness, explicit first- and second-order structures, and robust links to variational analysis and algorithmic convergence (Renaud et al., 17 Sep 2025).

References

Renaud et al., On the Moreau envelope properties of weakly convex functions (2025)
