Normalized Smoothed-L1 Loss

Updated 29 April 2026

Normalized smoothed-L1 loss is a scale-invariant, smooth sparsity-promoting regularizer that leverages a log transformation of the ℓ1/ℓ2 ratio.
Its formulation introduces smoothing parameters to ensure differentiability, enabling efficient gradient and proximal-based optimization.
The algorithmic framework integrates the loss into a variable-metric forward–backward scheme, ensuring convergence to a critical point in blind deconvolution and sparse recovery problems.

The normalized smoothed-L1 loss, also referred to as the normalized smoothed-ℓ₁ or smoothed-ℓ₁/ℓ₂ penalty, is a scale-invariant, smooth, and strongly sparsity-promoting functional designed for use as a regularizer in optimization, particularly in blind deconvolution and sparse recovery. It is a smooth, log-transformed variant of the classical ℓ₁/ℓ₂ ratio. The construction retains the scale invariance and strong sparsity promotion of the original ratio, and its differentiability permits efficient gradient- and proximal-based schemes. The penalty remains nonconvex, but it avoids the nonsmoothness of the exact ratio term (Repetti et al., 2014).

1. Origin and Scale-Invariant Motivation

The normalized smoothed-L1 penalty was introduced to address the limitations of the nonsmooth ℓ₁/ℓ₂ ratio, which is used in sparse blind deconvolution and signal recovery to enforce sparsity while maintaining scale invariance. The classical penalty takes the form

$\frac{\ell_1(x)}{\ell_2(x)}$

where $\ell_1(x)=\sum_n|x_n|$ and $\ell_2(x) = \|x\|_2 = \sqrt{\sum_n x_n^2}$. This ratio remains unchanged under scaling $x \mapsto cx$ for any $c \neq 0$, a property highly desirable in blind deconvolution. However, it is both nonconvex and nonsmooth: it is non-differentiable wherever an entry equals zero and undefined when $\ell_2(x)=0$, which makes direct minimization difficult. To overcome these drawbacks, the smoothed and log-transformed variant was proposed (Repetti et al., 2014).
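
As a short worked check of the invariance, using only the definitions of $\ell_1$ and $\ell_2$ and any scalar $c \neq 0$:

$\frac{\ell_1(cx)}{\ell_2(cx)} = \frac{|c| \sum_n |x_n|}{|c| \sqrt{\sum_n x_n^2}} = \frac{\ell_1(x)}{\ell_2(x)}$

For a nonzero vector of length $N$ (a symbol introduced here for illustration), the ratio lies between $1$ and $\sqrt{N}$. A vector with a single nonzero entry attains $1$, and a vector whose entries all have the same magnitude attains $\sqrt{N}$, so smaller values correspond to sparser vectors.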

2. Mathematical Formulation and Smooth Approximation

The normalized smoothed-ℓ₁ loss is given by

$\varphi(x) = \lambda \cdot \log \left( \frac{\ell_{1,\alpha}(x) + \beta}{\ell_{2,\eta}(x)} \right)$

where

$\ell_{1,\alpha}(x) = \sum_n \left( \sqrt{x_n^2 + \alpha^2} - \alpha \right), \quad \ell_{2,\eta}(x) = \sqrt{ \sum_n x_n^2 + \eta^2 }$

and $\lambda > 0$ is a weighting parameter, $\alpha, \eta > 0$ are smoothing parameters, and $\beta > 0$ is a small positive shift ensuring strict positivity inside the logarithm (Equation (2) in Repetti et al., 2014). As $\alpha \to 0$ and $\eta \to 0$, the expressions $\ell_{1,\alpha}$ and $\ell_{2,\eta}$ recover the original $\ell_1$ and $\ell_2$ norms. The log transformation further amplifies sparsity by increasing the concavity of the penalty.
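
The definition can be evaluated directly. The following is a minimal NumPy sketch, not the authors' implementation; the function names, the default parameter values, and the test vectors are assumptions chosen for illustration. It checks that the smoothed terms approach the exact norms for tiny smoothing parameters, that a sparse vector is penalized less than a dense one with the same ℓ₂ norm, and that invariance under rescaling holds only approximately while the smoothing parameters are nonzero.

```python
import numpy as np

def smoothed_l1(x, alpha):
    """ell_{1,alpha}(x) = sum_n (sqrt(x_n^2 + alpha^2) - alpha)."""
    return np.sum(np.sqrt(x**2 + alpha**2) - alpha)

def smoothed_l2(x, eta):
    """ell_{2,eta}(x) = sqrt(sum_n x_n^2 + eta^2)."""
    return np.sqrt(np.sum(x**2) + eta**2)

def phi(x, lam=1.0, alpha=1e-3, beta=1e-3, eta=1e-3):
    """lam * log((ell_{1,alpha}(x) + beta) / ell_{2,eta}(x)).

    Default parameter values are illustrative assumptions.
    """
    return lam * np.log((smoothed_l1(x, alpha) + beta) / smoothed_l2(x, eta))

x_sparse = np.array([0.0, 0.0, 3.0, 0.0])  # one nonzero entry
x_dense = np.array([1.5, 1.5, 1.5, 1.5])   # same l2 norm, no zero entries

# With tiny smoothing, the smoothed terms approach the exact l1 and l2 norms.
l1_limit = smoothed_l1(x_dense, 1e-9)
l2_limit = smoothed_l2(x_dense, 1e-9)
print(l1_limit, l2_limit)

# The sparse vector receives the lower penalty.
print(phi(x_sparse), phi(x_dense))

# Rescaling changes phi only slightly when alpha, beta, eta are small.
scale_gap = abs(phi(x_dense) - phi(10.0 * x_dense))
print(scale_gap)
```

The gap printed last shrinks as the smoothing parameters decrease relative to the magnitude of the entries.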

3. Differential Calculus for Optimization

The smooth structure of $\varphi$ admits an explicit expression for its gradient, crucial for efficient algorithmic implementation. Decomposing $\varphi(x) = \lambda \left( \varphi_1(x) - \varphi_2(x) \right)$ with $\varphi_1(x) = \log\left(\ell_{1,\alpha}(x) + \beta\right)$ and $\varphi_2(x) = \log \ell_{2,\eta}(x)$,

$\nabla \varphi_1(x) = \frac{1}{\ell_{1,\alpha}(x) + \beta} \left( \frac{x_n}{\sqrt{x_n^2 + \alpha^2}} \right)_n$,
$\nabla \varphi_2(x) = \frac{x}{\ell_{2,\eta}(x)^2}$.

The full gradient is

$\nabla \varphi(x) = \lambda \left( \frac{x \odot w(x)}{\ell_{1,\alpha}(x) + \beta} - \frac{x}{\ell_{2,\eta}(x)^2} \right)$

where “$\odot$” denotes component-wise multiplication by the vector $w(x) = \left( (x_n^2 + \alpha^2)^{-1/2} \right)_n$. This explicit smoothness ensures that $\nabla \varphi$ is Lipschitz continuous on any bounded set, which is not the case for the original ℓ₁/ℓ₂ ratio. The proximal mapping of $\varphi$ does not admit a closed-form formula and is not used within the main algorithmic framework; instead, $\varphi$ is treated as part of the smooth component in the objective (Repetti et al., 2014).
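
A closed-form gradient of this kind is easy to get wrong in code, so a numerical check is useful. The sketch below applies the chain rule to the definition of the penalty from Section 2 and compares the result with central finite differences. The function names, the parameter values, and the random test point are assumptions for illustration only.

```python
import numpy as np

def phi(x, lam, alpha, beta, eta):
    """Normalized smoothed-l1 penalty from Section 2."""
    l1 = np.sum(np.sqrt(x**2 + alpha**2) - alpha)
    l2 = np.sqrt(np.sum(x**2) + eta**2)
    return lam * np.log((l1 + beta) / l2)

def grad_phi(x, lam, alpha, beta, eta):
    """Chain-rule gradient: lam * (x*w / (l1 + beta) - x / l2^2)."""
    l1 = np.sum(np.sqrt(x**2 + alpha**2) - alpha)
    l2_sq = np.sum(x**2) + eta**2
    w = 1.0 / np.sqrt(x**2 + alpha**2)  # component-wise weights
    return lam * (x * w / (l1 + beta) - x / l2_sq)

def finite_difference_grad(f, x, step=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for n in range(x.size):
        e = np.zeros_like(x)
        e[n] = step
        g[n] = (f(x + e) - f(x - e)) / (2.0 * step)
    return g

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
params = dict(lam=0.5, alpha=0.1, beta=0.1, eta=0.1)  # illustrative values

g_exact = grad_phi(x, **params)
g_fd = finite_difference_grad(lambda z: phi(z, **params), x)
max_err = np.max(np.abs(g_exact - g_fd))
print(max_err)
```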

4. Algorithmic Integration: Variable-Metric Forward–Backward Scheme

The normalized smoothed-L1 loss is incorporated into a composite objective of the form

$F(x, h) = \rho(x, h) + g_1(x) + g_2(h) + \varphi(x)$

with $\rho$ the data-fidelity term of the deconvolution model, $g_1$ and $g_2$ being proper, lower semi-continuous convex functions encoding nonsmooth constraints on the signal $x$ and the kernel $h$, and $\varphi$ as the smooth sparsity-promoting regularizer. The primary optimization routine is a blockwise variable-metric forward–backward method (Algorithm 1 in Repetti et al., 2014), which alternates between proximal steps for the x-update and the h-update, each computed in a suitable majorizing variable metric. The smoothness of $\varphi$ is crucial, as it avoids explicit computation of its proximal operator, and the per-iteration cost is dominated by convolution and simple proximal projections:

Gradient computation for $\varphi$: $O(N)$ operations, where $N$ is the length of $x$.
Proximal steps involving $g_1$ and $g_2$: typically linear in the lengths of $x$ and $h$ for simple constraints.

If a single inner forward–backward step is used for each block, the scheme specializes to a PALM-type method.
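
To make the forward–backward idea concrete, here is a deliberately simplified sketch of the x-update alone. It is not Algorithm 1 of the paper: the kernel `h` is held fixed, a scalar stepsize with backtracking stands in for the majorizing variable metric, the data term is assumed to be least squares, and the nonsmooth constraint is assumed to be a box whose proximal operator is a clip. All function names and numerical values are hypothetical.

```python
import numpy as np

def phi(x, lam, alpha, beta, eta):
    """Normalized smoothed-l1 penalty from Section 2."""
    l1 = np.sum(np.sqrt(x**2 + alpha**2) - alpha)
    l2 = np.sqrt(np.sum(x**2) + eta**2)
    return lam * np.log((l1 + beta) / l2)

def grad_phi(x, lam, alpha, beta, eta):
    l1 = np.sum(np.sqrt(x**2 + alpha**2) - alpha)
    l2_sq = np.sum(x**2) + eta**2
    return lam * (x / np.sqrt(x**2 + alpha**2) / (l1 + beta) - x / l2_sq)

def smooth_part(x, h, y, p):
    """Assumed least-squares data fit plus the smooth penalty phi."""
    r = np.convolve(h, x) - y
    return 0.5 * (r @ r) + phi(x, **p)

def grad_smooth_part(x, h, y, p):
    r = np.convolve(h, x) - y
    # The adjoint of "full" convolution with h is "valid" correlation with h.
    return np.correlate(r, h, mode="valid") + grad_phi(x, **p)

def forward_backward_x(x, h, y, p, lo, hi, n_iter=100):
    """Projected-gradient iterations on x with backtracking (h fixed)."""
    values = [smooth_part(x, h, y, p)]
    for it in range(n_iter):
        g = grad_smooth_part(x, h, y, p)
        step = 1.0
        for _ in range(60):  # halve the step until sufficient decrease
            x_new = np.clip(x - step * g, lo, hi)  # gradient step, then prox
            d = x_new - x
            bound = values[-1] + g @ d + (d @ d) / (2.0 * step)
            if smooth_part(x_new, h, y, p) <= bound:
                break
            step *= 0.5
        x = x_new
        values.append(smooth_part(x, h, y, p))
    return x, values

# Synthetic toy data (illustrative only).
rng = np.random.default_rng(1)
x_true = np.zeros(40)
x_true[[5, 17, 30]] = [2.0, -1.5, 1.0]
h = np.array([1.0, -0.6, 0.2])
y = np.convolve(h, x_true) + 0.01 * rng.standard_normal(42)
p = dict(lam=0.1, alpha=1e-2, beta=1e-2, eta=1e-2)

x0 = np.full(40, 0.1)  # starting point inside the box
x_hat, values = forward_backward_x(x0, h, y, p, lo=-5.0, hi=5.0)
print(values[0], values[-1])
```

The backtracking test is the standard sufficient-decrease condition for a projected gradient step, so the recorded objective values do not increase. A full blind scheme would add an analogous update for the kernel and replace the scalar stepsize with the majorizing metrics described above.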

5. Convergence Properties

Under standard assumptions, including uniform upper and lower bounds on the majorization matrices for both blocks, stepsizes chosen in a compact interval, and semi-algebraicity of the nonsmooth constraint functions, the iterates $(x^k, h^k)_{k \in \mathbb{N}}$ produced by the algorithm converge to a critical point of the full composite functional:

Convergence result (Proposition 1 in Repetti et al., 2014): If the variable-metric majorizers are uniformly bounded, stepsizes are controlled, and the nonsmooth constraint functions are semi-algebraic, then the iterate sequence converges to a critical point, and the objective values decrease monotonically to the objective value at that critical point, which need not be a global minimum.

This global convergence result is enabled by the smoothness and regularity properties of $\varphi$, unlike the behavior of the nonsmooth ratio or unregularized objectives.

6. Comparison with Alternative Sparsity Penalties

The table below summarizes key qualities of the normalized smoothed-L1 loss versus classical alternatives:

Penalty	Scale-Invariant	Smooth Gradient	Sparsity Promotion
Unsmoothed ℓ₁/ℓ₂	Yes	No	Strong
Smoothed-ℓ₁/ℓ₂ (φ(x))	Yes	Yes	Strong (log-enhanced)
Plain ℓ₁	No	No	Moderate
Huber	No	Yes	Weak/Moderate

The smoothed normalized penalty $\varphi$ combines scale invariance, concavity-driven sparsity enhancement, and differentiable structure, features not simultaneously present in the other penalties. In particular, the logarithmic transformation further increases concavity, promoting sparsity more aggressively than the original ratio.
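
The scale-invariance column of the table can be checked numerically. The sketch below evaluates each penalty on a vector and on a rescaled copy. The helper names, the Huber threshold `delta`, the smoothing values, and the test vector are assumptions for illustration. With nonzero smoothing parameters, φ is invariant only up to a small gap.

```python
import numpy as np

def l1(x):
    return np.sum(np.abs(x))

def l1_over_l2(x):
    return l1(x) / np.linalg.norm(x)

def huber(x, delta=1.0):
    """Sum of Huber losses: quadratic near zero, linear beyond delta."""
    a = np.abs(x)
    return np.sum(np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta)))

def phi(x, lam=1.0, alpha=1e-4, beta=1e-4, eta=1e-4):
    """Normalized smoothed-l1 penalty; parameter values are illustrative."""
    l1_s = np.sum(np.sqrt(x**2 + alpha**2) - alpha)
    l2_s = np.sqrt(np.sum(x**2) + eta**2)
    return lam * np.log((l1_s + beta) / l2_s)

x = np.array([0.0, 2.0, 0.0, -1.0, 0.0, 0.5])
c = 7.0  # arbitrary rescaling factor

penalties = {"l1/l2": l1_over_l2, "phi": phi, "l1": l1, "huber": huber}
for name, f in penalties.items():
    # Each row prints the penalty before and after rescaling by c.
    print(name, f(x), f(c * x))
```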

7. Significance and Practical Implications

The normalized smoothed-L1 loss enables scale-invariant sparsity regularization in nonconvex blind deconvolution settings while accommodating efficient gradient- and proximal-based methods due to its smoothness. It avoids pathological behavior at the origin and at points where individual entries $x_n$ vanish. In sparse signal retrieval and deconvolution contexts, this penalty provides greater robustness and algorithmic tractability compared to its unsmoothed counterpart. The approach has been empirically demonstrated, notably in seismic data blind deconvolution, and the methodology generalizes to a variety of applications where strong, smooth sparsity and scale invariance are required (Repetti et al., 2014).


References

Repetti et al. (2014). Euclid in a Taxicab: Sparse Blind Deconvolution with Smoothed ℓ₁/ℓ₂ Regularization.
