Normalized Smoothed-L1 Loss
- Normalized smoothed-L1 loss is a scale-invariant, smooth sparsity-promoting regularizer that leverages a log transformation of the ℓ1/ℓ2 ratio.
- Its formulation introduces smoothing parameters to ensure differentiability, enabling efficient gradient and proximal-based optimization.
- The algorithmic framework integrates the loss into a variable-metric forward–backward scheme, ensuring convergence to a critical point in blind deconvolution and sparse recovery problems.
The normalized smoothed-L1 loss, also referred to as the normalized smoothed-ℓ₁ or smoothed-ℓ₁/ℓ₂ penalty, is a scale-invariant, smooth, sparsity-promoting functional designed for use as a regularizer in optimization, particularly in blind deconvolution and sparse recovery. It is a smooth, log-transformed variant of the classical ℓ₁/ℓ₂ ratio. It is constructed to retain the scale invariance and strong sparsity promotion of the original ratio while allowing efficient proximal and gradient-based schemes: the penalty remains nonconvex, but the nonsmoothness of the exact ratio term is removed (Repetti et al., 2014).
1. Origin and Scale-Invariant Motivation
The normalized smoothed-L1 penalty was introduced to address the limitations of the nonsmooth ℓ₁/ℓ₂ ratio, which is used in sparse blind deconvolution and signal recovery to enforce sparsity while maintaining scale invariance. The classical penalty takes the form
$$\frac{\|x\|_1}{\|x\|_2},$$
where $\|x\|_1 = \sum_{n=1}^{N} |x_n|$ and $\|x\|_2 = \big(\sum_{n=1}^{N} x_n^2\big)^{1/2}$. This ratio remains unchanged under the scaling $x \mapsto c\,x$ for any $c \neq 0$, a property highly desirable in blind deconvolution. However, it is both nonconvex and nonsmooth, which makes direct minimization difficult: it is non-differentiable wherever an entry $x_n$ equals zero, and its gradient is undefined when $x = 0$. To overcome these drawbacks, the smoothed and log-transformed variant was proposed (Repetti et al., 2014).
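The behavior of the exact ratio can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `l1_over_l2` and the test vectors are illustrative and do not come from the source. It evaluates the ratio on a 1-sparse vector, on a constant vector, and on a rescaled copy.

```python
import numpy as np

def l1_over_l2(x):
    """Exact l1/l2 ratio; not defined at x = 0 (division by zero)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x)) / np.sqrt(np.sum(x ** 2))

n = 16
one_sparse = np.zeros(n)
one_sparse[3] = 2.5          # a single nonzero entry
dense = np.ones(n)           # all entries equal

ratio_sparse = l1_over_l2(one_sparse)    # 1-sparse vectors give the value 1
ratio_dense = l1_over_l2(dense)          # constant vectors give sqrt(n)
ratio_scaled = l1_over_l2(-7.0 * dense)  # rescaling by c != 0 changes nothing
```

For nonzero vectors the ratio lies between 1 (sparsest case) and $\sqrt{N}$ (all entries of equal magnitude), which is why smaller values are associated with sparser signals.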
2. Mathematical Formulation and Smooth Approximation
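The normalized smoothed-ℓ₁ loss is given by
$$\varphi(x) = \lambda \log\!\left(\frac{\ell_{1,\alpha}(x) + \beta}{\ell_{2,\eta}(x)}\right),$$
where
$$\ell_{1,\alpha}(x) = \sum_{n=1}^{N}\left(\sqrt{x_n^2 + \alpha^2} - \alpha\right), \qquad \ell_{2,\eta}(x) = \sqrt{\sum_{n=1}^{N} x_n^2 + \eta^2},$$
and $\lambda > 0$ is a weighting parameter, $\alpha, \eta > 0$ are smoothing parameters, and $\beta > 0$ is a small positive shift ensuring strict positivity inside the logarithm (Equation (2), (Repetti et al., 2014)). As $\alpha \to 0$ and $\eta \to 0$, the expressions recover the original ℓ₁ and ℓ₂ norms. The shift $\beta$ keeps the argument of the logarithm strictly positive, and the log transformation further amplifies sparsity by increasing the concavity of the penalty.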
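The definition translates directly into a few lines of code. The sketch below assumes NumPy and the formulas $\ell_{1,\alpha}(x) = \sum_n (\sqrt{x_n^2+\alpha^2}-\alpha)$, $\ell_{2,\eta}(x) = \sqrt{\sum_n x_n^2+\eta^2}$ and $\varphi = \lambda\log((\ell_{1,\alpha}+\beta)/\ell_{2,\eta})$; the function names and default parameter values are illustrative choices, not values taken from the source.

```python
import numpy as np

def smoothed_l1(x, alpha):
    """Smooth surrogate of the l1 norm: sum_n (sqrt(x_n^2 + alpha^2) - alpha)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.sqrt(x ** 2 + alpha ** 2) - alpha)

def smoothed_l2(x, eta):
    """Smooth surrogate of the l2 norm: sqrt(sum_n x_n^2 + eta^2)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.sum(x ** 2) + eta ** 2)

def phi(x, lam=1.0, alpha=1e-3, beta=1e-3, eta=1e-3):
    """Normalized smoothed-l1 penalty; the defaults are placeholders."""
    return lam * np.log((smoothed_l1(x, alpha) + beta) / smoothed_l2(x, eta))

x = np.array([3.0, -4.0, 0.0])

# With very small smoothing the penalty approaches log(||x||_1 / ||x||_2).
near_exact = phi(x, alpha=1e-8, beta=1e-8, eta=1e-8)
log_ratio = np.log(7.0 / 5.0)

# Unlike the exact ratio, the smoothed penalty is finite at the origin.
at_origin = phi(np.zeros(3))
```

With `beta == eta`, the value at the origin is `lam * log(beta / eta) = 0`, so the choice of these two parameters fixes the penalty level assigned to the zero signal.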
3. Differential Calculus for Optimization
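The smooth structure of $\varphi$ admits an explicit expression for its gradient, crucial for efficient algorithmic implementation. Decomposing $\varphi(x) = \lambda\left[\log\big(\ell_{1,\alpha}(x) + \beta\big) - \log \ell_{2,\eta}(x)\right]$ with $\ell_{1,\alpha}(x) = \sum_{n}\big(\sqrt{x_n^2+\alpha^2}-\alpha\big)$ and $\ell_{2,\eta}(x) = \sqrt{\sum_{n} x_n^2 + \eta^2}$,
- $\nabla \ell_{1,\alpha}(x) = \left(\dfrac{x_n}{\sqrt{x_n^2 + \alpha^2}}\right)_{1 \le n \le N}$,
- $\nabla \ell_{2,\eta}(x) = \dfrac{x}{\ell_{2,\eta}(x)}$.
The full gradient is
$$\nabla \varphi(x) = \lambda \left( \frac{x \odot w(x)}{\ell_{1,\alpha}(x) + \beta} - \frac{x}{\ell_{2,\eta}(x)^2} \right),$$
where “$\odot$” denotes component-wise multiplication by the vector $w(x) = \big((x_n^2 + \alpha^2)^{-1/2}\big)_{1 \le n \le N}$. This explicit smoothness ensures that $\nabla\varphi$ is Lipschitz continuous on any bounded set, which is not the case for the original ℓ₁/ℓ₂ ratio. The proximal mapping of $\varphi$ does not admit a closed-form formula and is not used within the main algorithmic framework; instead, $\varphi$ is treated as part of the smooth component in the objective (Repetti et al., 2014).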
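A gradient formula of this kind is easy to get wrong by a sign or a power, so a finite-difference check is a useful companion. The sketch below assumes NumPy and the closed form $\nabla\varphi(x) = \lambda\big(x \odot w(x)/(\ell_{1,\alpha}(x)+\beta) - x/\ell_{2,\eta}(x)^2\big)$ with $w_n = (x_n^2+\alpha^2)^{-1/2}$; the names `phi`, `grad_phi` and the parameter values are illustrative.

```python
import numpy as np

def phi(x, lam, alpha, beta, eta):
    """lam * [log(l1_alpha(x) + beta) - log(l2_eta(x))]."""
    l1 = np.sum(np.sqrt(x ** 2 + alpha ** 2) - alpha)
    l2 = np.sqrt(np.sum(x ** 2) + eta ** 2)
    return lam * (np.log(l1 + beta) - np.log(l2))

def grad_phi(x, lam, alpha, beta, eta):
    """Closed-form gradient of phi."""
    l1 = np.sum(np.sqrt(x ** 2 + alpha ** 2) - alpha)
    l2_squared = np.sum(x ** 2) + eta ** 2
    w = 1.0 / np.sqrt(x ** 2 + alpha ** 2)   # component-wise weights
    return lam * (x * w / (l1 + beta) - x / l2_squared)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
params = dict(lam=0.5, alpha=0.1, beta=0.1, eta=0.1)  # arbitrary test values

# Central finite differences along each coordinate direction.
step = 1e-6
numeric = np.array([
    (phi(x + step * e, **params) - phi(x - step * e, **params)) / (2 * step)
    for e in np.eye(x.size)
])
max_err = np.max(np.abs(numeric - grad_phi(x, **params)))

# The gradient vanishes at the origin, where the exact ratio is undefined.
grad_at_zero = grad_phi(np.zeros(8), **params)
```

Both terms of the gradient cost a single pass over the entries of `x`, so the evaluation is linear in the signal length.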
4. Algorithmic Integration: Variable-Metric Forward–Backward Scheme
The normalized smoothed-L1 loss is incorporated into a composite objective of the form
$$F(x, h) = \rho(x, h) + g_1(x) + g_2(h) + \varphi(x),$$
with $\rho$ the data-fidelity term treating deconvolution, $g_1$ and $g_2$ being proper, lower semi-continuous convex functions encoding nonsmooth constraints, and $\varphi$ as the smooth sparsity-promoting regularizer. The primary optimization routine is a blockwise variable-metric forward–backward method (Algorithm 1, (Repetti et al., 2014)), which alternates between proximal steps on the signal $x$ and on the kernel $h$, each taken in a suitable majorizing variable-metric inner product. The smoothness of $\varphi$ is crucial, as it avoids explicit computation of its proximal operator, and the per-iteration cost is dominated by convolution and simple proximal projections:
- Gradient computation for $\varphi$: $O(N)$ operations, where $N$ is the length of $x$.
- Proximal steps involving $g_1$ and $g_2$: typically $O(N)$ and $O(S)$ for simple constraints, where $S$ is the length of $h$.

If a single inner forward–backward step is used for each block (i.e., $J_k = I_k = 1$ inner iterations for the x- and h-updates), the scheme specializes to a PALM-type method.
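The alternating structure can be illustrated with a simplified PALM-type loop. This is only a sketch under stated assumptions, not Algorithm 1 of the paper: it replaces the variable-metric majorizers with scalar step sizes found by backtracking, assumes a least-squares data term $\rho(x,h) = \tfrac12\|h * x - y\|^2$, and uses a box constraint on $x$ and an ℓ₂-ball constraint on $h$ as hypothetical choices for $g_1$ and $g_2$. All names, sizes and parameter values are illustrative.

```python
import numpy as np

def phi(x, lam, alpha, beta, eta):
    l1 = np.sum(np.sqrt(x ** 2 + alpha ** 2) - alpha)
    return lam * (np.log(l1 + beta) - 0.5 * np.log(np.sum(x ** 2) + eta ** 2))

def grad_phi(x, lam, alpha, beta, eta):
    l1 = np.sum(np.sqrt(x ** 2 + alpha ** 2) - alpha)
    w = 1.0 / np.sqrt(x ** 2 + alpha ** 2)
    return lam * (x * w / (l1 + beta) - x / (np.sum(x ** 2) + eta ** 2))

def rho(x, h, y):
    """Assumed data term: 0.5 * ||h * x - y||^2 with full linear convolution."""
    r = np.convolve(h, x) - y
    return 0.5 * np.dot(r, r)

def prox_step(point, grad, smooth, project, t=1.0):
    """Projected-gradient step; t is halved until the quadratic bound holds."""
    f0 = smooth(point)
    for _ in range(60):
        new = project(point - t * grad)
        d = new - point
        if smooth(new) <= f0 + np.dot(grad, d) + np.dot(d, d) / (2 * t) + 1e-12:
            return new
        t *= 0.5
    return point  # no acceptable step found: keep the current point

# Synthetic test problem (illustrative sizes and values).
rng = np.random.default_rng(1)
N, S = 64, 5
x_true = np.zeros(N)
x_true[[5, 20, 41]] = [1.0, -0.7, 0.5]
h_true = np.array([0.1, 0.5, 1.0, 0.5, 0.1])
y = np.convolve(h_true, x_true) + 0.01 * rng.standard_normal(N + S - 1)

pen = dict(lam=0.05, alpha=1e-2, beta=1e-2, eta=1e-1)

def proj_x(v):
    return np.clip(v, -2.0, 2.0)                  # prox of a box indicator

def proj_h(v):
    return v * min(1.0, 2.0 / max(np.linalg.norm(v), 1e-12))  # l2 ball, radius 2

def objective(x, h):
    return rho(x, h, y) + phi(x, **pen)

# Feasible starting point: centered observation and a unit-impulse kernel.
x = proj_x(y[S // 2: S // 2 + N])
h = np.zeros(S)
h[S // 2] = 1.0

history = [objective(x, h)]
for k in range(50):
    # x-update: gradient of rho + phi in x, then projection onto the box.
    r = np.convolve(h, x) - y
    gx = np.correlate(r, h, mode="valid") + grad_phi(x, **pen)
    x = prox_step(x, gx, lambda v: objective(v, h), proj_x)
    # h-update: gradient of rho in h, then projection onto the ball.
    r = np.convolve(h, x) - y
    gh = np.correlate(r, x, mode="valid")
    h = prox_step(h, gh, lambda v: rho(x, v, y), proj_h)
    history.append(objective(x, h))
```

The backtracking rule stands in for the majorization step: it accepts a step only when a quadratic upper bound on the smooth part holds at the new point, which is what makes the recorded objective values in `history` non-increasing. The variable-metric version replaces the scalar `1 / t` with a matrix tailored to the curvature of each block.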
5. Convergence Properties
Under standard assumptions, including uniform upper and lower bounds on the majorization matrices $A_1$ and $A_2$, stepsizes chosen in a compact interval $[\underline{\gamma}, 2 - \overline{\gamma}]$ with $\underline{\gamma}, \overline{\gamma} > 0$, and semi-algebraicity of the nonsmooth constraint functions, the iterates $(x^k, h^k)_{k \in \mathbb{N}}$ produced by the algorithm converge to a critical point of the full composite functional $F$:
Proposition 1 (Repetti et al., 2014): If the variable-metric majorizers $A_1$ and $A_2$ are uniformly bounded, the stepsizes are controlled, and $g_1$ and $g_2$ are semi-algebraic, then every trajectory converges to a critical point of $F$, and the objective values decrease monotonically to the value of $F$ at that critical point.
This convergence result holds for the whole sequence of iterates rather than for subsequences only, but because $F$ is nonconvex the limit is a critical point and not necessarily a global minimizer. It is enabled by the smoothness and regularity properties of $\varphi$, in contrast to the nonsmooth ratio.
6. Comparison with Alternative Sparsity Penalties
The table below summarizes key qualities of the normalized smoothed-L1 loss versus classical alternatives:
| Penalty | Scale-Invariant | Smooth Gradient | Sparsity Promotion |
|---|---|---|---|
| Unsmoothed ℓ₁/ℓ₂ | Yes | No | Strong |
| Smoothed-ℓ₁/ℓ₂ (φ(x)) | Yes | Yes | Strong (enhanced by log transform) |
| Plain ℓ₁ | No | No | Moderate |
| Huber | No | Yes | Weak/Moderate |
The smoothed normalized penalty $\varphi$ combines scale invariance, concavity-driven sparsity enhancement, and differentiable structure, features not simultaneously present in the other penalties. In particular, the logarithmic transformation further increases concavity, promoting sparsity more aggressively than the original ratio.
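The scale-invariance column of the table can be checked on a small example. The sketch below assumes NumPy; the Huber definition (quadratic below a threshold `delta`, linear above) and all parameter values are illustrative conventions rather than choices made in the source. It rescales one vector by a factor of 10 and records how each penalty responds.

```python
import numpy as np

def l1(x):
    return np.sum(np.abs(x))

def ratio(x):
    return l1(x) / np.linalg.norm(x)

def huber(x, delta=1.0):
    a = np.abs(x)
    return np.sum(np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta)))

def phi(x, lam=1.0, alpha=1e-6, beta=1e-6, eta=1e-6):
    l1_s = np.sum(np.sqrt(x ** 2 + alpha ** 2) - alpha)
    l2_s = np.sqrt(np.sum(x ** 2) + eta ** 2)
    return lam * np.log((l1_s + beta) / l2_s)

x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])
c = 10.0

l1_change = l1(c * x) / l1(x)           # grows linearly with c
huber_change = huber(c * x) / huber(x)  # also grows with c
ratio_change = ratio(c * x) - ratio(x)  # exactly invariant
phi_change = phi(c * x) - phi(x)        # invariant up to the smoothing terms
```

Note that for strictly positive $\alpha$, $\beta$, $\eta$ the smoothed penalty is only approximately invariant under rescaling: the deviation is negligible when the smoothing parameters are small relative to the signal amplitude, and grows as they become comparable to it.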
7. Significance and Practical Implications
The normalized smoothed-L1 loss enables scale-invariant sparsity regularization in nonconvex blind deconvolution settings while accommodating efficient gradient and proximal-based methods due to its smoothness. It avoids pathological behavior at the origin and where $x_n = 0$ for some entry. In sparse signal retrieval and deconvolution contexts, this penalty provides greater robustness and algorithmic tractability compared to its unsmoothed counterpart. The approach has been empirically demonstrated, notably in seismic data blind deconvolution, and the methodology generalizes to a variety of applications where strong, smooth sparsity and scale invariance are required (Repetti et al., 2014).