Iterative Reweighted Least Squares (IRLS)
- Iterative Reweighted Least Squares (IRLS) is a method that reformulates non-smooth or nonconvex penalties into a sequence of weighted quadratic problems for efficient optimization.
- It iteratively updates weights to mimic penalties like the ℓ₁ norm or Schatten-p quasi-norm, ensuring robust sparse signal and low-rank recovery.
- Empirical results show IRLS reduces computational overhead and achieves competitive accuracy in applications such as face clustering, motion segmentation, and robust PCA.
Iterative Reweighted Least Squares (IRLS) refers to a broad class of optimization algorithms designed to solve problems involving non-smooth or non-convex regularizers by iteratively solving weighted least-squares subproblems. The IRLS paradigm is particularly prominent in the context of sparse signal recovery, low-rank matrix recovery, robust regression, and mixed-structure minimization tasks. At each IRLS step, weights are updated to “mimic” a desired non-smooth penalty—such as the ℓ₁ norm or Schatten-p quasi-norm—allowing the minimization to proceed via computationally efficient quadratic subproblems. The IRLS framework is characterized by its flexibility, strong empirical performance, and (in its modern variants) rigorous convergence properties both in convex and nonconvex domains.
1. General Principles and Mechanisms
The foundational principle of IRLS is the transformation of non-smooth or non-quadratic penalties, such as sparsity or low-rank constraints, into a sequence of surrogate problems where the regularization is replaced by an adaptively weighted quadratic term. Formally, for an optimization problem of the form
$$\min_{x} \; f(x) + \lambda\, g(x),$$

where $g$ is a non-smooth penalty, IRLS rewrites $g$ (or its smoothed version) as a limit of weighted quadratic forms,

$$g(x) \;\approx\; \sum_i w_i\big(x^{(k)}\big)\, x_i^2,$$

where the $w_i$ are appropriately chosen reweighting functions that depend on the current or previous iterate $x^{(k)}$. Each iteration then solves

$$x^{(k+1)} = \arg\min_x \; f(x) + \lambda \sum_i w_i\big(x^{(k)}\big)\, x_i^2.$$

The weights are designed to capture the effect of the original regularizer (for example, setting $w_i = \big(|x_i^{(k)}|^2 + \epsilon^2\big)^{-1/2}$ for ℓ₁ minimization, or powers of singular values for Schatten-p). For mixed objectives, the IRLS formulation includes multiple distinct weight matrices, each updated according to the associated penalty component.
The IRLS approach is justified both algorithmically—due to the computational efficiency of solving least-squares problems—and theoretically, since under mild regularity or structure assumptions, the sequence of iterates can be shown to converge to minimizers or stationary points of the original problem.
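To make the reweighting mechanism concrete, below is a minimal Python sketch of IRLS for ℓ₁-regularized least squares using the smoothed weights $w_i = (|x_i|^2 + \epsilon^2)^{-1/2}$ described above; the function name, fixed $\epsilon$, and stopping rule are illustrative choices, not drawn from (Lu et al., 2014).

```python
import numpy as np

def irls_l1(A, b, lam=0.1, eps=1e-3, max_iter=100, tol=1e-8):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 by iteratively reweighted
    least squares: each pass solves a ridge-like subproblem whose diagonal
    weights w_i = (x_i^2 + eps^2)^(-1/2) mimic the l1 penalty."""
    n = A.shape[1]
    x = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(max_iter):
        w = 1.0 / np.sqrt(x**2 + eps**2)                      # reweighting step
        x_new = np.linalg.solve(AtA + lam * np.diag(w), Atb)  # quadratic subproblem
        if np.linalg.norm(x_new - x) < tol * max(1.0, np.linalg.norm(x)):
            return x_new
        x = x_new
    return x

# Illustrative usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = irls_l1(A, b, lam=0.1)
```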
2. Extensions to Mixed Low-Rank and Sparse Minimization
Traditional IRLS algorithms primarily addressed problems with a single non-smooth regularization, such as sparse signal recovery (ℓ₁ or ℓₚ “norms”) or convex low-rank recovery (via nuclear norm/trace norm minimization). The work of (Lu et al., 2014) extends IRLS to handle joint minimization of (possibly nonconvex) low-rank and sparse objectives, a regime essential for modern matrix decompositions and robust data analysis. The joint objective takes the form

$$\min_{Z} \;\; \lambda\, \|Z\|_{S_p}^p + \|AZ - B\|_{\ell_{2,q}}^q,$$

where
- $\|Z\|_{S_p}^p = \sum_i \sigma_i(Z)^p$ is the Schatten-p quasi-norm, penalizing the singular values of $Z$ to promote low-rankness,
- $\|AZ - B\|_{\ell_{2,q}}^q = \sum_i \|(AZ - B)_{:,i}\|_2^q$ is a columnwise ℓ₂,q penalty, enforcing group- or entrywise sparsity in the reconstruction errors (e.g., low-rank representation, robust PCA).
The corresponding IRLS algorithm introduces distinct weight matrices $W_1$ and $W_2$ for each term, e.g.

$$W_1^{(k)} = \big((Z^{(k)})^T Z^{(k)} + \mu^2 I\big)^{p/2 - 1}, \qquad \big(W_2^{(k)}\big)_{ii} = \big(\|(AZ^{(k)} - B)_{:,i}\|_2^2 + \mu^2\big)^{q/2 - 1},$$

where $\mu > 0$ is a small smoothing parameter (detailed in the next section). At each iteration, given the weights, the update for $Z$ becomes the solution to a matrix equation,

$$\lambda\, p\, Z\, W_1^{(k)} + q\, A^T (AZ - B)\, W_2^{(k)} = 0,$$

which can be solved as a Sylvester equation. Weights are then updated using the latest $Z$, and the process repeats.
This framework generalizes previous IRLS approaches by enabling the simultaneous treatment of multiple nonsmooth, nonconvex regularizers and decouples their surrogate construction and update. Algorithmically, this eliminates the need for auxiliary variables and allows the Sylvester equation to be solved without SVDs, providing computational efficiency advantages.
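As an illustration of how one such iteration could be implemented, the sketch below assumes the smoothed objective $\min_Z \lambda\|Z\|_{S_p}^p + \|AZ - B\|_{\ell_{2,q}}^q$ as written above; the helper name `mixed_irls_step` and the exact scaling of the weights are assumptions, not the reference implementation of (Lu et al., 2014). Right-multiplying the stationarity condition by $(W_2^{(k)})^{-1}$ turns it into a standard Sylvester equation $A^T A\, Z + Z M = A^T B$ with $M = (\lambda p / q)\, W_1^{(k)} (W_2^{(k)})^{-1}$, which `scipy.linalg.solve_sylvester` handles without any SVD.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def mixed_irls_step(Z, A, B, lam=1.0, p=0.5, q=0.5, mu=1e-2):
    """One reweighted least-squares step for the smoothed objective
    lam*||Z||_{S_p}^p + ||A Z - B||_{2,q}^q (a sketch, not the reference code)."""
    m = Z.shape[1]
    # Weight for the smoothed Schatten-p term: (Z^T Z + mu^2 I)^(p/2 - 1),
    # computed via the eigendecomposition of this symmetric PD matrix.
    evals, evecs = np.linalg.eigh(Z.T @ Z + mu**2 * np.eye(m))
    W1 = (evecs * evals ** (p / 2 - 1)) @ evecs.T
    # Diagonal weight for the smoothed column-wise l2,q term.
    R = A @ Z - B
    w2 = (np.sum(R**2, axis=0) + mu**2) ** (q / 2 - 1)
    # Stationarity  lam*p*Z*W1 + q*A^T(AZ - B)*diag(w2) = 0  rearranges to the
    # Sylvester equation (A^T A) Z + Z M = A^T B with M = (lam*p/q)*W1*diag(1/w2).
    M = (lam * p / q) * (W1 / w2)   # divides column j of W1 by w2[j]
    return solve_sylvester(A.T @ A, M, A.T @ B)
```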
3. Smoothing Techniques and Derivation of Weight Updates
Because penalties such as the Schatten-p or ℓ₂,q norms are non-smooth and sometimes even nonconvex, practical IRLS implementations employ smoothing. For instance, a smoothed Schatten-p surrogate may be
$$\|Z\|_{S_p,\mu}^p \;=\; \operatorname{Tr}\big(Z^T Z + \mu^2 I\big)^{p/2},$$

with $\mu > 0$ ensuring differentiability. The gradient of such a term with respect to $Z$ can be expressed analytically:

$$\nabla_Z \operatorname{Tr}\big(Z^T Z + \mu^2 I\big)^{p/2} \;=\; p\, Z \big(Z^T Z + \mu^2 I\big)^{p/2 - 1}.$$

Similarly, for the smoothed ℓ₂,q penalty, columnwise squared-norm surrogates are used:

$$\|E\|_{\ell_{2,q},\mu}^q \;=\; \sum_i \big(\|E_{:,i}\|_2^2 + \mu^2\big)^{q/2}, \qquad E = AZ - B.$$

This enables the efficient computation of weight updates and makes the gradients globally Lipschitz, facilitating the convergence proofs.
The update sequence alternates:
- Solve for the variables (e.g., $Z$) using the fixed weights ($W_1$, $W_2$)
- Update weights using latest solution
- Decrease the smoothing parameter $\mu$ if desired (typically in late-stage iterations)
The critical consequence is that the objective is monotonically decreasing and converges to a stationary point of the smoothed surrogate; if $p, q \geq 1$, the solution is globally optimal due to convexity.
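A sketch of this outer loop is given below, reusing the hypothetical `mixed_irls_step` helper from the previous section; the geometric schedule for decreasing $\mu$ and the stopping criterion are illustrative assumptions, not the schedule of (Lu et al., 2014).

```python
import numpy as np

def smoothed_mixed_irls(A, B, lam=1.0, p=0.5, q=0.5,
                        mu0=1.0, mu_min=1e-6, rho=0.7,
                        max_iter=200, tol=1e-6):
    """Alternate weight updates and Sylvester solves while shrinking mu.
    Relies on the mixed_irls_step sketch defined earlier."""
    Z = np.zeros((A.shape[1], B.shape[1]))
    mu = mu0
    for _ in range(max_iter):
        # Solve the weighted quadratic subproblem under the current weights.
        Z_new = mixed_irls_step(Z, A, B, lam=lam, p=p, q=q, mu=mu)
        if np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z)):
            Z = Z_new
            break
        Z = Z_new
        # Decrease the smoothing parameter (illustrative geometric schedule).
        mu = max(rho * mu, mu_min)
    return Z
```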
4. Convergence Theory and Monotonicity Properties
The theoretical properties of the smoothed mixed IRLS algorithm are established through a sequence of descent lemmas, leveraging inequalities derived from the concavity of the mapping $t \mapsto t^{p/2}$ for $0 < p < 1$ (made explicit after the list below). The argument proceeds as follows:
- Use smoothing to ensure all functions are differentiable;
- Show that the decrease in the objective at each iteration is at least proportional to the squared Frobenius norm of the difference between successive variable estimates;
- Prove that the sequence of iterates is bounded and that consecutive differences vanish, so every accumulation point is a stationary point;
- Demonstrate that the entire process is globally convergent to stationary points, and in the convex case, recovers globally optimal solutions.
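The concavity inequality that powers these descent lemmas can be stated explicitly. A standard majorization step of this form (the exact constants in the paper's lemmas may differ) reads

$$y^{p/2} \;\le\; x^{p/2} + \tfrac{p}{2}\, x^{p/2 - 1}\,(y - x), \qquad x, y > 0,\;\; 0 < p < 1,$$

which follows from the concavity of $t \mapsto t^{p/2}$. Applied with $x = \|z^{(k)}\|_2^2 + \mu^2$ and $y = \|z^{(k+1)}\|_2^2 + \mu^2$ (and its matrix analogue for the Schatten-p term), it shows that the reweighted quadratic surrogate majorizes the smoothed penalty at the current iterate, so each weighted least-squares solve cannot increase the objective.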
These results extend the known convergence properties of IRLS for pure sparse or low-rank recovery to the simultaneous mixed setting, without requiring special properties of the individual regularizers, and they underpin the practical reliability of the algorithm.
5. Algorithmic Efficiency and Experimental Evaluation
Empirical results reported in (Lu et al., 2014) confirm significant computational and statistical advantages for the generalized IRLS scheme:
- Avoids heavy SVD computations by reducing the subproblem to a Sylvester equation;
- Achieves rapid contraction of the objective (fewer iterations and reduced runtimes versus accelerated proximal gradient, alternating direction method, or linearized variants);
- Produces competitive or superior accuracy for joint low-rank and sparse recovery tasks, such as face clustering (Extended Yale B, CMU PIE), motion segmentation (Hopkins 155), and robust PCA instances.
A table summarizing the comparison can be structured as:
| Application | IRLS Runtime | Competing Methods (Runtime) |
|---|---|---|
| Face clustering (Extended Yale B) | Lower | Higher (APG, ADM, LADMAP) |
| Motion segmentation (Hopkins 155) | Lower | Higher (APG, ADM, LADMAP) |
| Inductive robust PCA (IRPCA) image correction | Lower | Higher (ADM-type methods) |
Such results highlight that the IRLS approach scales well to high dimensions and is amenable to a wide range of matrix and high-dimensional regression problems requiring structural priors.
6. Impact, Applications, and Extensions
The general IRLS framework for smoothed, mixed low-rank and sparse minimization introduced in (Lu et al., 2014) provides a foundation for joint structural recovery in tasks such as:
- Robust subspace clustering and representation learning (Low-Rank Representation models);
- Corrupted image restoration (face recognition under occlusion, motion segmentation);
- Inductive robust PCA where new data points must be mapped to a low-dimensional representation in the presence of gross corruptions.
The smoothing and alternating reweighting mechanisms developed are directly relevant to extensions:
- Generalization to other compositional or hierarchical structural penalties by extending the pattern of weight updates and smoothing mechanisms;
- Application to nonconvex loss functions or regression constraints by iterating the same smoothing and reweighting structure;
- Incorporation into modern deep learning architectures that require interpretable and structured layer outputs, potentially as recurrent submodules.
The extension to multi-component, nonconvex, non-smooth objectives forms a general template for future algorithmic development in structured estimation, matrix recovery, and high-dimensional data analysis.
In summary, IRLS is a principled, efficient, and highly extensible method for non-smooth and non-convex optimization in signal and matrix recovery. Its formulation via smoothing and alternating weight updates, rigorous convergence proofs, and demonstrated computational effectiveness in complex structural recovery settings underscore its central role in modern statistical machine learning and data science (Lu et al., 2014).