Proximal Alternating Linearized Minimization (PALM)

Updated 25 May 2026

PALM is a structured optimization algorithm that minimizes composite, block-separable objectives through iterative proximal updates and linearized smooth components.
It leverages Lipschitz continuity and adaptive step-size rules to ensure convergence and efficiency in high-dimensional, constrained applications.
Variants like inertial, Bregman, and stochastic extensions enhance its robustness and practical performance in areas such as large-scale control, imaging, and signal processing.

Proximal Alternating Linearized Minimization (PALM) is a widely used algorithmic framework for structured nonconvex, nonsmooth optimization, especially for problems exhibiting block structure, composite objectives, or explicit nonconvex constraints. The method is extensively applied in large-scale control, machine learning, signal processing, compressed sensing, and imaging domains due to its ability to address challenging constraint sets, exploit sparsity-inducing regularization, and provide convergence guarantees under broad conditions. PALM admits a rich zoo of variants—including inertial, stochastic, asynchronous, Bregman, and learned/unrolled extensions—which further extend its scope and practical efficiency.

1. Formulation and Problem Classes

PALM is designed for minimization problems of the following generic composite, block-separable form: $\min_{x = (x_1, \dots, x_m)}\ F(x) = f(x) + \sum_{j=1}^m r_j(x_j),$ where

$f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable (possibly nonconvex) and couples all variables,
each $r_j : \mathbb{R}^{n_j} \to (-\infty, +\infty]$ is lower semicontinuous, possibly nonconvex/nonsmooth, and block-separable.

A canonical two-block case writes

$\min_{x_1 \in \mathbb{R}^{n_1}, x_2 \in \mathbb{R}^{n_2}} F(x_1, x_2) = H(x_1, x_2) + f(x_1) + g(x_2),$

where $H$ is $C^1$ (possibly nonconvex), and $f, g$ are proper, l.s.c., possibly prox-bounded nonconvex regularizers, hard constraints (as indicators), or composite functions.

Prototypical applications include:

Sparse output feedback synthesis and sensor/actuator co-design under cardinality constraints (Lin et al., 2017).
Sparse signal recovery, factor analysis, matrix/tensor factorization, low-complexity system identification (Lin et al., 2016, Ahookhosh et al., 2019, Brandoni et al., 2021).
Blind source separation, dictionary learning, and structured NMF (Fahes et al., 2021, Brandoni et al., 2021).
Imaging inverse problems with mixed or nonconvex penalties (e.g., $\ell_0$ , $\ell_{1/2}$ , structured sparsity) (Guo et al., 2023, Liu et al., 2020).
Large-scale optimal control with sparsity or rank constraints (Lin et al., 2017).

2. Algorithmic Principles and Proximal Steps

PALM alternates blockwise updates between variable sets, linearizing the smooth coupling term in each block and performing a proximal step for the nonsmooth/separable part. Consider the two-block version at iteration $k$ :

Block 1 ( $x_1$ ) update:

$x_1^{k+1} \in \arg\min_{x_1} \left\{ f(x_1) + \langle \nabla_{x_1} H(x_1^k, x_2^k),\, x_1 - x_1^k \rangle + \frac{c_k}{2} \|x_1 - x_1^k\|^2 \right\}$

Block 2 ( $x_2$ ) update:

$x_2^{k+1} \in \arg\min_{x_2} \left\{ g(x_2) + \langle \nabla_{x_2} H(x_1^{k+1}, x_2^k),\, x_2 - x_2^k \rangle + \frac{d_k}{2} \|x_2 - x_2^k\|^2 \right\}$

with $c_k$, $d_k$ exceeding the respective local block-Lipschitz constants. The block sequence can be cyclic or randomized.

The general $m$-block update is: $x_j^{k+1} \in \arg\min_{x_j} \left\{ r_j(x_j) + \langle \nabla_{x_j} f(x_1^{k+1}, \dots, x_{j-1}^{k+1}, x_j^k, \dots, x_m^k),\, x_j - x_j^k \rangle + \frac{c_j^k}{2} \|x_j - x_j^k\|^2 \right\}$ for each $j = 1, \dots, m$, with forward/implicit Gauss–Seidel ordering.
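The two-block iteration can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the test problem (a sparse factorization with coupling $H(X, Y) = \tfrac{1}{2}\|A - XY\|_F^2$, an $\ell_1$ penalty on $X$, and a nonnegativity constraint on $Y$), the function name `palm_sparse_factorization`, and the parameters `lam` and `gamma` are assumptions chosen for the example.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (entrywise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def palm_sparse_factorization(A, rank, lam=0.1, gamma=1.1, iters=200, seed=0):
    """Two-block PALM sketch for
        min_{X, Y} 0.5 * ||A - X Y||_F^2 + lam * ||X||_1 + indicator(Y >= 0).
    Returns the final blocks and the objective value after each iteration."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = rng.standard_normal((m, rank))
    Y = np.abs(rng.standard_normal((rank, n)))  # start feasible (Y >= 0)
    history = []
    for _ in range(iters):
        # Block 1: grad_X H = (X Y - A) Y^T, block-Lipschitz constant ||Y Y^T||_2.
        c = gamma * max(np.linalg.norm(Y @ Y.T, 2), 1e-12)
        X = soft_threshold(X - ((X @ Y - A) @ Y.T) / c, lam / c)
        # Block 2 uses the updated X: grad_Y H = X^T (X Y - A), constant ||X^T X||_2.
        d = gamma * max(np.linalg.norm(X.T @ X, 2), 1e-12)
        Y = np.maximum(Y - (X.T @ (X @ Y - A)) / d, 0.0)  # projection onto Y >= 0
        history.append(0.5 * np.linalg.norm(A - X @ Y) ** 2 + lam * np.abs(X).sum())
    return X, Y, history
```

Because both step weights exceed the block-Lipschitz constants (here by the assumed factor `gamma = 1.1`), the recorded objective values should be non-increasing, which makes `history` a convenient sanity check.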

These proximal subproblems reduce to closed-form or simple thresholding/projection steps under indicator or sparsity constraints, $\ell_0$ / $\ell_1$ penalties, or norm-ball constraints. For instance:

Hard-thresholding for entrywise/cardinality constraints,
Row/column truncation for structured matrix sparsity,
Truncated SVD for rank constraints,
Soft-thresholding for the $\ell_1$ norm,
Projection for convex constraints or norm balls.
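The proximal maps listed above can be sketched as small NumPy helpers. The function names and signatures are illustrative assumptions, not taken from the cited works; each one returns a (possibly non-unique) minimizer of the corresponding proximal subproblem.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v (projection onto ||x||_0 <= s)."""
    out = np.zeros_like(v)
    s = min(int(s), v.size)
    if s > 0:
        idx = np.argpartition(np.abs(v).ravel(), -s)[-s:]
        out.flat[idx] = v.flat[idx]
    return out


def row_truncate(M, s):
    """Keep the s rows of M with largest Euclidean norm and zero out the rest."""
    out = np.zeros_like(M)
    s = min(int(s), M.shape[0])
    order = np.argsort(np.linalg.norm(M, axis=1))
    keep = order[len(order) - s:]
    out[keep] = M[keep]
    return out


def truncated_svd(M, r):
    """Best rank-r approximation of M (projection onto rank(X) <= r)."""
    U, sig, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * sig[:r]) @ Vt[:r]


def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def project_l2_ball(v, radius):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    nrm = np.linalg.norm(v)
    return v if nrm <= radius else v * (radius / nrm)
```

For the nonconvex sets (cardinality and rank constraints) the projection is set-valued when there are ties; these helpers simply return one valid element.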

In many control and estimation problems, the smooth subproblem (e.g., Riccati/Lyapunov penalization) is solved by Anderson–Moore or alternating minimization cycles, and the coupling is handled via penalty or quadratic regularization (Lin et al., 2017, Lin et al., 2016).

3. Lipschitz Analysis, Step Sizes, and Technical Assumptions

PALM requires that for each block variable, the partial gradient of the smooth component is Lipschitz in its respective block, potentially with constants depending on the current values of other blocks. Denoting (for two blocks)

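$\|\nabla_{x_1} H(u, x_2) - \nabla_{x_1} H(v, x_2)\| \le L_1(x_2)\, \|u - v\|, \qquad \|\nabla_{x_2} H(x_1, u) - \nabla_{x_2} H(x_1, v)\| \le L_2(x_1)\, \|u - v\|,$

the PALM step sizes (the proximal weights $c_k$, $d_k$ of the two block updates) are chosen as

$c_k = \gamma_1 L_1(x_2^k), \qquad d_k = \gamma_2 L_2(x_1^{k+1}),$

with over-relaxation factors $\gamma_1, \gamma_2 > 1$ (Lin et al., 2017).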

These blockwise Lipschitz constants can be:

Explicit (derived in closed form from the problem structure, e.g., in matrix co-design problems (Lin et al., 2017))
Estimated online or via power iterations for large operators
Adaptive (as in spectral PALM (Brandoni et al., 2021)) using Barzilai–Borwein or backtracking/Armijo strategies for local curvature matching.
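As an illustration of the power-iteration option, the sketch below estimates the gradient Lipschitz constant $\|A\|_2^2$ of a least-squares term $\tfrac{1}{2}\|Ax - b\|^2$ when $A$ is only available through matrix-vector products. The least-squares setting and the names `estimate_lipschitz`, `matvec`, and `rmatvec` are assumptions made for this example.

```python
import numpy as np


def estimate_lipschitz(matvec, rmatvec, dim, iters=100, seed=0):
    """Power iteration on A^T A.

    matvec(v) should return A @ v and rmatvec(u) should return A.T @ u.
    Returns an estimate of the largest eigenvalue of A^T A, i.e. ||A||_2^2."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    est = 0.0
    for _ in range(iters):
        w = rmatvec(matvec(v))
        est = np.linalg.norm(w)  # ||A^T A v|| with ||v|| = 1
        if est == 0.0:
            break
        v = w / est
    return est
```

The estimate approaches the true constant from below, so in practice it would be multiplied by a safety factor greater than one (or combined with backtracking) before being used as a PALM step weight.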

Basic technical conditions involve properness, lower-semicontinuity and prox-boundedness of the nonsmooth terms, and the Kurdyka–Łojasiewicz (KL) property (or semi-algebraicity) of the global objective to guarantee convergence (Lin et al., 2017, Davis, 2016, Liu et al., 2020).

4. Convergence Guarantees and Rate Theory

Under these conditions, PALM possesses strong theoretical guarantees:

Sufficient decrease: At each iteration, the objective strictly decreases unless at a fixed point (Lin et al., 2017).
Finite-length property: The total path length $\sum_{k=0}^{\infty} \|x^{k+1} - x^k\|$ is finite, ensuring $\{x^k\}$ is Cauchy.
Stationarity: Any cluster point is a critical (stationary) point; under KL, the entire sequence converges to a single critical point (Davis, 2016).
Convergence rates: When the objective function is KL with exponent $\theta \in [0, 1)$,
- Finite-step convergence if $\theta = 0$
- Linear rate for $\theta \in (0, 1/2]$
- Sublinear $O\left(k^{-(1-\theta)/(2\theta-1)}\right)$ for $\theta \in (1/2, 1)$ (Davis, 2016, Lin et al., 2017)
Monotonicity: The objective is monotonically decreasing, often accompanied by a Lyapunov or surrogate sequence for more complex (inexact, asynchronous) variants (Hu et al., 2022).

If sub-block problems are not solved exactly, suitable surrogate measures ensure that inexact PALM (PALM-I) still converges under relaxed (computable) error bounds (Hu et al., 2022).
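The sufficient-decrease property can be made concrete with a short worked argument for a single block. This is a sketch of the standard descent-lemma reasoning; the symbols $h$, $\sigma$, $L$, $t$, $u$, $u^+$, and $\rho$ are introduced here for illustration. Let $h$ be $C^1$ with $L$-Lipschitz gradient, let $\sigma$ be proper and lower semicontinuous, take $t > L$, and let

$u^+ \in \arg\min_{v} \left\{ \sigma(v) + \langle \nabla h(u),\, v - u \rangle + \frac{t}{2} \|v - u\|^2 \right\}.$

Comparing $u^+$ with the candidate $v = u$ gives

$\sigma(u^+) + \langle \nabla h(u),\, u^+ - u \rangle + \frac{t}{2} \|u^+ - u\|^2 \le \sigma(u),$

while the descent lemma gives

$h(u^+) \le h(u) + \langle \nabla h(u),\, u^+ - u \rangle + \frac{L}{2} \|u^+ - u\|^2.$

Adding the two inequalities yields

$h(u^+) + \sigma(u^+) \le h(u) + \sigma(u) - \frac{t - L}{2} \|u^+ - u\|^2.$

Applying this bound to each block in turn and summing shows that one full PALM sweep decreases the objective by at least $\frac{\rho}{2} \|x^{k+1} - x^k\|^2$, where $\rho > 0$ is the smallest gap between a step weight and the corresponding block-Lipschitz constant.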

5. Variants and Extensions

Numerous enhancements of the plain PALM scheme are established for improved performance, robustness, and scalability:

Inertial (iPALM): Adds momentum (Nesterov/Polyak) to each block update, typically through an extrapolation of the form $y_j^k = x_j^k + \alpha_j^k (x_j^k - x_j^{k-1})$. It provably converges globally to critical points under parameter restrictions and empirically speeds up convergence (Pock et al., 2017, Hertrich et al., 2020, Guo et al., 2023).
Bregman PALM: Introduces block-separable Bregman distances in lieu of squared Euclidean norms to better match geometry or regularization in each block (Ahookhosh et al., 2019, Guo et al., 2023). This facilitates closed-form updates in non-Euclidean settings and can exploit relative smoothness, e.g., for structured NMF.
Variable Metric and Composite (CPALM): Employs SPD preconditioning matrices or surrogate majorants for composite or non-Euclidean terms, enabling broader classes of objective structure, notably for composite nonsmoothness or concavity (Yashtini, 2022).
Stochastic, Asynchronous, and Parallel PALM: For large-scale problems, these variants replace the exact partial gradients $\nabla_{x_j} f$ with stochastic or variance-reduced minibatch estimators (SAGA, SARAH), preserve monotonicity in expectation, and attain $\mathcal{O}(1/k)$ or linear rates in KL/strongly regularized cases (Driggs et al., 2020, Hertrich et al., 2020, Guo et al., 2023). Fully asynchronous versions (Asynchronous PALM, SAPALM) eliminate update synchronization; the theory shows near-linear speedup with increasing cores (Davis, 2016, Davis et al., 2016).
Spectral and Adaptive Step-size PALM: Spectral (sPALM) harnesses BB or adaptive local step-sizes to accelerate over Lipschitz-based fixed steps (Brandoni et al., 2021).
Unrolling/Learned PALM: Unrolling the PALM architecture into an interpretable, trainable model (LPAM, LPALM-net) integrates residual learning and block coordinate descent as safeguard steps, yielding efficient, data-driven deep network architectures for bilevel or semi-blind problems (Fahes et al., 2021, Chen et al., 2024).
Infeasible Inexact PALM (PALM-I): Accommodates inexact subproblem solvers whose iterates may be infeasible, as arises with complex or non-projectable constraints, and ensures convergence via surrogate monotonicity (Hu et al., 2022).
Two-step/inertial Bregman stochastic PALM: Generalizations involving two-step extrapolation and Bregman distances have been shown to further accelerate convergence in structured, nonconvex landscapes (Guo et al., 2023, Guo et al., 2023).
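To make the inertial idea concrete, the sketch below adds extrapolation to each block update of a two-block scheme. It is an illustrative sketch rather than the algorithm of any cited paper: the test problem (sparse factorization with $H(X, Y) = \tfrac{1}{2}\|A - XY\|_F^2$), the fixed inertial weights `alpha` and `beta`, the conservative factor `gamma = 2.0`, and all function names are assumptions.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def inertial_block_step(x, x_prev, grad, prox, weight, alpha, beta):
    """One inertial block update: extrapolate, take a gradient step, apply the prox."""
    y = x + alpha * (x - x_prev)  # extrapolated centre of the proximal step
    z = x + beta * (x - x_prev)   # extrapolated point where the gradient is evaluated
    return prox(y - grad(z) / weight, weight)


def ipalm_factorization(A, rank, lam=0.05, alpha=0.2, beta=0.2, gamma=2.0,
                        iters=200, seed=0):
    """Inertial two-block sketch for
        min 0.5 * ||A - X Y||_F^2 + lam * ||X||_1 + indicator(Y >= 0)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((A.shape[0], rank))
    Y = np.abs(rng.standard_normal((rank, A.shape[1])))
    X_prev, Y_prev = X.copy(), Y.copy()

    def objective(X, Y):
        return 0.5 * np.linalg.norm(A - X @ Y) ** 2 + lam * np.abs(X).sum()

    history = [objective(X, Y)]
    for _ in range(iters):
        c = gamma * max(np.linalg.norm(Y @ Y.T, 2), 1e-12)
        X_new = inertial_block_step(
            X, X_prev, lambda Z: (Z @ Y - A) @ Y.T,
            lambda V, t: soft_threshold(V, lam / t), c, alpha, beta)
        X_prev, X = X, X_new
        d = gamma * max(np.linalg.norm(X.T @ X, 2), 1e-12)
        Y_new = inertial_block_step(
            Y, Y_prev, lambda Z: X.T @ (X @ Z - A),
            lambda V, t: np.maximum(V, 0.0), d, alpha, beta)
        Y_prev, Y = Y, Y_new
        history.append(objective(X, Y))
    return X, Y, history
```

Unlike plain PALM, the inertial iteration is not guaranteed to decrease the objective at every step, which is why convergence analyses of such schemes typically track a Lyapunov-type surrogate instead of the objective itself.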

6. Computational Aspects and Implementation

PALM excels in block-structured, high-dimensional, and sparsity-constrained settings by leveraging proximal operators with closed forms. For instance:

In sparse output feedback co-design, K and C updates reduce to hard-thresholding on entries and rows, respectively, while the Lyapunov-constrained feedback is handled by specialized Anderson–Moore cycles (Lin et al., 2017).
For sparse NMF, SVD, and compressed sensing, projection/truncation and soft/hard thresholding operations dominate per-iteration cost (Lin et al., 2016, Ahookhosh et al., 2019, Brandoni et al., 2021).
In machine learning and imaging, scalable parallelism and stochastic mini-batching are straightforward to implement for both PALM and its variants, with linear speedup achievable under mild architectural constraints (Davis et al., 2016, Driggs et al., 2020).

According to the cited empirical studies, PALM and its extensions routinely handle problem sizes up to tens of thousands of variables in minutes on commodity hardware and outperform classical gradient projection, alternating minimization, or non-inertial alternatives in both convergence speed and solution quality (Lin et al., 2017, Lin et al., 2016, Liu et al., 2020, Brandoni et al., 2021).

7. Representative Applications and Numerical Performance

Applications span a wide array:

Sparse output feedback and sensor-actuator co-design: Achieving convergence and near-optimal performance in high-dimensional, unstable feedback design problems with 60,000 design variables (Lin et al., 2017).
Sparse low-complexity system identification and VAR modeling: PALM outperforms gradient projection even under hard cardinality constraints, achieving accurate system recovery at reduced computation (Lin et al., 2016).
Image reconstruction and denoising: Variants such as PALM-NUT, iSPALM, and sPALM accelerate MR image reconstruction, phase/magnitude estimation, and compressed sensing recovery, attaining faster convergence and lower normalized error and NRMSE compared to other state-of-the-art algorithms (Liu et al., 2020, Hertrich et al., 2020, Brandoni et al., 2021).
Boolean matrix factorization and pattern mining: PALM-based tiling robustly identifies interpretable structures in high-noise regimes, outperforming alternatives in task-specific accuracy and speed (Hess et al., 2019).
Orthogonal NMF and dictionary learning: Bregman PALM instantiations yield direct, closed-form subproblem solutions even for non-negativity and orthogonality-induced nonconvexity (Ahookhosh et al., 2019).
Blind source separation (BSS): LPALM and its learned/unrolled counterparts efficiently perform semi-blind separation, requiring far fewer iterations and achieving better recovery than standard PALM (Fahes et al., 2021).

Theoretical and empirical results consistently demonstrate PALM and its variants as the method of choice for large-scale, block-structured nonconvex nonsmooth optimization when closed-form or efficient approximations of subproblems are available and when global convergence to critical points and practical efficiency are crucial.