PALM Algorithm for Nonconvex Optimization
- PALM is a block-coordinate, first-order method for nonconvex, nonsmooth problems that utilizes proximal operators for efficient block updates.
- The algorithm achieves global convergence under the Kurdyka–Łojasiewicz property, ensuring sublinear or linear rates depending on error-bound conditions.
- Extensions like SPRING incorporate variance-reduced stochastic estimators and asynchronous updates to scale the method for applications in signal processing and imaging.
Proximal Alternating Linearized Minimization (PALM) is a fundamental class of block-coordinate, first-order algorithms for structured, nonsmooth, possibly nonconvex optimization problems. It is designed to handle objectives that decompose into a sum of a smooth (potentially nonconvex) coupling function and several potentially nonsmooth, prox-computable block-separable terms. PALM and its extensions (including stochastic, asynchronous, and inertial enhancements) have established convergence guarantees, oracle complexity bounds, and broad applicability in signal processing, machine learning, imaging, and control.
1. Canonical Problem Structure and PALM Updates
PALM is formulated for minimization problems in block variables $x = (x_1, \dots, x_s)$:

$$\min_{x_1, \dots, x_s} \; \Psi(x) := H(x_1, \dots, x_s) + \sum_{i=1}^{s} f_i(x_i),$$

where:
- $H$ is continuously differentiable (not necessarily convex).
- Each $f_i$ is proper, lower-semicontinuous, and possibly nonconvex, but has a proximal operator computable in closed form or efficiently.
Block-wise proximal updates (classical PALM), for step sizes $c_i^k = \gamma_i L_i(x^k)$ with $\gamma_i > 1$ (where $L_i$ are the block-wise Lipschitz constants of $\nabla_{x_i} H$), are

$$x_i^{k+1} \in \operatorname{prox}_{f_i}^{c_i^k}\!\Big( x_i^k - \tfrac{1}{c_i^k}\, \nabla_{x_i} H\big(x_1^{k+1}, \dots, x_{i-1}^{k+1}, x_i^k, \dots, x_s^k\big) \Big),$$

with the proximal operator defined as

$$\operatorname{prox}_{f}^{t}(v) := \operatorname*{arg\,min}_{u} \Big\{ f(u) + \tfrac{t}{2}\, \lVert u - v \rVert^2 \Big\}.$$
The method alternates block-wise, using most recent updates for preceding blocks and current values for subsequent ones ("Gauss-Seidel" ordering).
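As a concrete illustration of the update scheme above, the following sketch instantiates two-block PALM for a factorization-type coupling $H(X,Y)=\tfrac12\lVert A-XY\rVert_F^2$ with prox-friendly block terms. The names (`palm`, `prox_f`, `prox_g`, `soft_nn`, `gamma`) are illustrative choices, not a reference implementation.

```python
# Minimal two-block PALM sketch, assuming the objective 0.5*||A - X @ Y||_F^2 + f(X) + g(Y).
import numpy as np

def palm(A, X, Y, prox_f, prox_g, gamma=1.1, iters=200):
    """Gauss-Seidel PALM: prox-gradient step on X, then on Y using the updated X."""
    for _ in range(iters):
        # Block X: the partial gradient (X Y - A) Y^T is Lipschitz in X with constant ||Y Y^T||_2.
        c = gamma * max(np.linalg.norm(Y @ Y.T, 2), 1e-12)
        X = prox_f(X - ((X @ Y - A) @ Y.T) / c, 1.0 / c)
        # Block Y: uses the freshly updated X (Gauss-Seidel ordering).
        d = gamma * max(np.linalg.norm(X.T @ X, 2), 1e-12)
        Y = prox_g(Y - (X.T @ (X @ Y - A)) / d, 1.0 / d)
    return X, Y

# Example block prox: prox of t*(lam*||.||_1 + nonnegativity indicator) is max(v - t*lam, 0).
def soft_nn(v, t, lam=0.1):
    return np.maximum(v - t * lam, 0.0)
```

Here the step sizes $c$ and $d$ follow the $c_i^k = \gamma_i L_i(x^k)$ rule with $\gamma = 1.1$; swapping in different prox maps changes the regularization without touching the loop.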
2. Theoretical Guarantees and Assumptions
PALM's convergence theory relies on three key structural assumptions:
- (A1) Block-Lipschitz partial gradients: For each $i$, $\nabla_{x_i} H$ is $L_i$-Lipschitz in $x_i$ (other blocks fixed).
- (A2) Prox-computability: Each $f_i$ is proper, lower-semicontinuous, bounded below, and admits a cheap proximal operator.
- (A3) Objective bounded below: $\Psi$ is bounded from below.
Global convergence is achieved under the Kurdyka–Łojasiewicz (KL) property: $\Psi$ satisfies the KL inequality at all critical points, which provides a desingularizing function relating objective gaps to generalized (sub)gradient norms.
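Concretely, with a desingularizing function $\varphi(s) = c\, s^{1-\theta}$, $\theta \in [0,1)$ (the exponent and constants are problem dependent), the KL inequality at a point $\bar{x}$ requires, for all $x$ near $\bar{x}$ with $\Psi(\bar{x}) < \Psi(x) < \Psi(\bar{x}) + \eta$,

$$\varphi'\big(\Psi(x) - \Psi(\bar{x})\big)\, \operatorname{dist}\big(0, \partial\Psi(x)\big) \ge 1,$$

so a small objective gap forces the subgradients to vanish at a controlled rate; the exponent $\theta$ governs the rate regimes listed below.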
Conclusions under these assumptions:
- The iterate sequence $\{x^k\}$ has finite length: $\sum_{k=0}^{\infty} \lVert x^{k+1} - x^k \rVert < \infty$.
- $\{x^k\}$ converges to a critical point of $\Psi$.
- The generalized gradient mapping at a uniformly sampled iterate $\bar{x}_K$ from the first $K$ iterations achieves $\mathbb{E}\big[\operatorname{dist}(0, \partial\Psi(\bar{x}_K))^2\big] = O(1/K)$.
- When $\Psi$ has an error-bound property (KL exponent $\theta = 1/2$), linear convergence is achieved: $\Psi(x^k) - \Psi^\star = O(\rho^k)$ for some $\rho \in (0,1)$. Otherwise, a sublinear rate is achieved.
3. Stochastic and Variance-Reduced Extensions: SPRING
For large-scale problems where $H$ is a finite sum, $H(x) = \frac{1}{n} \sum_{j=1}^{n} H_j(x)$, evaluating full gradients is expensive. The SPRING algorithm introduces variance-reduced stochastic gradient estimators into the PALM framework (Driggs et al., 2020).
SPRING iteration with mini-batch $B_k \subset \{1, \dots, n\}$ of size $b$:
- Compute a stochastic estimator $\widetilde{\nabla}_i^k$ of the block partial gradient (e.g., for block $i$):
  - SGD: $\widetilde{\nabla}_i^k = \frac{1}{b} \sum_{j \in B_k} \nabla_{x_i} H_j(x^k)$
  - SAGA: $\widetilde{\nabla}_i^k = \frac{1}{b} \sum_{j \in B_k} \big( \nabla_{x_i} H_j(x^k) - g_j \big) + \frac{1}{n} \sum_{j=1}^{n} g_j$, with a table $\{g_j\}$ storing the last evaluated per-sample gradients.
  - SARAH: $\widetilde{\nabla}_i^k = \widetilde{\nabla}_i^{k-1} + \frac{1}{b} \sum_{j \in B_k} \big( \nabla_{x_i} H_j(x^k) - \nabla_{x_i} H_j(x^{k-1}) \big)$ (with a full-gradient refresh $\widetilde{\nabla}_i^k = \nabla_{x_i} H(x^k)$ occurring with probability $1/p$).
- Update blocks with the stochastic estimator replacing the full partial gradient in the classical PALM step: $x_i^{k+1} \in \operatorname{prox}_{f_i}^{c_i^k}\big( x_i^k - \tfrac{1}{c_i^k}\, \widetilde{\nabla}_i^k \big)$. A sketch of the SAGA and SARAH estimators appears after this list.
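To make the estimator recursions concrete, here is a sketch of SAGA- and SARAH-style estimators for a single block, assuming callables `grad_j(j, x)` for $\nabla_{x_i} H_j(x)$ and `grad_full(x)` for the full block gradient; the class layout and names are illustrative assumptions, not the authors' code.

```python
# Minimal sketches of the SAGA- and SARAH-type gradient estimators used inside SPRING.
import numpy as np

class SagaEstimator:
    def __init__(self, grad_j, n, x0):
        self.grad_j = grad_j
        self.table = [grad_j(j, x0) for j in range(n)]   # stored per-sample gradients g_j
        self.avg = sum(self.table) / n                   # running average of the table

    def estimate(self, x, batch):
        old_avg = self.avg
        diff = np.zeros_like(old_avg)
        for j in batch:
            new = self.grad_j(j, x)
            diff += new - self.table[j]
            self.avg = self.avg + (new - self.table[j]) / len(self.table)
            self.table[j] = new                          # refresh the stored gradient
        return diff / len(batch) + old_avg               # (1/b) * sum(new - old) + mean(old table)

class SarahEstimator:
    def __init__(self, grad_full, grad_j, refresh_prob=0.1):
        self.grad_full, self.grad_j, self.p = grad_full, grad_j, refresh_prob
        self.v, self.x_prev = None, None

    def estimate(self, x, batch, rng):
        if self.v is None or rng.random() < self.p:      # full-gradient refresh with probability 1/p
            self.v = self.grad_full(x)
        else:                                            # recursive difference-of-gradients update
            self.v = self.v + sum(self.grad_j(j, x) - self.grad_j(j, self.x_prev)
                                  for j in batch) / len(batch)
        self.x_prev = np.copy(x)
        return self.v
```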
Oracle Complexity Results
SPRING achieves state-of-the-art oracle complexities for nonconvex finite-sum problems:
- SPRING-SAGA with batch size $b = \lceil n^{2/3} \rceil$:
  - $O\big(n^{2/3}\, \varepsilon^{-1}\big)$ SFO evaluations required for $\mathbb{E}\big[\operatorname{dist}(0, \partial\Psi)^2\big] \le \varepsilon$.
  - Under error-bound: $O\big((n + n^{2/3}) \log(1/\varepsilon)\big)$, up to problem-dependent constants.
- SPRING-SARAH with batch size $b = \lceil \sqrt{n} \rceil$:
  - $O\big(\sqrt{n}\, \varepsilon^{-1}\big)$ SFO calls (matching the $\Omega\big(\sqrt{n}\, \varepsilon^{-1}\big)$ lower bound for nonconvex finite sums).
  - Under error-bound: $O\big((n + \sqrt{n}) \log(1/\varepsilon)\big)$, up to problem-dependent constants.
4. Empirical Performance and Practical Implementation
The PALM and SPRING families have been evaluated on large-scale imaging problems, including:
- Sparse nonnegative matrix factorization (sNMF); a representative formulation is written out after this list
- Sparse principal component analysis (sPCA)
- Blind image deconvolution (e.g., Yale, ORL, Kodak datasets)
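To show how such problems fit the template of Section 1, one representative sparse NMF formulation (the exact regularizers vary across the cited experiments) is

$$\min_{X \ge 0,\; Y \ge 0}\ \tfrac{1}{2}\lVert A - XY\rVert_F^2 + \lambda \lVert X\rVert_1,$$

with $H(X,Y) = \tfrac12\lVert A - XY\rVert_F^2$ as the smooth coupling term and the prox-friendly block terms $f_1(X) = \lambda\lVert X\rVert_1 + \iota_{\ge 0}(X)$ and $f_2(Y) = \iota_{\ge 0}(Y)$, both of which admit closed-form proximal maps.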
Summary of observed behavior:
- Deterministic PALM converges reliably but has $O(n)$ per-iteration gradient cost.
- Inertial variants (see also (Pock et al., 2017, Hertrich et al., 2020)) accelerate convergence but require full gradients.
- Naive SPRING-SGD exhibits slow convergence due to gradient variance.
- SPRING-SAGA and SPRING-SARAH exhibit low per-iteration complexity ($O(b)$ with $b \ll n$) and fast variance decay, yielding superior objective decrease per unit of work.
- In realistic signal and image deconvolution tasks, SPRING-SARAH outperforms full-gradient PALM by an order of magnitude in oracle calls for equivalent accuracy.
- The stochastic PALM framework retains global convergence guarantees of deterministic PALM on nonconvex, nonsmooth problems.
5. Algorithmic Enhancements and Variants
Inertial and Acceleration Schemes: Inertial PALM (iPALM) incorporates heavy-ball–style extrapolation to speed convergence (Pock et al., 2017). Stochastic inertial PALM (iSPALM) merges inertia with variance-reduced estimation, showing improved wall-clock and epoch efficiency on high-dimensional mixtures and learning settings (Hertrich et al., 2020).
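The following single-block sketch shows the inertial extrapolation pattern described above; the parameters `alpha` and `beta` and the callables `grad_H`, `prox_f` are illustrative placeholders (iPALM couples the admissible step size to the extrapolation parameters and the block Lipschitz constant, which is simplified here).

```python
# Minimal sketch of one inertial (heavy-ball style) block update in the iPALM pattern.
import numpy as np

def ipalm_block_step(x, x_prev, grad_H, prox_f, L, alpha=0.5, beta=0.5, gamma=1.1):
    y = x + alpha * (x - x_prev)      # extrapolation point fed into the prox step
    z = x + beta * (x - x_prev)       # extrapolation point at which the gradient is evaluated
    t = 1.0 / (gamma * L)             # step size from the block Lipschitz constant (simplified)
    x_new = prox_f(y - t * grad_H(z), t)
    return x_new, x                   # new iterate and the value to use as x_prev next time
```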
Asynchronous and Parallel Frameworks: SAPALM generalizes stochastic block-coordinate PALM to asynchronous parallel execution with stale (delayed) reads and writes, achieving near-linear speedup in the number of workers provided delays remain bounded relative to the number of blocks (Davis et al., 2016). Atomic block updates are performed on possibly-outdated parameter vectors, preserving theoretical convergence rates under mild additional assumptions.
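As a toy illustration of the stale-read pattern (a serial simulation only, not a real multicore SAPALM implementation; all names and the `step` parameter are hypothetical), each update below evaluates its block gradient at an iterate that may be several steps out of date:

```python
# Serial toy simulation of asynchronous-style block updates with delayed (stale) reads.
import numpy as np

def stale_read_updates(x_blocks, grad_block, prox_block, step, iters, max_delay=4, seed=0):
    rng = np.random.default_rng(seed)
    snapshot = lambda xs: [xi.copy() for xi in xs]
    history = [snapshot(x_blocks)]                        # past iterates a "worker" might read
    for _ in range(iters):
        i = int(rng.integers(len(x_blocks)))              # block chosen uniformly at random
        delay = int(rng.integers(min(max_delay, len(history))))
        stale = history[-(delay + 1)]                     # possibly-outdated parameter read
        g = grad_block(i, stale)                          # block-i partial gradient at the stale point
        x_blocks[i] = prox_block(i, x_blocks[i] - step * g, step)  # write to the current iterate
        history.append(snapshot(x_blocks))
        history = history[-(max_delay + 1):]              # keep only history that stale reads can touch
    return x_blocks
```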
6. Implementation and Scaling Considerations
When deploying PALM-type algorithms:
- Step-sizes must reflect (global or block-wise) Lipschitz constants of the smooth coupling term's partial gradients for convergence. Stochastic variants must adapt step sizes to variance and batch size.
- Proximal operators for regularizers must be efficiently computable; in practice, sparsity and cardinality constraints yield closed-form soft/hard-thresholding or projection maps (see the sketch after this list).
- Resource scaling: In stochastic variants, the per-iteration cost is $O(b)$ for batch size $b$, with optimal choices on the order of $b \approx n^{2/3}$ (SAGA) and $b \approx \sqrt{n}$ (SARAH) established analytically.
- In asynchronous settings, the maximum delay $\tau$ must be controlled (kept bounded relative to the number of blocks) to guarantee near-linear speedup.
- The approach is suited to large-scale nonconvex signal and image recovery, dictionary learning, and high-volume data analytics, where variance reduction amortizes over the cost of a single gradient evaluation.
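As referenced in the proximal-operator bullet above, the following sketch shows closed-form prox maps that typically arise in these applications, with the scalar $t$ playing the role of $1/c_i^k$ in the PALM update; the function names are illustrative.

```python
# Closed-form prox maps commonly used as the block terms f_i (illustrative sketch).
import numpy as np

def prox_l1(v, t, lam):
    """Soft-thresholding: prox of t * lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

def prox_cardinality(v, s):
    """Projection onto {u : ||u||_0 <= s}: keep the s largest-magnitude entries (hard thresholding)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v).ravel(), -s)[-s:]
    out.flat[idx] = v.ravel()[idx]
    return out

def prox_nonneg(v, *_):
    """Projection onto the nonnegative orthant."""
    return np.maximum(v, 0.0)
```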
7. Comparison and Limitations
| PALM Variant | Per-Iter Cost | Convergence Rate | Scalability / Comments |
|---|---|---|---|
| Deterministic PALM | $O(n)$ | $O(1/k)$ (sublinear); linear if KL-$1/2$ | Expensive for large $n$ |
| SPRING-SAGA | $O(b)$, $b \approx n^{2/3}$ | $O(1/k)$ in expectation; linear under EB | Optimal with mini-batch variance reduction |
| SPRING-SARAH | $O(b)$, $b \approx \sqrt{n}$ | $O(1/k)$ in expectation; linear under EB | Best oracle complexity for finite sums |
| iPALM/iSPALM | $O(n)$ / $O(b)$ | Empirically faster; theory for linear under EB | Adds inertia/momentum |
| SAPALM | $O(b)$ per core | $O(1/k)$ in expectation | Near-linear multicore speedup |
Deterministic PALM is preferred when full gradients are tractable. SPRING and its inertial extensions are established choices for large-scale or distributed settings, with provably optimal complexity for finite-sum, nonconvex, nonsmooth objectives. The primary limitations are sensitivity of deterministic PALM to data size and of stochastic PALM to batch size/variance tuning. Inertial and asynchronous versions require precise parameter management and infrastructure support to realize theoretical rates.
PALM and its modern extensions occupy a central role in scalable first-order optimization for data science, signal processing, and large-scale inverse problems. Their theoretical guarantees are underpinned by the Kurdyka–Łojasiewicz property, and their flexibility is exemplified by a range of practical enhancements and broad empirical validation (Driggs et al., 2020).