
PALM Algorithm for Nonconvex Optimization

Updated 11 November 2025
  • PALM is a block-coordinate, first-order method for nonconvex, nonsmooth problems that utilizes proximal operators for efficient block updates.
  • The algorithm achieves global convergence under the Kurdyka–Łojasiewicz property, ensuring sublinear or linear rates depending on error-bound conditions.
  • Extensions like SPRING incorporate variance-reduced stochastic estimators and asynchronous updates to scale the method for applications in signal processing and imaging.

Proximal Alternating Linearized Minimization (PALM) is a fundamental class of block-coordinate, first-order algorithms for structured, nonsmooth, possibly nonconvex optimization problems. It is designed to handle objectives that decompose into a sum of a smooth (potentially nonconvex) coupling function and several potentially nonsmooth, prox-computable block-separable terms. PALM and its extensions (including stochastic, asynchronous, and inertial enhancements) have established convergence guarantees, oracle complexity bounds, and broad applicability in signal processing, machine learning, imaging, and control.

1. Canonical Problem Structure and PALM Updates

PALM is formulated for minimization problems in block variables $(x_1, \ldots, x_m)$:

$$\min_{x_1, \ldots, x_m} F(x_1,\ldots,x_m) = f(x_1,\ldots,x_m) + \sum_{i=1}^m g_i(x_i)$$

where:

  • $f:\mathbb{R}^d\to \mathbb{R}$ is continuously differentiable (not necessarily convex).
  • Each $g_i:\mathbb{R}^{d_i}\to \mathbb{R}\cup\{+\infty\}$ is proper, lower-semicontinuous, and possibly nonconvex, but has a proximal operator computable in closed form or otherwise efficiently.

Block-wise proximal updates (classical PALM), for step sizes $\{\alpha_i\}$ satisfying $0<\alpha_i<1/L_i$ (with $L_i$ the block-wise Lipschitz constant of $\nabla_{x_i} f$), are:

$$x_i^{k+1} = \operatorname{prox}_{\alpha_i g_i}\!\left( x_i^k - \alpha_i \nabla_{x_i} f\bigl(x_1^{k+1},\ldots,x_{i-1}^{k+1},\,x_i^k,\,x_{i+1}^k,\ldots,x_m^k\bigr) \right),\quad i=1,\ldots,m,$$

with the proximal operator defined as

$$\operatorname{prox}_{\alpha g}(v) = \arg\min_u \left\{ g(u) + \frac{1}{2\alpha} \|u-v\|^2 \right\}.$$

The method alternates block-wise, using the most recent updates for preceding blocks and the current values for subsequent ones ("Gauss–Seidel" ordering).
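
To make the update rule concrete, here is a minimal NumPy sketch of the PALM loop, applied to a small sparse-NMF-style demo problem. The helper names (`palm`, `grad_f`, `prox_g`, `lipschitz`) and the demo objective are illustrative assumptions, not a reference implementation from the cited works.

```python
import numpy as np

def palm(blocks, grad_f, prox_g, lipschitz, n_iters=200):
    """Minimal PALM sketch for min F = f(x_1,...,x_m) + sum_i g_i(x_i).

    blocks    : list of initial block variables (NumPy arrays).
    grad_f    : grad_f(i, x) -> partial gradient of f w.r.t. block i at x.
    prox_g    : prox_g(i, v, alpha) -> proximal map of alpha * g_i at v.
    lipschitz : lipschitz(i, x) -> Lipschitz constant L_i of grad_f(i, .).
    """
    x = [b.copy() for b in blocks]
    for _ in range(n_iters):
        for i in range(len(x)):                            # Gauss-Seidel sweep: later blocks see fresh updates
            alpha = 0.9 / max(lipschitz(i, x), 1e-12)      # step size strictly below 1/L_i
            v = x[i] - alpha * grad_f(i, x)                # forward (gradient) step on the smooth coupling f
            x[i] = prox_g(i, v, alpha)                     # backward (proximal) step on the nonsmooth g_i
    return x

# Illustrative two-block instance: a sparse-NMF-style objective
# f(U, V) = 0.5 * ||M - U V||_F^2,  g_i = lam * ||.||_1 + indicator{. >= 0}.
rng = np.random.default_rng(0)
M = rng.random((40, 30))
U0, V0 = rng.random((40, 5)), rng.random((5, 30))
lam = 0.1

def grad_f(i, x):
    U, V = x
    R = U @ V - M
    return R @ V.T if i == 0 else U.T @ R

def lipschitz(i, x):
    U, V = x
    return np.linalg.norm(V @ V.T, 2) if i == 0 else np.linalg.norm(U.T @ U, 2)

def prox_g(i, v, alpha):
    # Nonnegative soft-thresholding: prox of alpha*lam*||.||_1 restricted to v >= 0.
    return np.maximum(v - alpha * lam, 0.0)

U, V = palm([U0, V0], grad_f, prox_g, lipschitz)
```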

2. Theoretical Guarantees and Assumptions

PALM's convergence theory relies on three key structural assumptions:

  • (A1) Block-Lipschitz partial gradients: for each $i$, $\nabla_{x_i} f$ is $L_i$-Lipschitz in $x_i$ (with the other blocks fixed).
  • (A2) Prox-computability: Each gig_i is proper, lower-semicontinuous, bounded below, and admits a cheap proximal operator.
  • (A3) Objective bounded below: $F$ is bounded from below.

Global convergence is achieved under the Kurdyka–Łojasiewicz (KL) property: $F$ satisfies the KL inequality at all critical points, which provides a desingularizing function $\varphi$ relating objective gaps to generalized gradient-mapping norms.
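
For reference, the KL inequality at a critical point $x^*$ is commonly stated as follows (a standard formulation, not specific to the works cited here):

$$\varphi'\bigl(F(x) - F(x^*)\bigr)\,\operatorname{dist}\bigl(0,\partial F(x)\bigr) \ge 1 \quad \text{for all } x \text{ near } x^* \text{ with } F(x^*) < F(x) < F(x^*) + \eta,$$

where $\varphi:[0,\eta)\to[0,\infty)$ is the desingularizing function: concave, continuous, $\varphi(0)=0$, and continuously differentiable with $\varphi'>0$ on $(0,\eta)$. The prototypical choice $\varphi(t) = c\,t^{1-\theta}$ gives the KL exponent $\theta$ referenced below.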

Conclusions under these assumptions:

  • The sequence $\{x^k\}$ has finite length: $\sum_k \|x^{k+1}-x^k\| < \infty$.
  • $\{x^k\}$ converges to a critical point $x^*$ of $F$.
  • The generalized gradient mapping $G$, evaluated at an iterate with index $\alpha$ drawn uniformly from $\{1, \ldots, K\}$, satisfies $\mathbb{E}[\|G(x^\alpha)\|^2] = O(1/K)$.
  • When $F$ has an error-bound property (KL exponent $\theta = 1/2$), linear convergence is achieved:

$$F(x^k) - F^* \leq (1-c)^k\,[F(x^0)-F^*],\qquad c\in (0,1).$$

Otherwise, a sublinear rate is achieved.

3. Stochastic and Variance-Reduced Extensions: SPRING

For large-scale problems where $f$ is a finite sum, $f(x)=\frac{1}{n}\sum_{i=1}^n F_i(x)$, evaluating full gradients is expensive. The SPRING algorithm introduces variance-reduced stochastic gradient estimators into the PALM framework (Driggs et al., 2020).

SPRING iteration with mini-batch $B_k \subset \{1,\ldots,n\}$ of size $b$:

  • Compute gradient estimators (e.g., for block $x$):
    • SGD: $v_x^k = \frac{1}{b}\sum_{j\in B_k} \nabla_x F_j(x^k, y^k)$
    • SAGA: $v_x^k = \frac{1}{b}\sum_{j\in B_k} \bigl[\nabla_x F_j(x^k,y^k) - g_{k,j}\bigr] + \frac{1}{n}\sum_{i=1}^n g_{k,i}$, where $g_{k,i}$ stores the most recently computed gradient of $F_i$.
    • SARAH: $v_x^k = v_x^{k-1} + \frac{1}{b}\sum_{j\in B_k} \bigl[\nabla_x F_j(x^k, y^k) - \nabla_x F_j(x^{k-1}, y^{k-1})\bigr]$, with a full-gradient refresh performed instead with probability $1/p$.
  • Update blocks with stochastic estimators replacing full gradients in classical PALM:

$$x^{k+1} = \operatorname{prox}_{\gamma_x J}\bigl(x^k - \gamma_x v_x^k\bigr),\qquad y^{k+1} = \operatorname{prox}_{\gamma_y R}\bigl(y^k - \gamma_y v_y^k\bigr)$$

(written here for two blocks $x, y$ with regularizers $J$ and $R$, following the SPRING setting).
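
To sketch how a variance-reduced estimator slots into the block update, the following SAGA-style estimator for a single block is illustrative only; the class name, gradient-table layout, and callback signature are assumptions, not the implementation of Driggs et al. (2020).

```python
import numpy as np

class SagaEstimator:
    """SAGA-style variance-reduced estimator for one block's partial gradient."""

    def __init__(self, n, dim):
        self.table = np.zeros((n, dim))   # g_{k,i}: last stored per-component gradients
        self.mean = np.zeros(dim)         # running average (1/n) * sum_i g_{k,i}
        self.n = n

    def estimate(self, batch, grad_component, point):
        """batch: indices B_k (sampled without replacement);
        grad_component(j, point) -> gradient of F_j at point."""
        fresh = np.array([grad_component(j, point) for j in batch])
        stale = self.table[batch]
        v = fresh.mean(axis=0) - stale.mean(axis=0) + self.mean        # SAGA estimator v^k
        self.mean = self.mean + (fresh - stale).sum(axis=0) / self.n   # keep the running average exact
        self.table[batch] = fresh                                      # store the new gradients
        return v

# Usage within one SPRING-style iteration (gamma_x and prox_J are problem-specific, names assumed):
#   v_x = est.estimate(batch, grad_component, (x, y))
#   x   = prox_J(x - gamma_x * v_x, gamma_x)
```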

Oracle Complexity Results

SPRING achieves state-of-the-art oracle complexities for nonconvex finite-sum problems:

  • SPRING-SAGA with $b = n^{2/3}$:
    • $O(n^{2/3} L/\epsilon^2)$ SFO evaluations are required to reach $\mathbb{E}[\|G(x^\alpha)\|^2]\le\epsilon^2$.
    • Under an error-bound condition: $O\bigl((n + L n^{2/3}/\mu)\log(1/\epsilon)\bigr)$.
  • SPRING-SARAH with $p \approx n$:
    • $O(\sqrt{n}\,L/\epsilon^2)$ SFO calls (matching the lower bound for nonconvex finite sums).
    • Under an error-bound condition: $O\bigl((n + L\sqrt{n}/\mu)\log(1/\epsilon)\bigr)$.
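
To make the scaling concrete (illustrative numbers, not figures from the cited work): with $n = 10^6$ component functions,

$$n^{2/3}\,L/\epsilon^2 = 10^4\,L/\epsilon^2 \ \ \text{(SPRING-SAGA)} \qquad\text{vs.}\qquad \sqrt{n}\,L/\epsilon^2 = 10^3\,L/\epsilon^2 \ \ \text{(SPRING-SARAH)},$$

an order-of-magnitude difference in the $\epsilon$-dependent term.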

4. Empirical Performance and Practical Implementation

The PALM and SPRING families have been evaluated on large-scale imaging problems, including:

  • Sparse nonnegative matrix factorization (sNMF)
  • Sparse principal component analysis (sPCA)
  • Blind image deconvolution (e.g., Yale, ORL, Kodak datasets)

Summary of observed behavior:

  • Deterministic PALM converges reliably but has $O(n)$ per-iteration cost.
  • Inertial variants (see also (Pock et al., 2017, Hertrich et al., 2020)) accelerate convergence but require full gradients.
  • Naive SPRING-SGD exhibits slow convergence due to gradient variance.
  • SPRING-SAGA and SPRING-SARAH exhibit low per-iteration complexity ($O(b)$ with $b \ll n$) and fast variance decay, yielding superior objective decrease per unit of work.
  • In realistic signal and image deconvolution tasks, SPRING-SARAH outperforms full-gradient PALM by an order of magnitude in oracle calls for equivalent accuracy.
  • The stochastic PALM framework retains global convergence guarantees of deterministic PALM on nonconvex, nonsmooth problems.

5. Algorithmic Enhancements and Variants

Inertial and Acceleration Schemes: Inertial PALM (iPALM) incorporates heavy-ball–style extrapolation to speed convergence (Pock et al., 2017). Stochastic inertial PALM (iSPALM) merges inertia with variance-reduced estimation, showing improved wall-clock and epoch efficiency on high-dimensional mixtures and learning settings (Hertrich et al., 2020).
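
A simplified sketch of the inertial idea follows (single extrapolation point, heavy-ball style; the full iPALM uses separate extrapolation parameters for the gradient and proximal points, so this is an illustrative simplification, not the published method).

```python
def inertial_block_update(x_curr, x_prev, grad, prox, alpha, beta=0.5):
    """One simplified inertial (heavy-ball style) block update.

    grad(y)        -> partial gradient of the smooth term at the extrapolated point y.
    prox(v, alpha) -> proximal map of alpha * g for this block.
    beta           -> inertial/extrapolation parameter.
    """
    y = x_curr + beta * (x_curr - x_prev)     # extrapolate using the previous two iterates
    return prox(y - alpha * grad(y), alpha)   # proximal-gradient step taken at the extrapolated point
```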

Asynchronous and Parallel Frameworks: SAPALM generalizes stochastic block-coordinate PALM to asynchronous parallel execution with stale (delayed) reads and writes, achieving near-linear speedup with $P = O(\sqrt{m})$ workers for $m$ blocks (Davis et al., 2016). Atomic block updates are performed on possibly outdated parameter vectors, preserving theoretical convergence rates under mild additional assumptions.
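
The delayed-read model can be illustrated with a serial simulation in which each update reads a parameter snapshot that is at most a bounded number of updates old. This is a sketch of the staleness model used in the analysis, not a multicore implementation, and all names here are assumptions.

```python
import numpy as np
from collections import deque

def simulated_async_palm(blocks, grad_block, prox_block, alpha, max_delay, n_iters=1000, seed=0):
    """Serial simulation of asynchronous block updates with bounded staleness (SAPALM-style)."""
    rng = np.random.default_rng(seed)
    x = [b.copy() for b in blocks]
    history = deque([[b.copy() for b in x]], maxlen=max_delay + 1)   # snapshots visible to "workers"
    for _ in range(n_iters):
        i = rng.integers(len(x))                          # a worker picks a block uniformly at random
        stale = history[rng.integers(len(history))]       # delayed read of the shared parameters
        g = grad_block(i, stale)                          # gradient computed from the stale read
        x[i] = prox_block(i, x[i] - alpha * g, alpha)     # atomic write of the updated block
        history.append([b.copy() for b in x])             # the new state becomes readable
    return x
```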

6. Implementation and Scaling Considerations

When deploying PALM-type algorithms:

  • Step-sizes must reflect (global or block-wise) Lipschitz constants of the smooth coupling term's partial gradients for convergence. Stochastic variants must adapt step sizes to variance and batch size.
  • Proximal operators for the regularizers $g_i$ must be efficiently computable; in practice, sparsity and cardinality constraints yield closed-form hard-thresholding or $\ell_0$ projections (see the sketch after this list).
  • Resource scaling: in stochastic variants, the per-iteration cost is $O(b)$ for batch size $b$, with the optimal $b$ established analytically for the SAGA/SARAH schemes.
  • In asynchronous settings, delays must be bounded ($\tau = O(\sqrt{m})$) to guarantee linear speedup.
  • The approach is suited to large-scale nonconvex signal and image recovery, dictionary learning, and high-volume data analytics, where variance reduction amortizes over the cost of a single gradient evaluation.
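
As a concrete instance of the closed-form proximal maps mentioned above, the $\ell_0$ prox (hard thresholding) and the Euclidean projection onto a cardinality constraint can be written in a few lines (a sketch; the function names are illustrative):

```python
import numpy as np

def prox_l0(v, alpha, lam):
    """Proximal map of alpha * lam * ||x||_0: zero out entries with v_i^2 <= 2*alpha*lam."""
    out = v.copy()
    out[v ** 2 <= 2 * alpha * lam] = 0.0
    return out

def project_cardinality(v, s):
    """Projection onto {x : ||x||_0 <= s}: keep the s largest-magnitude entries."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -s)[-s:]   # indices of the s largest |v_i|
    out[keep] = v[keep]
    return out
```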

7. Comparison and Limitations

| PALM Variant | Per-Iteration Cost | Convergence Rate / Oracle Complexity | Scalability / Comments |
|---|---|---|---|
| Deterministic PALM | $O(n)$ | $O(1/K)$ (sublinear); linear if KL exponent $1/2$ | Expensive for large $n$ |
| SPRING-SAGA | $O(b)$ | $O(n^{2/3} L/\epsilon^2)$ | Optimal with mini-batch variance reduction |
| SPRING-SARAH | $O(b)$ | $O(\sqrt{n}\, L/\epsilon^2)$ | Best oracle complexity for finite sums |
| iPALM / iSPALM | $O(n)$ / $O(b)$ | Empirically faster; linear under error bound | Adds inertia/momentum |
| SAPALM | $O(1)$ per core | $O(1/T)$ in expectation | Near-linear multicore speedup |

Deterministic PALM is preferred when full gradients are tractable. SPRING and its inertial extensions are established choices for large-scale or distributed settings, with provably optimal complexity for finite-sum, nonconvex, nonsmooth objectives. The primary limitations are sensitivity of deterministic PALM to data size and of stochastic PALM to batch size/variance tuning. Inertial and asynchronous versions require precise parameter management and infrastructure support to realize theoretical rates.

PALM and its modern extensions occupy a central role in scalable first-order optimization for data science, signal processing, and large-scale inverse problems. Their theoretical guarantees are underpinned by the Kurdyka–Łojasiewicz property, and their flexibility is exemplified by a range of practical enhancements and broad empirical validation (Driggs et al., 2020).
