Proximal Alternating Minimization (PAM) Overview
- PAM is a block-coordinate descent method that partitions variables into blocks and updates each with a proximal term to handle structured optimization challenges.
- Its methodology combines exact, linearized, or surrogate minimization with proximal regularization, ensuring convergence even for nonconvex or nonsmooth problems.
- PAM underpins diverse applications in signal processing, machine learning, and image processing, offering flexibility in algorithm design and practical acceleration.
Proximal Alternating Minimization (PAM) is a foundational class of block-coordinate descent algorithms designed for structured convex, nonconvex, and composite optimization problems. In PAM, variables are partitioned into blocks, and each block is updated via an exact or regularized minimization that combines the original objective with a proximal term, cycling through all blocks in succession. PAM and its linearized, accelerated, and adaptive variants underpin a wide array of algorithms in signal processing, machine learning, statistical estimation, and numerical analysis, with rigorous convergence guarantees under explicit regularity and smoothness hypotheses.
1. Problem Class and Formulation
The canonical form addressed by PAM is

$$\min_{x_1,\dots,x_m} \; \Phi(x_1,\dots,x_m) \;=\; f(x_1,\dots,x_m) \;+\; \sum_{i=1}^{m} g_i(x_i),$$

where $f$ is (typically) jointly smooth or continuously differentiable in all blocks, while each block term $g_i$ can be nonsmooth (e.g., an indicator function, norm, or structured regularizer) (Byrne et al., 2015). In the constrained convex case, general affine or set-like constraints are encoded via penalization and projection as in the quadratic penalty/augmented Lagrangian framework (Tran-Dinh, 2017, Bitterlich et al., 2018).
A key generalization involves additional coupling constraints:

$$\min_{x,z} \; f(x) + g(z) \quad \text{s.t.} \quad Ax + Bz = c,$$

where $f$ and $g$ are convex and both may be nonsmooth or non-strongly convex (Tran-Dinh, 2017).
2. Algorithmic Structure and Proximal Regularization
Each PAM iteration alternates block minimizations of the form

$$x_i^{k+1} \in \arg\min_{x_i} \; \Phi\big(x_1^{k+1},\dots,x_{i-1}^{k+1},\, x_i,\, x_{i+1}^{k},\dots,x_m^{k}\big) \;+\; \frac{\beta_i}{2}\, D\big(x_i, x_i^{k}\big),$$

where $D$ is a (possibly weighted or Bregman) distance penalizing deviation from the last iterate (Byrne et al., 2015).
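To make the alternating scheme concrete, here is a minimal two-block PAM loop for the toy rank-one factorization problem $\min_{x,y} \tfrac12\|A - xy^\top\|_F^2$ with quadratic proximal terms — an illustrative instance of our own choosing, not an algorithm from the cited papers:

```python
import numpy as np

def pam_rank1(A, beta=1e-3, iters=100, seed=0):
    """Two-block PAM for min_{x,y} 0.5*||A - x y^T||_F^2.

    Each block subproblem is solved exactly; the proximal term
    (beta/2)*||x - x_prev||^2 keeps the update well-posed even
    when the other block is (near) zero.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x, y = rng.standard_normal(m), rng.standard_normal(n)
    for _ in range(iters):
        # x-update: argmin_x 0.5*||A - x y^T||^2 + (beta/2)*||x - x_prev||^2
        x = (A @ y + beta * x) / (y @ y + beta)
        # y-update: same structure with the roles of the blocks swapped
        y = (A.T @ x + beta * y) / (x @ x + beta)
    return x, y

A = np.outer([1.0, 2.0], [3.0, 4.0, 5.0])  # an exactly rank-one matrix
x, y = pam_rank1(A)
residual = np.linalg.norm(A - np.outer(x, y))
```

Note that at a fixed point the proximal term vanishes ($x = x^{k}$), so the stationary points coincide with those of the unregularized problem.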
Proximal regularization (the quadratic or Bregman proximity term, with weight $\beta_i > 0$) is essential for:
- Well-posedness of subproblems, especially under nonconvexity or lack of strong convexity.
- Facilitating analytic or closed-form solutions (e.g., soft-thresholding for the $\ell_1$-norm, singular-value thresholding for the nuclear norm) (Liu et al., 2020, Adam et al., 2023).
- Regularizing ill-conditioned subproblems or ill-posed inversions.
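The two closed-form proximal maps mentioned above can be sketched in a few lines (standard formulas, shown here for reference):

```python
import numpy as np

def soft_threshold(v, tau):
    """prox_{tau*||.||_1}(v): shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def svt(M, tau):
    """Singular-value thresholding: prox of the nuclear norm tau*||.||_*,
    obtained by soft-thresholding the singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ (soft_threshold(s, tau)[:, None] * Vt)
```

For example, `soft_threshold(np.array([3.0, -0.5]), 1.0)` shrinks the large entry to `2.0` and zeroes out the small one.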
In multiblock settings (e.g., tensor or matrix factorization), the classical Gauss–Seidel cycle is extended, with each block update augmented by a quadratic (or Bregman or other) proximity term, stabilizing convergence even under nonconvex couplings between blocks (Chen et al., 31 Dec 2025).
3. Theoretical Convergence and Acceleration
The convergence of PAM relies on descent conditions, blockwise optimality, and regularity properties such as the SUMMA or SUMMA2 inequalities, which ensure that the objective value forms a telescoping sequence converging to its infimum (Byrne et al., 2015). In convex settings with mere blockwise convexity, last-iterate convergence can be established under suitable three-point or "gap" inequalities.
Accelerated and linearized variants are important for both practical efficiency and sharper theoretical rates:
- Linearized PAM (PALM): Each block update uses first-order approximation of the coupling function, followed by a proximal step (Pock et al., 2017, Hu et al., 2022). Convergence to a critical point is shown under the Kurdyka–Łojasiewicz (KL) property, with rates determined by the KL exponent.
- Momentum/Inertial Extensions (iPALM): Heavy-Ball or Nesterov-type extrapolation is introduced, showing improved empirical rates and practical acceleration at the expense of nonmonotone evolution in the objective (Pock et al., 2017).
- Adaptive Step Sizes: Linesearch-free strategies based on local curvature estimates enable large, dynamically adjusted steps while preserving convergence, illustrated by the adaptive AMA in (Latafat et al., 2023).
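To illustrate the linearized (PALM-style) update, here is a sketch for the toy sparse-coding problem $\min_{D,C} \tfrac12\|A - DC\|_F^2 + \lambda\|C\|_1$ with unit-norm-ball dictionary columns — an illustrative instance, not the exact setup of the cited works:

```python
import numpy as np

def palm_sparse_coding(A, k=5, lam=0.1, iters=200, seed=0):
    """PALM: each block takes a gradient step on the smooth coupling
    term with stepsize 1/L_i (L_i = block Lipschitz constant),
    followed by that block's proximal map."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    D = rng.standard_normal((m, k))
    C = np.zeros((k, n))
    for _ in range(iters):
        # C-block: gradient of 0.5*||A - DC||^2 wrt C is D^T (DC - A)
        Lc = np.linalg.norm(D.T @ D, 2) + 1e-12
        G = C - (D.T @ (D @ C - A)) / Lc
        C = np.sign(G) * np.maximum(np.abs(G) - lam / Lc, 0.0)  # l1-prox
        # D-block: gradient step, then project columns onto the unit ball
        Ld = np.linalg.norm(C @ C.T, 2) + 1e-12
        D = D - ((D @ C - A) @ C.T) / Ld
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1.0)
    return D, C
```

Each block update costs only one gradient evaluation and one closed-form prox, which is the practical appeal of PALM over exact block minimization.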
Under strong convexity or blockwise semi-strong convexity, optimal complexity bounds can be obtained. For example, the "Proximal Alternating Penalty Algorithm" (PAPA) achieves an $O(1/k)$ non-ergodic rate for generic convex problems and $O(1/k^2)$ when one block is strongly convex, with acceleration achieved without full gradient information or blockwise smoothness (Tran-Dinh, 2017).
4. Major Variants and Practical Algorithm Design
PAM is instantiated in several forms to handle nonconvexity, nonsmoothness, and constraints:
- Quadratic-Penalty/Alternating Minimization: Introduces smooth penalty terms for constraint violation, alternating blockwise minimization for the penalized problem (Tran-Dinh, 2017).
- Proximal Alternating Majorization-Minimization: Each subproblem solves a majorant surrogate, often a convexified upper bound, before a proximal step for improved tractability and convergence (Tao et al., 2024, Liu et al., 2020).
- Reweighted and Surrogate Schemes: For nonconvex or non-smooth block terms, majorization via tangent linearizations or concave surrogates leads to weighted proximal problems with closed-form solutions (e.g., weighted singular-value or shrinkage) (Liu et al., 2020, Adam et al., 2023).
- Subspace Correction and SVD Rotation: In low-rank and matrix/tensor factorization, additional SVD-based subspace corrections enable columnwise/separable block updates and yield global convergence even under rank or sparsity constraints (Tao et al., 2024, Chen et al., 31 Dec 2025).
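As one concrete surrogate scheme (a generic reweighted-$\ell_1$ sketch, not the specific construction of the cited papers): majorizing the concave penalty $\lambda \sum_i \log(1 + |x_i|/\epsilon)$ by its tangent at the current iterate yields a weighted $\ell_1$ subproblem whose prox is a weighted soft-threshold:

```python
import numpy as np

def reweighted_prox_step(v, x_prev, lam=1.0, eps=0.1):
    """One majorize-then-prox step for lam * sum log(1 + |x|/eps).

    The tangent majorant at x_prev has per-coordinate weight
    w_i = 1 / (|x_prev_i| + eps), so the prox step is a weighted
    soft-threshold: large entries of x_prev are shrunk less.
    """
    w = 1.0 / (np.abs(x_prev) + eps)
    return np.sign(v) * np.maximum(np.abs(v) - lam * w, 0.0)
```

Iterating this step (recomputing the weights at each iterate) is the familiar reweighted-$\ell_1$ pattern underlying the weighted-thresholding variants in the table below.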
Table: Selected PAM Variants and Features
| Variant | Block Update Type | Acceleration |
|---|---|---|
| Classic PAM | Exact minimization + proximal | None |
| PALM / linearized PAM | Prox-grad linearization | Blockwise stepsize |
| Inertial PALM (iPALM) | Linearized + momentum | Heavy Ball/Nesterov |
| Majorized PAM | Majorant surrogate | Majorization |
| Adaptive PAM / AMA | Adaptive stepsize, no linesearch | Dynamic penalty |
| Reweighted/Surrogate PAM | Convexification per block | Weighted thresholding |
| Tensor-based PAM | Multi-block tensor surrogates | Closed-form per block |
5. Applications and Empirical Performance
PAM and its extensions are ubiquitous in applications requiring efficient handling of structure, constraints, and nonconvexity.
- Matrix/Tensor Factorization and Completion: PARSuMi combines PAM with nonconvex rank and sparsity constraints for robust low-rank completion and corruption recovery, achieving empirical success in cases where convex relaxations fail (Wang et al., 2013). SVD-based subspace corrections are critical for global convergence under explicit low-rank structure (Tao et al., 2024).
- Image Processing and Deblurring: PAM algorithms with iteratively reweighted surrogates enable tractable minimization for nonconvex, non-smooth regularizers such as nonconvex total-variation and $\ell_0$-type penalties (Adam et al., 2023). Analogous frameworks are employed in nonlocal low-rank denoising models (Liu et al., 2020).
- Convex Constrained Optimization: Proximal AMA generalizes the alternating minimization algorithm for two-block separable objectives with linear constraints, converting each update to a proximal operator evaluation and yielding global convergence to saddle points (Bitterlich et al., 2018).
- Learning and Regularized Estimation: Proximal alternating linearized methods provide globally convergent, structure-preserving algorithms for sparse or low-rank state-space and autoregressive model estimation under Lyapunov or cardinality constraints (Lin et al., 2016).
6. Limitations, Extensions, and Open Questions
While theoretical guarantees for PAM and its linearized/accelerated variants are well established in semi-algebraic, KL, or blockwise convex settings, several challenges and questions remain:
- Rates for nonconvex, non-smooth settings are typically asymptotic and, outside the KL regime, sublinear only for ergodic/averaged quantities (Byrne et al., 2015, Tao et al., 2024).
- Effective choices of proximity functions or surrogates for complicated regularizer blocks or highly coupled objective functions are mostly open. The optimal design of proximity terms or surrogate majorants is application-dependent and can strongly affect practical performance (Byrne et al., 2015, Liu et al., 2020).
- Understanding global versus local minima and basin of attraction for highly nonconvex settings, especially with block non-separability or multi-modal loss surfaces, remains an active research direction (Byrne et al., 2015).
- Parallel, randomized, and Jacobi-style variants are under investigation for scaling to large blocks and distributed settings (Byrne et al., 2015).
- For infeasible or inexact subsolvers in constrained and large-scale settings, convergence can be salvaged via surrogate sequences absorbing nonmonotonicity in the objective, with explicit, implementable residual criteria and asymptotic rates derived via KL theory (Hu et al., 2022).
7. Connections to Other Algorithms and Theoretical Equivalence
PAM establishes a unifying framework encompassing:
- Proximal Minimization Algorithms (PMA) and Majorization-Minimization (MM): PAM, PMA, and MM are mathematically equivalent under the replacement of full minimization with blockwise, proximal, or surrogate-based updates (Byrne et al., 2015). Classical methods such as gradient descent, Landweber iteration, expectation-maximization (EM), and cross-entropy reconstruction can be derived as instances of PAM with specific proximities or surrogates.
- Augmented Lagrangian and ADMM: Quadratic-penalty formulations in PAM frameworks subsume augmented Lagrangian and alternating direction methods, but with explicit attention to the choice and adaptivity of penalty parameters and surrogate smoothness (Tran-Dinh, 2017).
- Learned and Unrolled Variants: Architectures such as LPAM-net "unroll" the algorithmic structure of PAM into trainable neural networks, preserving interpretable algorithmic steps and inheriting convergence and stationarity properties via their algorithm-mimetic design (Chen et al., 2024).
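As a standard example of such equivalences, gradient descent itself arises as a single-block proximal step applied to the linearization of a smooth $f$:

```latex
x^{k+1} = \arg\min_{x}\; f(x^k) + \langle \nabla f(x^k),\, x - x^k \rangle
          + \frac{1}{2\alpha}\,\|x - x^k\|^{2}
        = x^k - \alpha\, \nabla f(x^k),
```

so the prox-of-a-surrogate viewpoint recovers the classical method with stepsize $\alpha$.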
In conclusion, Proximal Alternating Minimization and its family of variants provide a rigorous, flexible, and widely applicable methodology for high-dimensional, structured, and potentially nonconvex optimization problems across numerous domains (Byrne et al., 2015, Tran-Dinh, 2017, Bitterlich et al., 2018, Adam et al., 2023, Liu et al., 2020, Chen et al., 2024, Tao et al., 2024, Chen et al., 31 Dec 2025, Hu et al., 2022, Latafat et al., 2023, Lin et al., 2016, Pock et al., 2017, Wang et al., 2013, Shen et al., 2017).