Block-Coordinate MM Optimization
- Block-Coordinate MM is an optimization framework that decomposes high-dimensional problems into variable blocks while employing surrogate functions to ensure monotone descent of the objective.
- It unifies methods like coordinate descent, expectation-maximization, and proximal gradient, thereby enhancing scalability and convergence.
- Applications span statistical learning, signal processing, and matrix/tensor factorization, demonstrating its practical and theoretical significance.
Block-Coordinate Majorization–Minimization (Block-Coordinate MM) is a principled optimization framework for decomposing high-dimensional, often nonconvex or nonsmooth problems into subproblems along blocks of variables. Each block is updated by minimizing a majorizing surrogate function, which ensures monotone descent of the original objective via blockwise upper bounds. This approach unifies and generalizes classical coordinate descent, expectation-maximization, proximal gradient, and alternating minimization strategies, and extends to settings with manifold constraints, overlapping variables, and large-scale structure. Block-Coordinate MM methods are foundational in statistical learning, signal processing, matrix/tensor factorization, and large-scale distributed optimization.
1. Core Principles and Algorithmic Structure
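Block-Coordinate MM partitions the optimization variable into blocks $x = (x_1, \dots, x_n)$ with $x_i \in \mathcal{X}_i$ and, for each block $i$, constructs a surrogate $u_i(x_i; y)$ of the objective $f$ around the current point $y$, satisfying:
- Majorization: $u_i(x_i; y) \ge f(y_1, \dots, y_{i-1}, x_i, y_{i+1}, \dots, y_n)$ for all $x_i \in \mathcal{X}_i$.
- Tangency: $u_i(y_i; y) = f(y)$.
- First-order matching: The directional derivatives of the surrogate $u_i(\cdot\,; y)$ and of $f$ are aligned at the current iterate.
At each iteration $r$, one or more blocks are selected (cyclically, in parallel, or randomly) and updated via
$$x_i^{r+1} \in \arg\min_{x_i \in \mathcal{X}_i} u_i(x_i; x^r),$$
with $x_j^{r+1} = x_j^r$ for $j \ne i$ (Hong et al., 2015, Li et al., 2023). This update guarantees $f(x^{r+1}) \le f(x^r)$.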
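The descent guarantee follows from a short chain of inequalities. The notation here is assumed for illustration: $f$ is the objective, $x^r$ the current iterate, $u_i(\cdot\,; x^r)$ the surrogate for the selected block $i$, and $x^{r+1}$ the point obtained by replacing block $i$ of $x^r$ with a surrogate minimizer.

```latex
\begin{aligned}
f(x^{r+1}) &\le u_i\bigl(x_i^{r+1}; x^r\bigr) && \text{(majorization)} \\
           &\le u_i\bigl(x_i^{r}; x^r\bigr)   && \text{(} x_i^{r+1} \text{ minimizes the surrogate)} \\
           &=   f(x^r)                        && \text{(tangency)}
\end{aligned}
```

Only the first and last lines use properties of the surrogate. The middle line holds for any update that does not increase the surrogate, which is why inexact block solves can still preserve descent.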
Block surrogates can be exact restrictions, local linearizations, quadratic (proximal) upper bounds, or structured composite approximations. The block MM scheme yields embedded descent and decomposability by leveraging problem structure—convexity, separability, or blockwise manifold constraints.
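The cyclic scheme above can be sketched on a small least-squares problem. This is a minimal illustration under stated assumptions, not an implementation from any of the cited papers: the objective $f(x) = \tfrac12\|Ax - b\|^2$, the quadratic (proximal) surrogate with block constant $L_i = \lambda_{\max}(A_i^\top A_i)$, and the function name `block_mm_least_squares` are all choices made here for the example.

```python
import numpy as np

def block_mm_least_squares(A, b, blocks, n_sweeps=50):
    """Cyclic block MM for f(x) = 0.5 * ||A x - b||^2 (illustrative sketch).

    Each block uses the quadratic surrogate
        u_i(x_i; y) = f(y) + <grad_i f(y), x_i - y_i> + (L_i / 2) ||x_i - y_i||^2,
    whose minimizer is the block gradient step x_i = y_i - grad_i f(y) / L_i.
    """
    x = np.zeros(A.shape[1])
    # L_i = largest eigenvalue of A_i^T A_i (squared spectral norm of the block columns).
    L = [np.linalg.norm(A[:, idx], 2) ** 2 for idx in blocks]
    history = [0.5 * np.sum((A @ x - b) ** 2)]
    for _ in range(n_sweeps):
        for idx, L_i in zip(blocks, L):
            grad_i = A[:, idx].T @ (A @ x - b)   # block gradient at the current iterate
            x[idx] -= grad_i / L_i               # exact minimizer of the block surrogate
            history.append(0.5 * np.sum((A @ x - b) ** 2))
    return x, np.array(history)

# Usage on synthetic data (hypothetical sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 12))
b = rng.standard_normal(40)
blocks = np.array_split(np.arange(12), 3)
x, history = block_mm_least_squares(A, b, blocks)
print("objective never increases:", bool(np.all(np.diff(history) <= 1e-10)))
```

Recording the objective after every block update makes the monotone decrease directly observable, which is a convenient sanity check when experimenting with other surrogates.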
2. Majorization Properties and Surrogate Constructions
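The surrogate construction is central to both theoretical guarantees and practical efficiency. Writing $u_i(x_i; y)$ for the surrogate of block $i$ built at the point $y$, and $\mathcal{X}_i$ for the feasible set of block $i$, the required surrogate properties (see Hong et al., 2013, Hong et al., 2015) include:
- Tangency (Tightness): $u_i(y_i; y) = f(y)$.
- Majorization (Upper-Bound): $u_i(x_i; y) \ge f(y_1, \dots, y_{i-1}, x_i, y_{i+1}, \dots, y_n)$ for all $x_i \in \mathcal{X}_i$.
- First-order consistency: $u_i'(x_i; y, d_i)\big|_{x_i = y_i} = f'(y; d)$ for every direction $d = (0, \dots, d_i, \dots, 0)$ with $y_i + d_i \in \mathcal{X}_i$.
- (Optional) Strong convexity and Lipschitz continuity for quantitative convergence analysis.
Common surrogates include:
- Exact blockwise minimization: $u_i(x_i; y) = f(y_1, \dots, y_{i-1}, x_i, y_{i+1}, \dots, y_n)$.
- Quadratic/proximal bounds: $u_i(x_i; y) = f(y) + \langle \nabla_i f(y), x_i - y_i \rangle + \frac{L_i}{2}\|x_i - y_i\|^2$, where $L_i$ is at least the Lipschitz constant of the block gradient $\nabla_i f$.
- Composite surrogates: For $f(x) = g(x) + \sum_j h_j(x_j)$ with $g$ smooth, $u_i(x_i; y) = g(y) + \langle \nabla_i g(y), x_i - y_i \rangle + \frac{L_i}{2}\|x_i - y_i\|^2 + h_i(x_i) + \sum_{j \ne i} h_j(y_j)$ (Hong et al., 2015, Hong et al., 2013).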
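For a composite surrogate, the block subproblem often has a closed-form solution through a proximal operator. The sketch below assumes, purely for illustration, a lasso objective $\tfrac12\|Ax-b\|^2 + \lambda\|x\|_1$, so the smooth part is the least-squares term and the nonsmooth part is the $\ell_1$ penalty. The helper names `soft_threshold`, `lasso_objective`, and `composite_block_update` are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_objective(A, b, x, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def composite_block_update(A, b, x, idx, lam, L_i):
    """Minimize the composite surrogate of one block (illustrative sketch).

    The surrogate is a quadratic upper bound on the smooth part plus the exact
    l1 term of the block; its minimizer is a soft-thresholded gradient step.
    """
    grad_i = A[:, idx].T @ (A @ x - b)
    x_new = x.copy()
    x_new[idx] = soft_threshold(x[idx] - grad_i / L_i, lam / L_i)
    return x_new

# Usage: cyclic sweeps over four blocks on synthetic data (hypothetical sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
lam = 0.5
blocks = np.array_split(np.arange(8), 4)
L = [np.linalg.norm(A[:, idx], 2) ** 2 for idx in blocks]
x = np.zeros(8)
values = [lasso_objective(A, b, x, lam)]
for _ in range(30):
    for idx, L_i in zip(blocks, L):
        x = composite_block_update(A, b, x, idx, lam, L_i)
        values.append(lasso_objective(A, b, x, lam))
print("objective never increases:", bool(np.all(np.diff(values) <= 1e-10)))
```

Keeping the nonsmooth term exact inside the surrogate, rather than linearizing it, is what lets the update produce exact zeros in the solution.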
Manifold-constrained problems require that surrogates respect tangent space structure and retraction maps, as developed for Grassmann and Stiefel blocks (Lopez et al., 2024, Li et al., 2023, Tian et al., 2019).
3. Convergence Analysis and Complexity
Block-Coordinate MM methods guarantee monotonic decrease of the objective. Under mild and explicit conditions—surrogate quasi-convexity, continuity, unique minimizers in all but one block, and boundedness of initial level sets—every limit point is stationary (Hong et al., 2013, Hong et al., 2015, Li et al., 2023). Convergence rates are summarized as follows:
| Setting | Rate | Reference |
|---|---|---|
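| General convex, strongly convex surrogates | $O(1/r)$ ($r$ = iteration count) | (Hong et al., 2015, Hong et al., 2013) |
| Block minimization (no strong convexity) | $O(1/r)$ | (Hong et al., 2013) |
| Two-block acceleration | $O(1/r^2)$ | (Hong et al., 2013) |
| Nonconvex (Euclidean, with proximal/trust region) | $\tilde{O}(\varepsilon^{-2})$ iterations to $\varepsilon$-stationarity | (Lyu et al., 2020, Li et al., 2023) |
| Riemannian manifold blocks | $\tilde{O}(\varepsilon^{-2})$ iterations to $\varepsilon$-stationarity | (Li et al., 2023) |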
Lyu et al. (2020) introduce a trust-region variant with diminishing radius (BMM-DR), which improves complexity for "flat" surrogates and weakly convex problems by controlling the radius of each step.
For block-separable convex programs with linear constraints, specialized block-coordinate MM strategies such as the Parallel Direction Method of Multipliers (PDMM) exhibit global $O(1/T)$ convergence in ergodic averages over $T$ iterations under minimal assumptions, even allowing randomized and overlapping block updates (Wang et al., 2014).
4. Generalizations: Riemannian and Structured Settings
Block MM extends naturally to variables living on Riemannian manifolds (e.g., Grassmann or Stiefel), dropping the requirement for Euclidean convexity (Lopez et al., 2024, Li et al., 2023, Tian et al., 2019). In these settings:
- The block update projects onto manifold subspaces (using exponential or retraction maps).
- Monotonic descent persists; under geodesic $L$-smoothness, compact feasibility, and suitable surrogates, convergence to stationary points is guaranteed.
- The blockwise stationarity condition is formulated in terms of Riemannian gradients in the corresponding product tangent spaces.
Block MM has been successfully applied to Burer–Monteiro formulations of large-scale semidefinite programs, robust PCA on fixed-rank manifolds, and tensor dictionary learning, with rigorous complexity guarantees. The algorithm in (Tian et al., 2019) achieves global convergence to first-order critical points for block minimization of Stiefel-constrained variables in SDPs.
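As a simplified illustration of block minimization in a Burer–Monteiro formulation, consider the unit-diagonal SDP $\min \langle C, X\rangle$ subject to $\operatorname{diag}(X) = 1$, $X \succeq 0$, factored as $X = VV^\top$ with unit-norm rows $v_i$. Each row lives on a sphere (the simplest Stiefel manifold), and the block subproblem has a closed form. This sketch is an assumption-laden toy case, not the algorithm of Tian et al. (2019); the names `bm_objective` and `bcm_sweep` are hypothetical.

```python
import numpy as np

def bm_objective(C, V):
    """<C, V V^T> = sum_{i,j} C_ij <v_i, v_j>, where the rows of V are the v_i."""
    return float(np.sum(C * (V @ V.T)))

def bcm_sweep(C, V):
    """One cyclic pass of exact block minimization over unit-norm rows.

    With C symmetric and ||v_i|| = 1, the objective restricted to v_i is
    2 <v_i, g_i> + const with g_i = sum_{j != i} C_ij v_j, so the minimizer
    on the sphere is v_i = -g_i / ||g_i||.
    """
    for i in range(C.shape[0]):
        g = C[i] @ V - C[i, i] * V[i]
        norm_g = np.linalg.norm(g)
        if norm_g > 1e-12:          # if g vanishes, every unit vector is optimal
            V[i] = -g / norm_g
    return V

# Usage with a random symmetric cost matrix (hypothetical sizes).
rng = np.random.default_rng(0)
n, k = 10, 4
M = rng.standard_normal((n, n))
C = (M + M.T) / 2
V = rng.standard_normal((n, k))
V /= np.linalg.norm(V, axis=1, keepdims=True)
values = [bm_objective(C, V)]
for _ in range(25):
    V = bcm_sweep(C, V)
    values.append(bm_objective(C, V))
print("objective never increases:", bool(np.all(np.diff(values) <= 1e-9)))
```

Because each block update is an exact minimizer over its own sphere, feasibility is maintained without a separate retraction step. More general surrogates on manifolds would need one.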
5. Representative Applications
Block-Coordinate MM encompasses a broad spectrum of classic and modern algorithms:
- Expectation-Maximization (EM): Single-block MM via Jensen-based surrogates for the log-likelihood, leading to alternating E- and M-steps (Hong et al., 2015, Lyu et al., 2020).
- Convex-Concave Procedure (CCCP): Decomposing a nonconvex objective as a difference of convex functions, updating via blockwise linearization (Hong et al., 2015).
- Tensor CP Decomposition: Proximal alternating least squares is a block MM method, and it often converges more reliably than naive ALS (Hong et al., 2015).
- Nonnegative Matrix/Tensor Factorization: Lee–Seung multiplicative update algorithms are block MM with structured surrogates and possible proximal stabilization (Lyu et al., 2020).
- Overlapping Group Lasso and Composite Penalized Models: Consensus constraints are accommodated via block copies; block MM is applied to alternating minimization and splitting methods (Wang et al., 2014).
- Robust PCA: Low-rank manifold block MM yields provable stationary convergence (Li et al., 2023).
- Large-scale SDPs: Block-coordinate minimization (BCM) on product Stiefel manifolds provides scalable and provably convergent alternatives to interior-point methods (Tian et al., 2019).
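To make one of these instances concrete, the following is a minimal sketch of the Lee–Seung multiplicative updates for nonnegative matrix factorization with the Frobenius loss, viewed as a two-block MM scheme over the factors $W$ and $H$. The function name `nmf_multiplicative`, the initialization, and the small constant `eps` guarding the division are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Lee-Seung multiplicative updates for min 0.5 * ||V - W H||_F^2, W, H >= 0.

    Each update minimizes a separable quadratic majorizer of the loss in one
    factor while the other is held fixed, so the loss never increases and
    nonnegativity is preserved automatically.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1   # strictly positive start (assumed choice)
    H = rng.random((rank, n)) + 0.1
    losses = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # block update for H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # block update for W
        losses.append(0.5 * np.linalg.norm(V - W @ H) ** 2)
    return W, H, np.array(losses)

# Usage on a random nonnegative matrix (hypothetical sizes).
V = np.random.default_rng(1).random((20, 15))
W, H, losses = nmf_multiplicative(V, rank=4)
print("loss never increases:", bool(np.all(np.diff(losses) <= 1e-9)))
```

The updates need no step size: the scaling is implied by the curvature of the majorizer, which is one reason these rules are popular despite their slow convergence.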
6. Block-Selection Rules, Parallelism, and Practical Issues
Block MM admits flexibility in block-selection:
- Cyclic, essentially cyclic, or randomized block update schedules (Hong et al., 2015, Hong et al., 2013).
- Parallel and asynchronous updates of multiple blocks are allowed, enhancing scalability (Wang et al., 2014).
- Non-uniform stochastic block selection ("Lipschitz sampling") accelerates convergence proportional to blockwise curvature (Lee et al., 2018).
- Inexact block minimization and variable-metric (e.g., quasi-Newton) subproblem solutions retain convergence under mild conditions (Lee et al., 2018).
Practical stabilization mechanisms include step-size tuning, proximal and trust-region regularization, and overlap-aware block updating. Empirical evidence demonstrates block MM's reliability and scalability, especially when communication or computation per block is a bottleneck (Wang et al., 2014, Lyu et al., 2020).
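A randomized schedule with non-uniform sampling can be sketched as follows. The example assumes a least-squares objective and draws block $i$ with probability proportional to its curvature constant $L_i$. This is one simple reading of "Lipschitz sampling" chosen for illustration and is not necessarily the exact rule analyzed by Lee et al. (2018). The function name `randomized_block_mm` is hypothetical.

```python
import numpy as np

def randomized_block_mm(A, b, blocks, n_steps=300, seed=0):
    """Randomized block MM for f(x) = 0.5 * ||A x - b||^2 (illustrative sketch).

    Blocks are sampled with probability proportional to L_i, the largest
    eigenvalue of A_i^T A_i, and updated by minimizing a quadratic surrogate.
    """
    rng = np.random.default_rng(seed)
    L = np.array([np.linalg.norm(A[:, idx], 2) ** 2 for idx in blocks])
    probs = L / L.sum()                       # assumed sampling distribution
    x = np.zeros(A.shape[1])
    history = [0.5 * np.sum((A @ x - b) ** 2)]
    for _ in range(n_steps):
        i = rng.choice(len(blocks), p=probs)  # pick one block at random
        idx = blocks[i]
        grad_i = A[:, idx].T @ (A @ x - b)
        x[idx] -= grad_i / L[i]               # minimizer of the block surrogate
        history.append(0.5 * np.sum((A @ x - b) ** 2))
    return x, np.array(history)

# Usage on synthetic data with unevenly scaled blocks (hypothetical sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 12)) * np.repeat([1.0, 3.0, 0.5], 4)
b = rng.standard_normal(40)
blocks = np.array_split(np.arange(12), 3)
x, history = randomized_block_mm(A, b, blocks)
print("objective never increases:", bool(np.all(np.diff(history) <= 1e-10)))
```

Descent holds for every sampled block regardless of the distribution; the sampling rule only affects how quickly the objective falls on average.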
7. Connections to Related Frameworks and Limitations
Block MM subsumes:
- Block Coordinate Descent (BCD), Block Coordinate Proximal Gradient (BCPG), and Proximal Alternating Linearized Minimization (PALM) as special cases (Hong et al., 2015, Hong et al., 2013).
- The Parallel Direction Method of Multipliers (PDMM) as a block-coordinate MM extension of ADMM, supporting randomized and overlapping blocks, with $O(1/T)$ ergodic rates (Wang et al., 2014).
- Gauss–Seidel block Alternating Direction Methods (GSADMM) under suitable surrogates (Wang et al., 2014).
The framework has limitations. Performance degrades when surrogates are not tight; convergence rates are sublinear unless strong convexity or error bounds hold; and acceleration beyond $O(1/r)$ is generally possible only in special two-block settings or under additional regularity (Hong et al., 2013). For nonconvex settings, block MM converges to stationary points rather than global optima, and explicit complexity rates may require stronger smoothness or regularity (Lyu et al., 2020, Li et al., 2023).
Block-Coordinate MM provides a unifying, rigorous, and highly adaptable framework for large-scale, structured, and constrained optimization. Its rich theory is matched by significant empirical and algorithmic breadth across statistical, machine learning, and mathematical optimization domains (Hong et al., 2015, Lyu et al., 2020, Li et al., 2023).