Block-Coordinate MM Optimization

Updated 12 June 2026
  • Block-Coordinate MM is an optimization framework that splits a high-dimensional problem into variable blocks and updates each block by minimizing a surrogate upper bound, which guarantees monotone descent of the objective.
  • It unifies methods such as coordinate descent, expectation-maximization, and proximal gradient within one framework with a shared convergence analysis.
  • Applications span statistical learning, signal processing, and matrix/tensor factorization, demonstrating its practical and theoretical significance.

Block-Coordinate Majorization–Minimization (Block-Coordinate MM) is an optimization framework that decomposes high-dimensional, often nonconvex or nonsmooth problems into subproblems over blocks of variables. Each block is updated by minimizing a majorizing surrogate function, so the blockwise upper bounds guarantee monotone descent of the original objective. The approach unifies and generalizes classical coordinate descent, expectation-maximization, proximal gradient, and alternating minimization, and extends to settings with manifold constraints, overlapping variables, and large-scale structure. Block-Coordinate MM methods are foundational in statistical learning, signal processing, matrix/tensor factorization, and large-scale distributed optimization.

1. Core Principles and Algorithmic Structure

Block-Coordinate MM partitions the optimization variable $x=(x_1,\ldots,x_B)$ into blocks and constructs, for each block $i$, a surrogate $M_i(x_i; x_{-i})$ (or $u_i(x_i;z)$) satisfying:

  • Majorization: $M_i(x_i; x_{-i}) \geq f(x_i, x_{-i})$ for all $x_i$.
  • Tangency: $M_i(x^k_i; x_{-i}^k) = f(x^k)$.
  • First-order matching: The directional derivatives of the surrogate and $f$ agree at the current iterate.
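The three surrogate properties can be checked numerically on a small example. The sketch below assumes a least-squares objective and a quadratic block surrogate built from a block Lipschitz constant; the objective, the block choice `idx`, and all function names are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 6))
b = rng.standard_normal(20)
idx = np.arange(0, 3)  # hypothetical block i: the first three coordinates

def f(x):
    """Illustrative objective: 0.5 * ||A x - b||^2."""
    r = A @ x - b
    return 0.5 * (r @ r)

def grad_f(x):
    return A.T @ (A @ x - b)

# Block Lipschitz constant: largest eigenvalue of A_i^T A_i.
L_i = np.linalg.eigvalsh(A[:, idx].T @ A[:, idx])[-1]

def surrogate(x_i, x_k):
    """Quadratic upper bound of f in block i, built at the iterate x_k."""
    d = x_i - x_k[idx]
    return f(x_k) + grad_f(x_k)[idx] @ d + 0.5 * L_i * (d @ d)

def embed(x_i, x_k):
    """Return a copy of x_k with block i replaced by x_i."""
    x = x_k.copy()
    x[idx] = x_i
    return x

x_k = rng.standard_normal(6)

# Tangency: surrogate and objective coincide at the current iterate.
tangency_gap = abs(surrogate(x_k[idx], x_k) - f(x_k))

# Majorization: surrogate lies above the objective at random trial points.
trials = [x_k[idx] + rng.standard_normal(idx.size) for _ in range(200)]
majorization_ok = all(
    surrogate(x_i, x_k) >= f(embed(x_i, x_k)) - 1e-9 for x_i in trials
)

# First-order matching: finite-difference gradient of the surrogate at x_k
# agrees with the block gradient of f.
h = 1e-5
fd_grad = np.array([
    (surrogate(x_k[idx] + h * e, x_k) - surrogate(x_k[idx] - h * e, x_k)) / (2 * h)
    for e in np.eye(idx.size)
])
first_order_gap = np.max(np.abs(fd_grad - grad_f(x_k)[idx]))
```

A random check like this cannot prove majorization, but it is a cheap way to catch a surrogate whose curvature constant was chosen too small.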

At each iteration, one or more blocks are selected (cyclically, in parallel, or randomly) and updated via

$$x_i^{k+1} = \arg\min_{x_i \in \mathcal{X}_i} M_i(x_i; x_{-i}^k),$$

with $x_j^{k+1} = x_j^k$ for $j \neq i$ (Hong et al., 2015, Li et al., 2023). This update guarantees $f(x^{k+1}) \leq f(x^k)$.
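The descent guarantee $f(x^{k+1}) \leq f(x^k)$ follows directly from majorization and tangency. A short derivation, assuming a single block $i$ is updated and the surrogate minimizer is attained exactly:

```latex
\begin{aligned}
f(x^{k+1}) &= f(x_i^{k+1}, x_{-i}^k) \\
           &\le M_i(x_i^{k+1}; x_{-i}^k) && \text{(majorization)} \\
           &\le M_i(x_i^{k}; x_{-i}^k)   && \text{($x_i^{k+1}$ minimizes the surrogate)} \\
           &= f(x^k)                     && \text{(tangency)}
\end{aligned}
```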

Block surrogates can be exact restrictions, local linearizations, quadratic (proximal) upper bounds, or structured composite approximations. The scheme inherits monotone descent from the surrogates and decomposes into per-block subproblems by exploiting problem structure such as convexity, separability, or blockwise manifold constraints.
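As an illustration of the overall loop, here is a minimal sketch of cyclic block MM with quadratic surrogates on a least-squares problem. Minimizing the quadratic surrogate in closed form reduces each block update to a gradient step with step size $1/L_i$. The problem, the two-block partition, and the function name `block_mm_least_squares` are assumptions made for the example.

```python
import numpy as np

def block_mm_least_squares(A, b, blocks, n_sweeps=50):
    """Cyclic block MM for 0.5 * ||A x - b||^2 with quadratic block surrogates.

    The surrogate for block i at x^k is
        f(x^k) + <grad_i f(x^k), x_i - x_i^k> + (L_i / 2) * ||x_i - x_i^k||^2,
    whose minimizer is x_i^k - grad_i f(x^k) / L_i.
    """
    x = np.zeros(A.shape[1])
    # L_i = largest eigenvalue of A_i^T A_i makes the surrogate a valid upper bound.
    L = [np.linalg.eigvalsh(A[:, idx].T @ A[:, idx])[-1] for idx in blocks]
    history = [0.5 * np.sum((A @ x - b) ** 2)]
    for _ in range(n_sweeps):
        for idx, L_i in zip(blocks, L):
            grad_i = A[:, idx].T @ (A @ x - b)   # block gradient at current x
            x[idx] -= grad_i / L_i               # closed-form surrogate minimizer
            history.append(0.5 * np.sum((A @ x - b) ** 2))
    return x, history

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
blocks = [np.arange(0, 4), np.arange(4, 8)]      # hypothetical 2-block partition
x_hat, history = block_mm_least_squares(A, b, blocks)
```

The recorded `history` is non-increasing, which is the monotone descent property in practice. Replacing the gradient step with an exact block solve would give classical block coordinate minimization.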

2. Majorization Properties and Surrogate Constructions

The surrogate construction is central to both theoretical guarantees and practical efficiency. Required surrogate properties (see Hong et al., 2013, Hong et al., 2015) include:

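  • Tangency (Tightness): $M_i(x_i^k; x_{-i}^k) = f(x^k)$.
  • Majorization (Upper-Bound): $M_i(x_i; x_{-i}) \geq f(x_i, x_{-i})$ for all $x_i$.
  • First-order consistency: $M_i'(x_i; x_{-i}^k; d_i)\big|_{x_i = x_i^k} = f'(x^k; d)$ for every feasible direction $d = (0,\ldots,d_i,\ldots,0)$.
  • (Optional) Strong convexity and Lipschitz continuity for quantitative convergence analysis.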

Common surrogates include:

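  • Exact blockwise minimization: $M_i(x_i; x_{-i}) = f(x_i, x_{-i})$.
  • Quadratic/proximal bounds: $M_i(x_i; x_{-i}^k) = f(x^k) + \langle \nabla_i f(x^k), x_i - x_i^k \rangle + \frac{L_i}{2}\|x_i - x_i^k\|^2$, valid when $\nabla_i f$ is $L_i$-Lipschitz.
  • Composite surrogates: For $f = g + h$ with $g$ smooth and $h = \sum_i h_i$ block-separable, $M_i(x_i; x_{-i}^k) = g(x^k) + \langle \nabla_i g(x^k), x_i - x_i^k \rangle + \frac{L_i}{2}\|x_i - x_i^k\|^2 + h_i(x_i)$ (Hong et al., 2015, Hong et al., 2013).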
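A composite surrogate can be made concrete on a lasso problem, where the smooth part is a least-squares term and the nonsmooth part is an $\ell_1$ penalty. In the sketch below, the surrogate is the quadratic upper bound of the smooth part plus the untouched penalty on the block, and its minimizer is a soft-thresholded block gradient step. The lasso setup, the partition, and the helper names are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_objective(A, b, x, lam):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def block_prox_update(A, b, x, idx, lam):
    """Minimize the composite block surrogate
        g(x^k) + <grad_i g(x^k), x_i - x_i^k> + (L_i/2)||x_i - x_i^k||^2 + lam*||x_i||_1
    with g(x) = 0.5 * ||A x - b||^2. The minimizer is a proximal gradient step."""
    L_i = np.linalg.eigvalsh(A[:, idx].T @ A[:, idx])[-1]
    grad_i = A[:, idx].T @ (A @ x - b)
    x_new = x.copy()
    x_new[idx] = soft_threshold(x[idx] - grad_i / L_i, lam / L_i)
    return x_new

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
lam = 0.5
blocks = [np.arange(0, 5), np.arange(5, 10)]   # hypothetical 2-block partition

x = np.zeros(10)
objective_values = [lasso_objective(A, b, x, lam)]
for _ in range(100):                           # cyclic sweeps over the blocks
    for idx in blocks:
        x = block_prox_update(A, b, x, idx, lam)
        objective_values.append(lasso_objective(A, b, x, lam))
```

Keeping the penalty exact inside the surrogate, rather than bounding it, is what lets the update remain closed-form while still handling the nonsmooth term.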

Manifold-constrained problems require that surrogates respect tangent space structure and retraction maps, as developed for Grassmann and Stiefel blocks (Lopez et al., 2024, Li et al., 2023, Tian et al., 2019).

3. Convergence Analysis and Complexity

Block-Coordinate MM methods guarantee monotonic decrease of the objective. Under mild and explicit conditions—surrogate quasi-convexity, continuity, unique minimizers in all but one block, and boundedness of initial level sets—every limit point is stationary (Hong et al., 2013, Hong et al., 2015, Li et al., 2023). Convergence rates are summarized as follows:

Setting | Rate | Reference
General convex, strongly convex surrogates | $O(1/k)$ | (Hong et al., 2015, Hong et al., 2013)
Block minimization (no strong convexity) | $O(1/k)$ | (Hong et al., 2013)
Two-block acceleration | $O(1/k^2)$ | (Hong et al., 2013)
Nonconvex (Euclidean, with proximal/trust region) | $\tilde{O}(\epsilon^{-2})$ iterations to $\epsilon$-stationarity | (Lyu et al., 2020, Li et al., 2023)
Riemannian manifold blocks | $\tilde{O}(\epsilon^{-2})$ iterations to $\epsilon$-stationarity | (Li et al., 2023)

The method in (Lyu et al., 2020) introduces a trust-region variant with diminishing radius (BMM-DR), which improves complexity for "flat" surrogates and under weak convexity by bounding the radius of each block step.

For block-separable convex programs with linear constraints, specialized block-coordinate MM strategies such as the Parallel Direction Method of Multipliers (PDMM) exhibit global $O(1/T)$ convergence in ergodic averages under minimal assumptions, even allowing randomized and overlapping block updates (Wang et al., 2014).

4. Generalizations: Riemannian and Structured Settings

Block MM extends naturally to variables living on Riemannian manifolds (e.g., Grassmann or Stiefel), dropping the requirement for Euclidean convexity (Lopez et al., 2024, Li et al., 2023, Tian et al., 2019). In these settings:

  • The block update is computed in the tangent space and mapped back onto the manifold using exponential or retraction maps.
  • Monotonic descent persists; under geodesic $L$-smoothness, compact feasibility, and suitable surrogates, convergence to stationary points is guaranteed.
  • The blockwise stationarity condition is formulated in terms of Riemannian gradients in the corresponding product tangent spaces.
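To make the manifold case concrete, the following sketch alternates Riemannian gradient steps over two blocks that each live on a unit sphere, for the illustrative objective $f(u, v) = -u^\top A v$. Each block step projects the Euclidean gradient onto the tangent space and retracts by normalization. This is a simplified stand-in chosen for the example, not the algorithm of any cited paper; the step size $1/\|A\|_2$ is a conservative assumption under which this particular update is monotone.

```python
import numpy as np

def sphere_block_step(x, egrad, step):
    """One Riemannian gradient step on the unit sphere.

    egrad is the Euclidean gradient at x. It is projected onto the tangent
    space at x, and the resulting point is retracted back to the sphere by
    normalization.
    """
    rgrad = egrad - (x @ egrad) * x      # tangent-space projection
    y = x - step * rgrad                 # step in the tangent direction
    return y / np.linalg.norm(y)         # retraction onto the sphere

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 5))
step = 1.0 / np.linalg.norm(A, 2)        # 1 / spectral norm (assumed safe step)

u = rng.standard_normal(7)
u /= np.linalg.norm(u)
v = rng.standard_normal(5)
v /= np.linalg.norm(v)

def f(u, v):
    """Illustrative two-block objective on a product of spheres."""
    return -(u @ A @ v)

values = [f(u, v)]
for _ in range(200):
    u = sphere_block_step(u, -(A @ v), step)     # block 1: gradient of f in u is -A v
    values.append(f(u, v))
    v = sphere_block_step(v, -(A.T @ u), step)   # block 2: gradient of f in v is -A^T u
    values.append(f(u, v))
```

Both blocks stay exactly feasible after every update, and the blockwise stationarity condition corresponds to both projected gradients vanishing.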

Block MM has been successfully applied to Burer–Monteiro formulations of large-scale semidefinite programs, robust PCA on fixed-rank manifolds, and tensor dictionary learning, with rigorous complexity guarantees. The algorithm in (Tian et al., 2019) achieves global convergence to first-order critical points for block minimization of Stiefel-constrained variables in SDPs.

5. Representative Applications

Block-Coordinate MM encompasses a broad spectrum of classic and modern algorithms across statistical learning, signal processing, and matrix/tensor factorization.

6. Block-Selection Rules, Parallelism, and Practical Issues

Block MM admits flexibility in block-selection:

  • Cyclic, essentially cyclic, or randomized block update schedules (Hong et al., 2015, Hong et al., 2013).
  • Parallel and asynchronous updates of multiple blocks are allowed, enhancing scalability (Wang et al., 2014).
  • Non-uniform stochastic block selection ("Lipschitz sampling"), which samples blocks in proportion to their blockwise Lipschitz constants, accelerates convergence (Lee et al., 2018).
  • Inexact block minimization and variable-metric (e.g., quasi-Newton) subproblem solutions retain convergence under mild conditions (Lee et al., 2018).
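The selection rules above differ only in how the next block index is produced. A minimal sketch of three schedules follows: cyclic, uniform random, and random with weights proportional to block Lipschitz constants. The function names and the least-squares definition of the block constants are assumptions for illustration.

```python
import numpy as np

def cyclic_schedule(n_blocks, n_steps):
    """Visit blocks in fixed order 0, 1, ..., n_blocks - 1, repeating."""
    return [k % n_blocks for k in range(n_steps)]

def random_schedule(n_blocks, n_steps, rng, weights=None):
    """Sample block indices i.i.d.; weights=None gives uniform sampling."""
    p = None
    if weights is not None:
        w = np.asarray(weights, dtype=float)
        p = w / w.sum()                  # normalize weights to probabilities
    return rng.choice(n_blocks, size=n_steps, p=p).tolist()

def lipschitz_weights(A, blocks):
    """Block Lipschitz constants of the gradient of 0.5 * ||A x - b||^2."""
    return [np.linalg.eigvalsh(A[:, idx].T @ A[:, idx])[-1] for idx in blocks]

rng = np.random.default_rng(4)
A = rng.standard_normal((25, 9))
blocks = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]

cyc = cyclic_schedule(len(blocks), 5)
uni = random_schedule(len(blocks), 1000, rng)
lip = random_schedule(len(blocks), 1000, rng, weights=lipschitz_weights(A, blocks))
```

Any of these index sequences can drive the same block update routine, which is why the selection rule can be swapped without touching the surrogate construction.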

Practical stabilization mechanisms include step-size tuning, proximal and trust-region regularization, and overlap-aware block updating. Empirical evidence demonstrates block MM's reliability and scalability, especially when communication or computation per block is a bottleneck (Wang et al., 2014, Lyu et al., 2020).

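Block MM subsumes classical coordinate descent, expectation-maximization, proximal gradient, and alternating minimization as special cases.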

Limitations arise when surrogates are not tight. Convergence rates are sublinear unless strong convexity or error bounds hold, and acceleration beyond $O(1/k)$ is generally possible only in special two-block settings or under additional regularity (Hong et al., 2013). In nonconvex settings, block MM converges to stationary points rather than global optima, and explicit complexity rates may require stronger smoothness or regularity (Lyu et al., 2020, Li et al., 2023).

Block-Coordinate MM provides a unifying, rigorous, and highly adaptable framework for large-scale, structured, and constrained optimization. Its rich theory is matched by significant empirical and algorithmic breadth across statistical, machine learning, and mathematical optimization domains (Hong et al., 2015, Lyu et al., 2020, Li et al., 2023).
