
Block Coordinate Descent (BCD) Method

Updated 30 November 2025
  • Block Coordinate Descent (BCD) Method is a technique that partitions decision variables into blocks and iteratively optimizes them to solve composite optimization problems.
  • It employs surrogate functions and majorization–minimization within each block to allow efficient updates even in non-Euclidean and complex structured settings.
  • BCD methods guarantee convergence under regularity and geometric conditions, making them widely applicable in matrix factorization, sparse regression, and manifold optimization.

The Block Coordinate Descent (BCD) method is a central meta-algorithm in modern large-scale and structured optimization, particularly in settings involving composite constraints or decomposable objectives. BCD refers to the class of algorithms that partition the decision variables into blocks and iteratively optimize each block (possibly inexactly) while holding the other blocks fixed. Conceptually and practically, BCD arises both as a standalone algorithm and as a core procedure within extensions of the Majorization–Minimization (MM) framework, and it is widely used in low-rank matrix factorization, sparse regression, and manifold optimization.

1. Formal Definition and Geometric Scope

At its core, the BCD paradigm addresses composite optimization problems of the form

$$\min_{X = (X_1,\ldots,X_N)\,\in\, \mathcal{M}} f(X)$$

where $f : \mathcal{M} \to \mathbb{R}$ is a continuous or smooth cost function, and the feasible set $\mathcal{M}$ is a product set, often a Cartesian product $\mathcal{M}_1\times\cdots\times\mathcal{M}_N$. Each block variable $X_n$ typically lives in a closed convex set or a more structured manifold. In a recent extension, BCD is generalized to non-Euclidean settings, where at least one of the block spaces is a geodesically convex subset of a Riemannian manifold (notably, the Grassmann manifold) (Lopez et al., 15 Feb 2024).

The prototypical BCD framework proceeds by cyclically or greedily updating one block at a time:

  • Fix all blocks except block $n$, and solve (possibly inexactly) the subproblem

$$X_n^{k+1} = \arg\min_{X_n \in \mathcal{M}_n} f(X_1^{k+1},\ldots,X_{n-1}^{k+1},\, X_n,\, X_{n+1}^{k}, \ldots, X_N^{k})$$

before proceeding to the next block. In practice, each subproblem might itself require specialized methods, particularly for non-Euclidean blocks.
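
As a minimal illustration (hypothetical names and problem, not from the cited paper), cyclic BCD with exact block minimization on a two-block least-squares objective $\min_{x,y} \|Ax + By - b\|^2$ can be sketched as follows:

```python
import numpy as np

# Cyclic BCD with two Euclidean blocks: each subproblem is an exact
# least-squares solve with the other block held fixed.
def cyclic_bcd(A, B, b, iters=50):
    x = np.zeros(A.shape[1])
    y = np.zeros(B.shape[1])
    for _ in range(iters):
        x, *_ = np.linalg.lstsq(A, b - B @ y, rcond=None)  # update block x
        y, *_ = np.linalg.lstsq(B, b - A @ x, rcond=None)  # update block y
    return x, y
```

Each exact block solve can only decrease the objective, which is the monotonicity property that the convergence theory below builds on.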

2. Block Majorization–Minimization: Algorithmic Synthesis and Surrogates

A powerful synthesis is obtained by integrating BCD within the MM framework, yielding Block MM (sometimes termed Block Coordinate MM or block coordinate surrogate minimization). The MM framework requires, for each block, the construction of a surrogate function that majorizes the original cost in that block (Lopez et al., 15 Feb 2024). At iteration $i$:

  • Given the current state $X^{(i)} = (X_1^{(i)},\ldots,X_N^{(i)})$, construct for each block $n$ a surrogate $g_n(X_n \mid X^{(i)})$ satisfying (a standard example follows this list):
    • $g_n(X_n^{(i)} \mid X^{(i)}) = f(X^{(i)})$ (tightness at the current iterate),
    • $g_n(X_n \mid X^{(i)}) \geq f(X_1^{(i)},\ldots,X_{n-1}^{(i)},\, X_n,\, X_{n+1}^{(i)},\ldots,X_N^{(i)})$ for all $X_n \in \mathcal{M}_n$ (block-wise majorization),
    • first-order derivatives match in the block at $X^{(i)}$,
    • joint continuity, and (geodesic) quasi-convexity for non-Euclidean blocks.
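
A standard Euclidean example satisfying all of these conditions, assuming the block gradient $\nabla_{X_n} f$ is $L_n$-Lipschitz, is the quadratic majorizer

$$g_n(X_n \mid X^{(i)}) = f(X^{(i)}) + \langle \nabla_{X_n} f(X^{(i)}),\, X_n - X_n^{(i)} \rangle + \frac{L_n}{2}\, \| X_n - X_n^{(i)} \|^2,$$

whose unique minimizer is the block gradient step $X_n^{(i+1)} = X_n^{(i)} - \tfrac{1}{L_n} \nabla_{X_n} f(X^{(i)})$.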

The algorithm proceeds by alternating block updates:

  • $X_n^{(i+1)} = \arg\min_{X_n \in \mathcal{M}_n} g_n(X_n \mid X^{(i)})$

This specialization enables tractable per-block surrogates in challenging nonconvex and structured problems, e.g., problems with Riemannian-manifold blocks. The block MM approach is central in practice because, even when a global surrogate is difficult or impossible to minimize efficiently, the block surrogates (especially in the presence of structure) often admit closed-form or efficient updates.
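
As a sketch (illustrative problem and constants, not from the cited paper), instantiating block MM with the quadratic majorizer above reduces each surrogate minimization to a block gradient step of size $1/L_n$:

```python
import numpy as np

# Block MM on f(x, y) = 0.5 * ||A x + B y - b||^2: minimizing each block's
# quadratic majorizer exactly amounts to a block gradient step.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
B = rng.standard_normal((30, 4))
b = rng.standard_normal(30)

Lx = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of the x-block gradient
Ly = np.linalg.norm(B.T @ B, 2)   # Lipschitz constant of the y-block gradient

x, y = np.zeros(5), np.zeros(4)
for _ in range(500):
    r = A @ x + B @ y - b         # residual at the current iterate
    x = x - (A.T @ r) / Lx        # exact minimizer of the x-block surrogate
    r = A @ x + B @ y - b         # refresh residual after the x update
    y = y - (B.T @ r) / Ly        # exact minimizer of the y-block surrogate
```

By construction, each step satisfies the MM sandwich inequality of Section 3, so the cost is non-increasing.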

3. Convergence Theory: Regularity, Geodesic Structure, and Block-wise Minimization

The convergence properties of BCD methods, and block MM in particular, are established under a set of regularity and geometric conditions (Lopez et al., 15 Feb 2024):

  • Regularity of f: $f$ is regular if stationarity with respect to all feasible and blockwise directions implies stationarity globally.
  • Surrogate conditions: Surrogates must be jointly continuous, match derivatives, and (geo-)quasi-convex along shortest geodesics in manifold blocks.
  • Unique surrogates/subproblem solutions: At each block step, the surrogate admits a unique minimizer.
  • Compactness: The sublevel set $S = \{ X \in \mathcal{M} : f(X) \leq f(X^{(0)}) \}$ is compact.

Under these conditions:

  • The sequence $\{ X^{(i)} \}$ is deterministic (each block subproblem has a unique minimizer), and the cost values $f(X^{(i)})$ are non-increasing;
  • Limit points exist; moreover, if all surrogates/subproblems are strictly convex or geodesically quasi-convex, the sequence converges to a unique stationary point;
  • The proof leverages monotonic decrease and block-wise "sandwiching" via surrogate quasi-convexity, displayed below, to control the distance moved in each block (Lopez et al., 15 Feb 2024).
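
Concretely, when block $n$ is updated at iteration $i$, the surrogate properties yield the standard MM descent chain

$$f(X^{(i+1)}) \;\leq\; g_n(X_n^{(i+1)} \mid X^{(i)}) \;\leq\; g_n(X_n^{(i)} \mid X^{(i)}) \;=\; f(X^{(i)}),$$

where the first inequality is block-wise majorization, the second holds because $X_n^{(i+1)}$ minimizes the surrogate, and the equality is tightness at the current iterate.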

If the objective is differentiable and the feasible set is classical (i.e., all blocks Euclidean and convex), block MM reduces to the classical BCD framework with standard convergence guarantees as in Razaviyayn–Hong–Luo (Lopez et al., 15 Feb 2024).

4. Block Structure in Riemannian and Nonconvex Optimization

An important extension is to settings where blocks are non-Euclidean, most notably:

  • Grassmann manifolds $\mathrm{Gr}(N, D)$: Optimization over subspaces, where each point is an equivalence class of orthonormal matrices,
  • Geodesically convex subsets: For any two feasible points, the unique shortest geodesic between them lies entirely within the variable block's feasible set,
  • Geo-quasi-convex surrogates and regularity: Surrogates are required to be geodesically quasi-convex, which ensures that update sequences, while possibly traversing highly nonconvex ambient spaces, do not stray from essential optimality structure (Lopez et al., 15 Feb 2024).

The algorithmic structure is preserved: local minimization within geodesically convex blocks followed by classical convex minimization in Euclidean blocks.
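
For concreteness, geodesics on the Grassmann manifold admit a classical closed form (Edelman–Arias–Smith): starting from a point with orthonormal basis $G$ in tangent direction $\Delta$ with thin SVD $\Delta = U \Sigma V^\top$,

$$G(t) = G V \cos(\Sigma t)\, V^\top + U \sin(\Sigma t)\, V^\top,$$

and it is along such curves that geodesic convexity of the feasible set and geodesic quasi-convexity of the surrogates are required.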

5. Algorithmic Workflow and Typical Use Cases

A canonical workflow for block MM/BCD in manifold/product domains (Lopez et al., 15 Feb 2024):

  1. Initialization at a feasible point $X^{(0)} \in \mathcal{M}$.
  2. Alternating block updates:
    • For each block (e.g., a subspace $G$ and a coefficient vector $c$; see the sketch after this list), construct tight surrogates respecting tangent-space structure and majorization.
    • Update $G$ via geodesically constrained minimization.
    • Update $c$ via convex minimization in $\mathbb{R}^M$.
  3. Convergence check based on cost decrease and blockwise stationarity.
  4. Termination when iterates, or cost values, stabilize.
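
A minimal sketch of this workflow (hypothetical function and variable names; exact closed-form block updates stand in for general surrogate minimization) alternates an orthonormal-basis block and a Euclidean coefficient block for subspace fitting, $\min_{G^\top G = I,\, C} \|X - GC\|_F^2$:

```python
import numpy as np

# Two-block BCD for subspace fitting: G has orthonormal columns (representing
# a subspace, i.e., a manifold block), C is a Euclidean coefficient block.
def bcd_subspace_fit(X, N, iters=100, tol=1e-10):
    D, M = X.shape
    G, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((D, N)))
    cost_prev = np.inf
    for _ in range(iters):
        C = G.T @ X                      # coefficient block: exact least squares
        U, _, Vt = np.linalg.svd(X @ C.T, full_matrices=False)
        G = U @ Vt                       # basis block: orthogonal Procrustes solution
        cost = np.linalg.norm(X - G @ C) ** 2
        if cost_prev - cost < tol:       # monotone decrease => stopping rule
            break
        cost_prev = cost
    return G, C
```

Both block updates are exact minimizers, so the cost is non-increasing: $C = G^\top X$ solves the least-squares subproblem for orthonormal $G$, and $G = UV^\top$ from the SVD of $XC^\top$ solves the orthogonal Procrustes subproblem.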

Applications include, but are not limited to:

  • Multi-block matrix factorization,
  • Low-rank regression with block constraints,
  • Signal processing on manifolds,
  • Sparse coding or dictionary learning with geometric structure.

6. Relation to Broader MM, BCD, and Splitting Schemes

Block coordinate descent intersects and generalizes multiple optimization paradigms:

  • Classical BCD: Updates decouple into blockwise minimizations, often only requiring feasibility and continuity; global convergence demands convexity or regularity.
  • Block MM: Each blockwise update builds in the flexibility of surrogate-based optimization, crucial when closed-form solutions do not exist or per-block geometry is complex.
  • Alternating minimization (AM): The special case where each surrogate is the exact block restriction of $f$, i.e., $g_n(X_n \mid X^{(i)}) = f(X_1^{(i)},\ldots, X_n, \ldots, X_N^{(i)})$, minimized exactly.
  • Splitting methods: ADMM and similar algorithms can be viewed as blockwise update schemes with dual variable updates, especially when embedded as subroutines in MM frameworks.

The block MM framework provides a unifying structure for global convergence: it ensures monotonicity and leverages both block partitioning and surrogate tightness to handle complex constraints (including nonconvexity and manifold structure), while providing explicit convergence guarantees under geometric regularity (Lopez et al., 15 Feb 2024).


References:

  • Lopez et al., 15 Feb 2024.
