Randomized Subspace Correction Methods
- Randomized subspace correction methods are algorithms that accelerate convex optimization by performing iterative updates within randomly chosen subspaces, unifying a diverse family of approaches.
- They offer flexibility through arbitrary, overlapping decompositions and inexact local solves, yielding provably faster convergence under suitable conditions.
- These methods are applied in numerical PDEs, imaging, and machine learning, enhancing robustness and efficiency in large-scale computational problems.
Randomized subspace correction methods are a class of algorithms that accelerate the solution of convex optimization problems by performing iterative updates restricted to randomly chosen subspaces of the ambient variable space. In the abstract and general framework advanced by Jiang, Park, and Xu (2025), these methods unify and extend a broad family of algorithms, including block coordinate descent (BCD), domain decomposition, multigrid, and stochastic block-relaxation algorithms, by allowing for arbitrary space decompositions, inexact local solvers, and minimal assumptions on convexity and smoothness. Randomization in the choice of subspace improves robustness and, under suitable conditions, leads to provably faster convergence relative to deterministic (cyclic) analogues.
1. Abstract Framework and Methodology
The core abstraction is the solution of a convex optimization problem of the form
$$\min_{v \in V} \; E(v) := F(v) + G(v),$$
where $V$ is a (possibly infinite-dimensional) reflexive Banach space, $F$ is convex and Gâteaux differentiable, and $G$ is convex, proper, and lower semicontinuous. The variable space is decomposed as $V = \sum_{j=1}^{J} V_j$, with each $V_j$ a (possibly overlapping) closed subspace.
At each step, a random index $j \in \{1, \dots, J\}$ is drawn (usually uniformly), and a correction $w \in V_j$ is computed by (exactly or approximately) minimizing a local model:
$$w \approx \operatorname*{arg\,min}_{\tilde w \in V_j} \tilde E_j(u^k; \tilde w),$$
where $\tilde E_j(u^k; \cdot)$ are local approximations to $E$ at the current iterate $u^k$. The update is $u^{k+1} = u^k + w$. This architecture generalizes additive Schwarz-type domain decomposition, block coordinate descent, and subspace-proximal splitting methods. Inexact local solves (such as an inexact Newton step, a proximal point step, or even a single block gradient step) are permitted, which allows for efficient implementations even when local minimization is impractical.
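To make the abstract iteration concrete, here is a minimal NumPy sketch (not from the paper; all names are illustrative) for the special case of a quadratic energy $E(v) = \tfrac{1}{2} v^\top A v - b^\top v$ on $\mathbb{R}^n$ with coordinate-block subspaces, where the local model can be minimized exactly by a small linear solve:

```python
import numpy as np

def randomized_subspace_correction(A, b, blocks, num_iters, seed=None):
    """Minimize E(v) = 0.5 v^T A v - b^T v by randomized subspace correction.

    `blocks` is a list of index arrays defining the subspaces V_j
    (here: coordinate blocks of R^n, so the local solve is exact).
    """
    rng = np.random.default_rng(seed)
    u = np.zeros(A.shape[0])
    for _ in range(num_iters):
        idx = blocks[rng.integers(len(blocks))]  # draw a random subspace
        # Local model: minimize E(u + w) over w supported on V_j.
        # For a quadratic this is the block linear system A_jj w = r_j.
        r = b - A @ u                            # global residual at the iterate
        w = np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
        u[idx] += w                              # update u <- u + w
    return u

# Usage: a small SPD system split into two coordinate blocks.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)                      # SPD matrix
b = rng.standard_normal(6)
blocks = [np.arange(0, 3), np.arange(3, 6)]
u = randomized_subspace_correction(A, b, blocks, num_iters=200)
print(np.linalg.norm(A @ u - b))                 # should be near zero
```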
2. Convergence Rate Theory
The convergence analysis encompasses both expectation and high-probability results, with the flexibility to accommodate minimal regularity (i.e., general convexity and coercivity of $E$), as well as settings with strong convexity or sharpness (Hölder-type error bounds).
A generic descent lemma guarantees that each randomized subspace correction step yields, in expectation, a decrease in the energy functional proportional to a global improvement term and inversely proportional to the number of subspaces:
$$\mathbb{E}\big[E(u^{k+1}) \mid u^k\big] \le E(u^k) - \frac{\tau}{J}\, D(u^k),$$
where $\tau \in (0, 1]$ represents the inexactness/stability of local corrections, and $D(u^k)$ embodies the global improvement accrued by local steps (typically a Bregman distance or squared norm).
Sublinear convergence is achieved for general convex problems:
$$\mathbb{E}\big[E(u^k)\big] - \min_V E = O(k^{-\gamma}),$$
with the exponent $\gamma > 0$ stemming from problem-specific sharpness (e.g., the generic convex case yields an $O(1/k)$ rate).
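As a sanity check on the rate, a standard telescoping argument (sketched here with illustrative constants, under the simplifying assumption that the improvement term $D(u^k)$ dominates the squared suboptimality) recovers the $O(J/k)$ behavior from the descent lemma:

```latex
% Sketch: from the expected-descent inequality to the sublinear rate.
Let $e_k := \mathbb{E}[E(u^k)] - \min_V E$ and suppose the descent lemma
specializes to
\[
  e_{k+1} \le e_k - \frac{c}{J}\, e_k^{2}, \qquad c > 0.
\]
Dividing by $e_k e_{k+1}$ and using $e_{k+1} \le e_k$ gives
$e_{k+1}^{-1} - e_k^{-1} \ge c/J$; summing the first $k$ steps telescopes to
\[
  e_k \le \frac{1}{e_0^{-1} + c\,k/J} = O\!\left(\frac{J}{k}\right).
\]
```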
Linear convergence (geometric decay) emerges when the problem satisfies a sharpness or strong convexity assumption, specifically when an error bound of the form
$$E(v) - \min_V E \;\ge\; \mu \,\operatorname{dist}\big(v, \operatorname*{arg\,min}_V E\big)^{q}$$
for some $\mu > 0$ and $q \ge 1$ holds over relevant sets. Explicit expressions for the contraction factor involve the stable decomposition constant, the sharpness modulus, and the number of subspaces.
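Under a quadratic ($q = 2$) error bound, the same descent inequality contracts geometrically; the following display, again with illustrative constants, indicates the mechanism:

```latex
% Sketch: error bound + descent lemma => linear convergence.
If $D(u^k) \ge \mu' \big(E(u^k) - \min_V E\big)$ for some $\mu' > 0$
(a consequence of the error bound and a stable decomposition), then
\[
  \mathbb{E}\big[E(u^{k+1})\big] - \min_V E
  \le \Big(1 - \frac{\tau \mu'}{J}\Big)
      \Big(\mathbb{E}\big[E(u^k)\big] - \min_V E\Big),
\]
a geometric decay with contraction factor $1 - \tau\mu'/J$ per step.
```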
The theory also rigorously treats the general setting of inexact (approximate) solvers, overlapping decompositions, and arbitrary Banach space norms, broadening classic coordinate descent and Schwarz-type analysis beyond Hilbert or Euclidean spaces.
3. Generalization and Flexibility of Space Decomposition
Unlike most classical frameworks, which are limited to nonoverlapping coordinate or block decompositions in Euclidean spaces, this framework extends to arbitrary—possibly overlapping, multilevel, or even infinite—subspace decompositions. This includes classical overlapping Schwarz methods (with physical subdomains overlapping on grids), block relaxation in imaging, or function/batch decompositions in machine learning.
Local solvers can be exact (e.g., full minimization in $V_j$), but are often chosen to be approximate for practical reasons. The analysis allows for a systematic trade-off between accuracy of local steps and convergence, with explicit stability constants quantifying the impact of inexactness.
The only essential technical requirement is a stable decomposition property: for any $v \in V$, there exists a decomposition $v = \sum_{j=1}^{J} v_j$, $v_j \in V_j$, such that the sum of local Bregman distances or norms does not exceed a global multiple of the total. The theory allows for nonorthogonal, overlapping, and even dynamically chosen decompositions.
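A small numerical illustration (a toy construction, not from the paper) of the stable decomposition property: an overlapping partition-of-unity splitting of $\mathbb{R}^n$ into two "subdomains", for which the sum of local squared norms is bounded by a fixed multiple of the global one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
v = rng.standard_normal(n)

# Two overlapping "subdomains" covering all coordinates, with a partition
# of unity (weights summing to 1) on the overlap, so that v = v1 + v2.
chi1 = np.zeros(n)
chi1[:60] = 1.0            # V_1 supported on coordinates 0..59
chi2 = np.zeros(n)
chi2[40:] = 1.0            # V_2 supported on coordinates 40..99
weights = chi1 + chi2      # equals 2 on the overlap 40..59
v1 = v * chi1 / weights
v2 = v * chi2 / weights

assert np.allclose(v1 + v2, v)            # a valid overlapping decomposition
ratio = (np.sum(v1**2) + np.sum(v2**2)) / np.sum(v**2)
print(ratio)  # <= 1 for this splitting; the stable decomposition constant
              # is a uniform bound on this ratio over all v
```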
4. Applications Across Scientific Computing and Data Science
Randomized subspace correction methods are applicable in a range of areas:
- Numerical PDEs and Domain Decomposition: The framework accommodates both classical and nonlinear PDEs. For nonlinear or nonsmooth models (e.g., the $p$-Laplacian, total variation minimization), it supports both exact and surrogate local solves and offers convergence in cases where traditional Schwarz theory is inapplicable; a toy randomized overlapping Schwarz sketch follows this list.
- Multigrid Methods: Multilevel decompositions (e.g., for elliptic problems) are included, with randomization shown to often lead to improved theoretical and empirical performance.
- Imaging Inverse Problems: TV minimization and related imaging models can be addressed with overlapping decompositions and randomized updating, a regime where classic BCD theory is inadequate.
- Data Science and Machine Learning: Mini-batch, block, or coordinate stochastic optimization maps onto RSC by associating batches or features with subspaces. Stochastic dual and primal-dual splitting methods also fit this template, with the theory justifying randomization in operator splitting strategies.
- Variational Inequalities and Constraints: The theory extends to primal-dual update schemes commonly used for functional or pointwise constraints in imaging or PDE problems.
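The randomized overlapping Schwarz sketch promised in the PDE item above: a toy 1D Poisson problem, discretized by the standard three-point stencil, solved by exact solves on randomly chosen overlapping subdomains. The mesh, subdomain sizes, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

# Toy 1D Poisson problem -u'' = f on (0, 1) with zero Dirichlet boundary
# values, discretized with the standard three-point stencil.
n, h = 99, 1.0 / 100
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
f = np.ones(n)

# Overlapping subdomains: windows of 30 unknowns shifted by 20 (overlap 10).
subdomains = [np.arange(s, min(s + 30, n)) for s in range(0, n, 20)]

rng = np.random.default_rng(2)
u = np.zeros(n)
for _ in range(400):
    idx = subdomains[rng.integers(len(subdomains))]  # random subdomain
    r = f - A @ u                                    # global residual
    u[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])  # exact local solve

print(np.linalg.norm(A @ u - f) / np.linalg.norm(f))  # relative residual
```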
5. Comparison with Classical and Cyclic Approaches
A distinguishing aspect is the support for general decompositions and inexactness, in contrast to much prior work on coordinate descent, which is typically limited to nonoverlapping decompositions and strongly convex, Lipschitz-smooth objectives. Classical Schwarz domain decomposition and multigrid methods often require full smoothness and deterministic, cyclic sweeps, leading to complex or fragile analyses.
Randomization has been shown, both in this and prior work, to yield provably better convergence rates than deterministic cyclic schemes for broad classes of decompositions, often approaching optimal rates in the limit. For instance, in certain settings randomization eliminates the harmonic-sum slowdown of cyclic sweeps, yielding a strict improvement over cyclic block updates. Furthermore, the abstract convergence rates in this framework remain valid under overlapping decompositions, arbitrary inexactness, and Banach-space generalizations, enhancing robustness and applicability.
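A purely empirical, instance-dependent illustration of the cyclic-versus-randomized comparison (not a substitute for the theory, and not from the paper): both orderings run exact coordinate minimization on the same ill-conditioned quadratic, and their final energies can be compared directly.

```python
import numpy as np

def coord_descent(A, b, order, iters):
    """Exact coordinate minimization of E(v) = 0.5 v^T A v - b^T v,
    with the coordinate at step k chosen by the callable `order`."""
    u = np.zeros(len(b))
    for k in range(iters):
        i = order(k)
        u[i] += (b[i] - A[i] @ u) / A[i, i]   # exact 1D minimization
    return 0.5 * u @ A @ u - b @ u            # final energy value

rng = np.random.default_rng(3)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + 0.1 * np.eye(n)                 # ill-conditioned SPD matrix
b = rng.standard_normal(n)

cyclic = coord_descent(A, b, lambda k: k % n, 2000)
randomized = coord_descent(A, b, lambda k: rng.integers(n), 2000)
print("cyclic final energy:    ", cyclic)
print("randomized final energy:", randomized)
```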
The analysis for cyclic (deterministic order) schemes is less sharp in this generality and is identified as a direction for future work.
6. Practical Implications and Algorithmic Design
The framework enables the systematic design and analysis of randomized Schwarz-type, domain decomposition, block coordinate, and multigrid algorithms across applications. By allowing arbitrary decompositions and local solves, practitioners can tailor the method to the geometry or physics of their problem, for instance by designing overlapping or multilevel subspaces to match coarse and fine features in PDEs, or by adapting the subspace choice to data locality in distributed learning.
The explicit treatment of inexact local solvers means that computational cost can be tightly managed, trading cheaper, more frequent local updates against the overall rate of global convergence, as sketched below.
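For instance, the cheapest admissible local solver replaces the exact block solve with a single block gradient step; a minimal sketch (assumptions as in the earlier quadratic example, with a step size chosen from the block's spectral norm) follows.

```python
import numpy as np

def rsc_inexact(A, b, blocks, num_iters, seed=None):
    """Randomized subspace correction with an inexact local solver:
    one gradient step on the chosen block replaces the exact block solve."""
    rng = np.random.default_rng(seed)
    u = np.zeros(len(b))
    for _ in range(num_iters):
        idx = blocks[rng.integers(len(blocks))]
        g = A[idx] @ u - b[idx]                 # block gradient of E at u
        step = 1.0 / np.linalg.norm(A[np.ix_(idx, idx)], 2)  # <= 1/L_j
        u[idx] -= step * g                      # cheap, inexact correction
    return u
```

Each inexact step costs one block-row multiply instead of a dense block solve; the descent lemma's stability constant $\tau$ is what absorbs the resulting loss of per-step progress.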
Randomized subspace correction methods also provide a foundation for the rigorous development of asynchronous or fully parallel solvers, as randomized selection naturally avoids worst-case dependencies and ensures robust expected progress.
7. Table: Feature Comparison
| Feature | Traditional BCD/Schwarz | Multigrid | Randomized Subspace Correction (this work) |
|---|---|---|---|
| Decomposition type | Nonoverlapping/block | Multilevel | Arbitrary/overlapping/multilevel |
| Update order | Cyclic/deterministic | Cyclic | Randomized |
| Local solve | Usually exact | Level-wise | Exact or inexact |
| Smoothness requirement | Smooth, strongly convex | Smooth | Arbitrary smoothness/convexity |
| Applicability | Euclidean, finite-dim. | Hilbert | Banach/Hilbert, infinite-dim. |
Randomized subspace correction methods, as formalized in this general framework, represent a robust, unifying approach to decompositional strategies in convex optimization, leading to algorithmic architectures that are both broadly applicable and theoretically sound across scientific computing, imaging, and data science problems.