Doubly Stochastic Block Gauss-Seidel
- The paper introduces a randomized block update scheme that achieves provable linear convergence for both consistent and inconsistent systems.
- It generalizes classical methods such as Gauss-Seidel, Kaczmarz, and coordinate descent by sampling block pairs with probability proportional to their squared Frobenius norms.
- The method demonstrates speedups of 2× to 15× in wall-clock convergence on real-world and synthetic data, enhancing scalability and parallel execution.
The doubly stochastic block Gauss–Seidel (DSBGS) method is a class of randomized iterative algorithms for large-scale optimization and linear algebra, particularly for solving linear systems and structured convex/nonconvex optimization problems. DSBGS algorithms generalize the classical Gauss–Seidel, randomized Kaczmarz, and coordinate descent methods by introducing blockwise updates in both rows and columns and by employing a double-layer randomization scheme in their sampling process. These methods are characterized by provable linear convergence rates in mean square or expectation, robust performance on consistent and inconsistent systems, and scalability due to efficient blockwise computations and a parallelization-friendly design (Razaviyayn et al., 2018, Du et al., 2019, Xu et al., 2014).
1. Formulation and Algorithmic Scheme
DSBGS considers the linear system $Ax = b$ with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^{m}$, or more generally, composite objective functions with block-partitioned variables. The rows and columns of $A$ are partitioned respectively into block sets $\{I_1, \dots, I_s\}$ and $\{J_1, \dots, J_t\}$, and the set of all block pairs is $\{(I_i, J_j) : 1 \le i \le s,\ 1 \le j \le t\}$. At each iteration, a block pair $(I_i, J_j)$ is sampled with probability $\lVert A_{I_i,J_j}\rVert_F^2 / \lVert A\rVert_F^2$.
Upon sampling $(I_i, J_j)$, the DSBGS method forms the block residual $r_{I_i}^{k} = A_{I_i,:}\,x^{k} - b_{I_i}$ and updates only the $j$th variable block $x_{J_j}$:

$$x_{J_j}^{k+1} = x_{J_j}^{k} - \alpha\, A_{I_i,J_j}^{\dagger}\, r_{I_i}^{k}.$$
All other blocks are unaltered. In the scalar-block case, this reduces to a doubly stochastic coordinate update:

$$x_j^{k+1} = x_j^{k} - \alpha\,\frac{a_{i,:}^{\top} x^{k} - b_i}{a_{ij}},$$

where $(i,j)$ is sampled with probability $a_{ij}^2 / \lVert A\rVert_F^2$ (Du et al., 2019, Du et al., 2020, Razaviyayn et al., 2018).
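The scalar-block update can be sketched in a few lines of NumPy. This is an illustrative implementation, not the exact algorithm of any one cited paper: the entrywise sampling distribution $a_{ij}^2/\lVert A\rVert_F^2$ follows the text, while the Kaczmarz-style division update and the stepsize `alpha` (which must scale roughly like $1/n$ for stability) are assumed details.

```python
import numpy as np

def dsgs(A, b, alpha=0.1, iters=30000, seed=0):
    """Sketch of the scalar doubly stochastic Gauss-Seidel update.

    Entry (i, j) is sampled with probability a_ij^2 / ||A||_F^2, and
    only coordinate j is corrected using the i-th residual. Entries
    with a_ij = 0 have probability zero, so the division is safe.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    p = (A**2).ravel() / (A**2).sum()        # entrywise sampling weights
    ks = rng.choice(m * n, size=iters, p=p)  # presample all entry indices
    x = np.zeros(n)
    for k in ks:
        i, j = divmod(int(k), n)
        x[j] -= alpha * (A[i] @ x - b[i]) / A[i, j]
    return x

# Consistent synthetic system: the iterates approach the true solution.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x = dsgs(A, b)
```

Because only one coordinate changes per iteration, each step is cheap; the price is that a small stepsize is needed to control the variance introduced by dividing by $a_{ij}$.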
For convex or nonconvex composite optimization, the doubly stochastic block Gauss–Seidel (DS-BSG) method iterates over blocks in Gauss–Seidel order (possibly shuffled), and within each block, uses stochastic gradient samples from a finite sum or expectation to compute a blockwise proximal-gradient step (Xu et al., 2014).
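A minimal sketch of this block-stochastic scheme, applied to a lasso-type objective $\tfrac{1}{2m}\lVert Ax-b\rVert^2 + \lambda\lVert x\rVert_1$: the shuffled Gauss–Seidel sweep and mini-batch partial gradients follow the description above, while the specific problem, block sizes, stepsize, and batch size are illustrative choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ds_bsg_lasso(A, b, lam=0.01, n_blocks=5, batch=20, lr=0.05,
                 epochs=300, seed=0):
    """Block stochastic proximal-gradient sweep (sketch).

    Each epoch visits the coordinate blocks in shuffled Gauss-Seidel
    order; each block takes a proximal-gradient step whose partial
    gradient is estimated from a random mini-batch of rows.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.arange(n), n_blocks)
    x = np.zeros(n)
    for _ in range(epochs):
        for bi in rng.permutation(n_blocks):              # shuffled sweep
            J = blocks[bi]
            S = rng.choice(m, size=batch, replace=False)  # mini-batch
            g = A[S][:, J].T @ (A[S] @ x - b[S]) / batch  # partial grad
            x[J] = soft_threshold(x[J] - lr * g, lr * lam)
    return x

# Sparse-recovery toy problem.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [2.0, -3.0, 1.5]
b = A @ x_true
x = ds_bsg_lasso(A, b)
```

The sweep structure means each block update sees the freshest values of the blocks already visited in the current epoch, which is the Gauss–Seidel ingredient distinguishing this from plain mini-batch SGD.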
2. Theoretical Properties and Convergence Rates
DSBGS methods offer rigorous convergence guarantees for both consistent and inconsistent systems, as well as for convex and nonconvex objectives. The main results take the form:
- Linear convergence in expectation: For full column-rank $A$ and an appropriate stepsize $\alpha$, the expected error contracts exponentially:

$$\mathbb{E}\big[\lVert x^{k} - x^{\star}\rVert^{2}\big] \le \rho^{k}\,\lVert x^{0} - x^{\star}\rVert^{2},$$

where $\rho = 1 - \sigma_{\min}^{2}(A)/(n \lVert A\rVert_F^{2})$ for the canonical stepsize choice, $n$ is the number of variables, and $\sigma_{\min}(A)$ is the smallest singular value of $A$ (Razaviyayn et al., 2018, Du et al., 2019).
- Convergence for inconsistent systems: For underdetermined or inconsistent systems $Ax \approx b$, the iterates converge to the minimum-norm least-squares solution $A^{\dagger} b$ with a quantifiable "convergence horizon" (Du et al., 2020).
- Convergence for convex and nonconvex programs: For block-stochastic gradient variants, under weak assumptions (Lipschitz continuity, bounded variance), the method achieves an $O(1/\sqrt{k})$ expected suboptimality (convex case) or vanishing expected first-order stationarity violation (nonconvex case) (Xu et al., 2014).
The contraction rate depends on block sizes, the Frobenius norms of the block submatrices, spectral properties of $A$, and the choice of stepsize. For block-size selection and stepsize tuning, spectral information such as the minimum nonzero singular value of $A$ and the block-size parameter $t$ (the number of column blocks) are the key determinants (Du et al., 2019, Razaviyayn et al., 2018).
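To make this concrete, the snippet below computes one representative scalar-case contraction factor, assuming the form $\rho = 1 - \sigma_{\min}^2(A)/(n\lVert A\rVert_F^2)$ (the exact constant varies across the cited papers), and converts it into an expected iteration count for a target accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 40))
n = A.shape[1]

# Spectral quantities that drive the contraction rate.
sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
fro2 = np.linalg.norm(A, "fro") ** 2

# Assumed per-iteration contraction factor (scalar-block case).
rho = 1.0 - sigma_min**2 / (n * fro2)

# Iterations to shrink the expected squared error by a factor of 1e-8.
iters_needed = int(np.ceil(np.log(1e-8) / np.log(rho)))
```

Since $\sigma_{\min}^2(A)/\lVert A\rVert_F^2 \le 1$, the factor $\rho$ is always strictly between 0 and 1 for full-rank $A$, and better-conditioned matrices yield smaller $\rho$ and fewer iterations.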
3. Connections to Classical Algorithms
DSBGS unifies and generalizes several fundamental randomized iterative methods via choice of block parameters:
| Method | Row blocks | Col blocks | Block sizes | Sampling probabilities |
|---|---|---|---|---|
| Landweber (GD) | 1 | 1 | all rows, all columns | 1 |
| Randomized Kaczmarz | $m$ | 1 | singleton rows, all columns | $\lVert a_{i,:}\rVert_2^2/\lVert A\rVert_F^2$ |
| Randomized Gauss–Seidel | 1 | $n$ | all rows, singleton columns | $\lVert A_{:,j}\rVert_2^2/\lVert A\rVert_F^2$ |
| Doubly stochastic GS | $m$ | $n$ | singleton rows, singleton cols | $a_{ij}^2/\lVert A\rVert_F^2$ |
This generality allows DSBGS to interpolate between stochastic gradient descent, coordinate descent, block Kaczmarz, and Gauss–Seidel, providing a flexible platform for algorithmic design (Du et al., 2019, Du et al., 2020).
4. Practical Implementation Considerations
Block Partitioning and Sampling
Optimal block partitioning leverages data locality and parallel processing capacity, as well as the computational cost of applying block pseudoinverses (where used). Sampling probabilities should always be set proportional to the squared Frobenius norm of each block, which both maximizes theoretical convergence rates and eliminates the risk of selecting blocks yielding no progress (Razaviyayn et al., 2018).
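A sketch of this sampling setup, assuming blocks are given as index arrays: each (row-block, column-block) pair receives probability proportional to the squared Frobenius norm of its submatrix.

```python
import numpy as np

def block_sampling_table(A, row_blocks, col_blocks):
    """Pair-sampling distribution: P[i, j] proportional to
    ||A[I_i, J_j]||_F^2. Zero-norm blocks get probability zero, so a
    sampled pair can never yield a zero-progress update."""
    probs = np.array([[np.sum(A[np.ix_(I, J)] ** 2) for J in col_blocks]
                      for I in row_blocks])
    return probs / probs.sum()

A = np.arange(12, dtype=float).reshape(3, 4)
row_blocks = [np.array([0, 1]), np.array([2])]
col_blocks = [np.array([0, 1]), np.array([2, 3])]
P = block_sampling_table(A, row_blocks, col_blocks)
```

Sampling a pair from `P` then amounts to one categorical draw (e.g. `np.random.default_rng().choice(P.size, p=P.ravel())`), so the whole distribution can be precomputed once before iterating.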
Computational Complexity
- Per-iteration cost is $O(|I_i|\,|J_j|)$ flops for the sampled row block $I_i$ and column block $J_j$ (with precomputed block factorizations and maintained residuals).
- In the scalar-block case, it reduces to $O(\mathrm{nnz}(a_{i,:}))$ flops per update.
- There is a tradeoff between block size and convergence factor: the fastest wall-clock times are often attained at intermediate block sizes, balancing per-iteration cost against contraction rate (Du et al., 2019, Razaviyayn et al., 2018).
Stepsize Tuning and Precomputation
- The stepsize $\alpha$ is chosen based on spectral information: conservatively on the order of $1/t$ (the number of column blocks), or more finely via $\sigma_{\min}(A)$ and $\lVert A\rVert_F$ in the full-rank case.
- Precomputing block pseudoinverses ($A_{I_i,J_j}^{\dagger}$) or their QR decompositions, as well as block norms, is essential for high-throughput implementations.
- In distributed environments, updates for disjoint column-blocks can proceed in parallel (Razaviyayn et al., 2018, Du et al., 2020).
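The precomputation step can be sketched as follows: caching a reduced QR factorization per block turns each pseudoinverse application into a matrix-vector product plus a triangular solve. This sketch assumes every block has full column rank; rank-deficient blocks would need a pivoted or SVD-based factorization instead.

```python
import numpy as np

def precompute_qr(A, row_blocks, col_blocks):
    """Cache reduced QR factors of every block submatrix A[I_i, J_j]."""
    return {(bi, bj): np.linalg.qr(A[np.ix_(I, J)])
            for bi, I in enumerate(row_blocks)
            for bj, J in enumerate(col_blocks)}

def apply_block_pinv(cache, key, r):
    """Apply A_block^+ to a residual r via the cached QR factors:
    solve R y = Q^T r, which equals pinv(A_block) @ r for full
    column-rank blocks."""
    Q, R = cache[key]
    return np.linalg.solve(R, Q.T @ r)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
row_blocks = [np.arange(0, 3), np.arange(3, 6)]
col_blocks = [np.arange(0, 2), np.arange(2, 4)]
cache = precompute_qr(A, row_blocks, col_blocks)
r = rng.standard_normal(3)
y = apply_block_pinv(cache, (0, 1), r)
```

Factoring once and solving many times is what makes the per-iteration cost proportional to the block dimensions rather than to a fresh pseudoinverse computation.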
5. Numerical Performance and Applications
Extensive numerical studies on synthetic and real-world data sets (UF sparse matrix collection) show that DSBGS:
- Dramatically accelerates convergence compared to standard randomized Kaczmarz or Gauss–Seidel, achieving 2× to 15× speedup in wall-clock time for large systems (Du et al., 2019).
- Maintains efficient scaling when the coefficient matrix is large, sparse, ill-conditioned, full-rank, underdetermined, or rank-deficient.
- Outperforms classical SG and block mirror descent in logistic regression, least-squares, low-rank tensor recovery, and bilinear logistic regression, due to its ability to take larger steps on well-conditioned blocks and leverage Gauss–Seidel ordering (Xu et al., 2014).
6. Extensions to General Stochastic and Composite Optimization
The doubly-stochastic block Gauss–Seidel paradigm has been extended to stochastic convex and nonconvex problems with nonsmooth regularization. In these settings:
- At each iteration, a mini-batch of random data/functions, or a subset of summands, is sampled for gradient estimation.
- Variable blocks are updated in a Gauss–Seidel sweep, with each block receiving a proximal-gradient update using a mini-batch-based partial gradient.
- The method achieves convergence for convex objectives, and stationarity in expectation for nonconvex smooth or composite problems (Xu et al., 2014).
Numerical experiments on regression and tensor recovery confirm that DS-BSG can handle large-scale, structured, and nonconvex optimization efficiently, outperforming pure SG and deterministic block coordinate descent in time to solution and statistical accuracy.
References
- (Xu et al., 2014) Block stochastic gradient iteration for convex and nonconvex optimization
- (Razaviyayn et al., 2018) A Linearly Convergent Doubly Stochastic Gauss-Seidel Algorithm for Solving Linear Equations and A Certain Class of Over-Parameterized Optimization Problems
- (Du et al., 2019) A doubly stochastic block Gauss-Seidel algorithm for solving linear equations
- (Du et al., 2020) Pseudoinverse-free randomized block iterative algorithms for consistent and inconsistent linear systems