
Diagonal Preconditioners for Improved Convergence

Updated 21 October 2025
  • Diagonal preconditioners are positive definite diagonal matrices that rescale linear systems to improve convergence by reducing condition numbers.
  • They are computed using methods like subgradient optimization, SDP formulations, and interior point approaches to achieve near-optimal conditioning.
  • Their applications span iterative solvers, stochastic optimization, and large-scale computations, offering scalable, efficient performance in practice.

A diagonal preconditioner is a positive definite diagonal matrix designed to rescale a linear system, optimization problem, or iterative procedure so that convergence rates are improved by reducing the (generalized) condition number of the operator involved. Diagonal preconditioning is a ubiquitous strategy for accelerating convergence in iterative solvers, enhancing robustness in first-order methods, stabilizing numerical optimization, and facilitating large-scale computations where storage or computational constraints preclude the use of dense or full-matrix preconditioners. In modern computational mathematics and data science, diagonal preconditioners are recognized for their scalability, efficiency, and amenability to both theoretical analysis and practical implementation.

1. Mathematical Foundations and Definitions

Let $A \in \mathbb{R}^{n \times n}$ be a symmetric positive definite (SPD) matrix. The classical (worst-case) condition number is $\kappa(A) = \lambda_{\max}(A)/\lambda_{\min}(A)$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues, respectively. Given a positive diagonal matrix $D = \operatorname{Diag}(d)$ with $d > 0$, diagonal preconditioning transforms $A$ to $\tilde{A} = D^{1/2} A D^{1/2}$ or, equivalently, rescales the variables $x \mapsto D^{-1/2} x$ to obtain a better-conditioned system.

The preconditioning objective is to select $D$ such that $\kappa(\tilde{A})$ is minimized. A parallel, "average-case" conditioning measure is the $\omega$-condition number, defined as

$$\omega(A) = \frac{\operatorname{tr}(A)/n}{\det(A)^{1/n}},$$

which is minimized by diagonal preconditioners corresponding to so-called "equilibration" or "log-determinant maximization" (Ghadimi et al., 27 Sep 2025).
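
To make the two measures concrete, the following minimal NumPy sketch evaluates both $\kappa$ and $\omega$ before and after a diagonal rescaling. The test matrix and the column-norm equilibration heuristic are arbitrary illustrative choices, not taken from the cited papers.

```python
import numpy as np

def kappa(A):
    """Worst-case condition number of an SPD matrix: lambda_max / lambda_min."""
    w = np.linalg.eigvalsh(A)
    return w[-1] / w[0]

def omega(A):
    """Average-case (omega-) condition number: (tr(A)/n) / det(A)^(1/n)."""
    n = A.shape[0]
    w = np.linalg.eigvalsh(A)
    # det(A)^(1/n) evaluated stably through the eigenvalue logs.
    return (np.trace(A) / n) / np.exp(np.mean(np.log(w)))

# An arbitrary badly scaled SPD test matrix and a simple positive rescaling d.
rng = np.random.default_rng(0)
n = 50
B = rng.standard_normal((n, n))
S = np.diag(np.logspace(0, 3, n))            # widely varying variable scales
A = S @ (B @ B.T + n * np.eye(n)) @ S

d = 1.0 / np.linalg.norm(A, axis=0)          # one simple equilibration heuristic
A_tilde = A * np.sqrt(d)[:, None] * np.sqrt(d)[None, :]   # D^{1/2} A D^{1/2}

print("kappa:", kappa(A), "->", kappa(A_tilde))
print("omega:", omega(A), "->", omega(A_tilde))
```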

Diagonal preconditioning is not limited to square matrices; in overdetermined least-squares problems ($A \in \mathbb{R}^{m \times n}$, $m \geq n$), one seeks a diagonal $W$ such that $\kappa(A^T W A)$ is minimized ("inner scaling"), or, for SPD matrices, a diagonal $S$ applied on both sides such that $SAS$ is optimally conditioned ("outer scaling").

2. Algorithmic Approaches and Computational Methods

Several algorithmic frameworks have been developed for computing optimal or near-optimal diagonal preconditioners.

2.1 Affine Pseudoconvex Reformulation and Subgradient Methods:

The condition number minimization problem can be reformulated as a pseudoconvex optimization over $d > 0$. Because the eigenvalues of the mapping $\mathcal{D}(d) = A \operatorname{Diag}(d)$ coincide with those of $D^{1/2} A D^{1/2}$, minimizing $\kappa(\mathcal{D}(d))$ is equivalent to minimizing $\kappa(\tilde{A})$ (Ghadimi et al., 27 Sep 2025). The gradient of $\kappa(d)$ is

$$\nabla \kappa(d) = \kappa(d)\left(\frac{1}{x_1^T D x_1}\,(x_1 \bullet x_1) - \frac{1}{x_n^T D x_n}\,(x_n \bullet x_n)\right)$$

where $x_1$ and $x_n$ are eigenvectors corresponding to $\lambda_{\max}$ and $\lambda_{\min}$, and $\bullet$ denotes the Hadamard (elementwise) product, so $x \bullet x$ is the elementwise square. The necessary optimality condition for a $\kappa$-optimal diagonal preconditioner is $x_1 \bullet x_1 = x_n \bullet x_n$.

A projected subgradient method,

$$v_{k+1} = \Pi_{\hat\Omega}\!\left( v_k - t_k\, \frac{g_k}{\|g_k\|} \right)$$

converges to the global minimizer owing to pseudoconvexity, with each iteration requiring only the computation of the extreme (largest and smallest) eigenpairs (Ghadimi et al., 27 Sep 2025).
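
A minimal dense-matrix sketch of such a scheme is given below. It follows the subgradient formula stated above, takes $x_1$ and $x_n$ from the scaled matrix $D^{1/2} A D^{1/2}$, and projects onto the simple box $\hat\Omega = \{d : d \geq \epsilon\}$; the eigenvector convention, step-size rule, and choice of $\hat\Omega$ are illustrative assumptions rather than the exact algorithm of Ghadimi et al.

```python
import numpy as np

def kappa_subgradient_step(A, d, step):
    """One projected-subgradient step for minimizing kappa(Diag(d)^{1/2} A Diag(d)^{1/2}).

    Assumes A is SPD and d > 0; the extreme eigenpairs are taken from the
    symmetrically scaled matrix (an illustrative convention).
    """
    D = np.diag(d)
    Dh = np.diag(np.sqrt(d))
    At = Dh @ A @ Dh
    w, V = np.linalg.eigh(At)              # eigenvalues in ascending order
    x_min, x_max = V[:, 0], V[:, -1]
    kap = w[-1] / w[0]
    # Subgradient per the formula above:
    # kappa(d) * (x_max.x_max / x_max^T D x_max - x_min.x_min / x_min^T D x_min)
    g = kap * (x_max**2 / (x_max @ D @ x_max) - x_min**2 / (x_min @ D @ x_min))
    d_new = d - step * g / np.linalg.norm(g)
    return np.maximum(d_new, 1e-10), kap   # projection onto {d >= eps}

# Usage: start from 1/diag(A) and take normalized steps with a diminishing rule.
rng = np.random.default_rng(1)
n = 30
B = rng.standard_normal((n, n))
A = B @ B.T + np.diag(np.logspace(0, 2, n))
d = 1.0 / np.diag(A)
for k in range(200):
    d, kap = kappa_subgradient_step(A, d, step=0.5 / (k + 1))
print("condition number estimate after 200 steps:", kap)
```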

2.2 SDP and Matrix-Dictionary Methods:

Semidefinite programming offers an exact formulation:

$$\max_{D,\,\tau}\ \tau \quad \text{such that} \quad \tau M \preceq D \preceq M, \quad D \text{ diagonal},\ D > 0,$$

where $M$ denotes the SPD matrix to be preconditioned. This approach, while theoretically optimal, is prohibitively expensive for large $n$.
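
For modest $n$, the exact SDP can be transcribed almost verbatim into a modeling language such as CVXPY. The sketch below assumes CVXPY with an SDP-capable solver (SCS) is available; the function name, tolerance, and test matrix are hypothetical.

```python
import cvxpy as cp
import numpy as np

def optimal_diagonal_sdp(M, eps=1e-8):
    """Exact SDP for optimal diagonal scaling of the SPD matrix M.

    Maximizes tau subject to tau*M <= D <= M in the Loewner order, with D
    diagonal and positive; at the optimum kappa(D^{-1/2} M D^{-1/2}) = 1/tau.
    Interior-point/first-order SDP cost grows quickly with n.
    """
    M = (M + M.T) / 2                       # enforce exact numerical symmetry
    n = M.shape[0]
    d = cp.Variable(n)
    tau = cp.Variable()
    D = cp.diag(d)
    constraints = [D - tau * M >> 0, M - D >> 0, d >= eps]
    prob = cp.Problem(cp.Maximize(tau), constraints)
    prob.solve(solver=cp.SCS)
    return d.value, 1.0 / tau.value         # scaling and achieved condition number

# Usage on a small random SPD matrix:
rng = np.random.default_rng(2)
B = rng.standard_normal((20, 20))
M = B @ B.T + np.eye(20)
d_opt, kappa_opt = optimal_diagonal_sdp(M)
print("optimal diagonal condition number:", kappa_opt)
```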

Efficient alternatives, such as the matrix-dictionary approximation and cutting-plane SIP/column-generation methods, reduce the search space to a low-dimensional subspace (Gao et al., 2023). The resulting semi-infinite program is solved via iterative linear programs enhanced by black-box eigenvalue computations, making the approach practical for very large sparse systems.

2.3 Interior Point & Bisection:

“Optimal Diagonal Preconditioning” (Qu et al., 2022) recasts the problem as a quasi-convex program, enabling efficient bisection or interior point algorithms. The bisection approach iteratively solves

$$\min_{D > 0}\ t \quad \text{such that} \quad \|A D^{-1}\|_2 \leq t$$

and uses the Nesterov–Todd direction for improved convergence. For one-sided preconditioning, dual SDP reformulations are used.

2.4 Randomized and Sampling-based Acceleration:

For matrices available only via matrix-vector products, random projections or sampling techniques can be used to restrict the diagonal search space, with theoretical guarantees that random subspaces of modest size yield constant-factor approximations to the optimal preconditioner (Gao et al., 2023, Qu et al., 2022).
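
One way to realize the subspace-restriction idea is sketched below: the diagonal is parameterized as $d = Bc$ with a random nonnegative basis $B \in \mathbb{R}^{n \times k}$, and only the $k$ coefficients are optimized under the same semidefinite constraints as in the exact formulation. This is an illustrative construction consistent with the idea, not the specific algorithm of the cited papers, and it still accesses $M$ explicitly rather than purely through matrix-vector products.

```python
import cvxpy as cp
import numpy as np

def subspace_diagonal_sdp(M, k=5, eps=1e-8, seed=0):
    """Diagonal scaling optimized over a random k-dimensional subspace d = B @ c.

    Each column of B is a random positive vector, so nonnegative coefficients c
    keep d > 0; only k coefficients are optimized instead of n diagonal entries.
    """
    M = (M + M.T) / 2
    n = M.shape[0]
    rng = np.random.default_rng(seed)
    B = rng.uniform(0.1, 1.0, size=(n, k))    # nonnegative random basis
    c = cp.Variable(k, nonneg=True)
    tau = cp.Variable()
    D = cp.diag(B @ c)
    constraints = [D - tau * M >> 0, M - D >> 0, B @ c >= eps]
    prob = cp.Problem(cp.Maximize(tau), constraints)
    prob.solve(solver=cp.SCS)
    return B @ c.value, 1.0 / tau.value        # diagonal scaling and its condition number
```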

2.5 Classical Heuristics and Jacobi Preconditioning:

The Jacobi preconditioner, $D = \operatorname{diag}(A)$, guarantees a condition number within a quadratic factor of the optimum (Jambulapati et al., 2020). More sophisticated SDP-based or affine methods can recover the optimal scaling, which amounts to as much as a square-root improvement over the Jacobi guarantee; for some matrices, however, the quadratic-factor bound for Jacobi scaling is tight.
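
As a baseline reference, Jacobi scaling is a one-liner; the sketch below measures the condition-number reduction it delivers on an arbitrary badly scaled SPD test matrix (the matrix itself is an illustrative choice).

```python
import numpy as np

def kappa(M):
    """Worst-case condition number of an SPD matrix."""
    w = np.linalg.eigvalsh(M)
    return w[-1] / w[0]

rng = np.random.default_rng(3)
n = 200
B = rng.standard_normal((n, n))
S = np.diag(np.logspace(-2, 2, n))             # wildly different variable scales
A = S @ (B @ B.T + n * np.eye(n)) @ S          # badly scaled SPD matrix

# Jacobi scaling: symmetric rescaling by diag(A)^{-1/2} gives a unit diagonal.
d = 1.0 / np.sqrt(np.diag(A))
A_jac = A * d[:, None] * d[None, :]            # diag(d) @ A @ diag(d)

print("kappa(A)      =", kappa(A))
print("kappa(Jacobi) =", kappa(A_jac))         # guaranteed within (kappa*)^2 of optimal
```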

3. Practical Applications and Empirical Performance

Diagonal preconditioners appear across a spectrum of computational science and engineering domains.

  • Iterative Solvers for SPD Systems: Accelerating convergence of the preconditioned conjugate gradient method (PCG) for $Ax = b$ is a fundamental application. Numerical results demonstrate that optimally constructed diagonal preconditioners can reduce PCG iteration counts by an order of magnitude compared to heuristics (Ghadimi et al., 27 Sep 2025, Gao et al., 2023); a minimal PCG sketch follows this list.
  • Optimization Algorithms: In stochastic gradient descent (SGD), diagonal scaling equates to coordinate-wise adaptive step sizes. AdaGrad and its variants use cumulative squared gradient histories to form the diagonal preconditioner, leading to robust regret bounds and efficient convergence (Xie et al., 13 Mar 2025). Notably, in many realistic regimes, diagonal adaptive methods—despite their computational thrift—can match or even outperform richer, more expensive full-matrix schemes.
  • Interior Point and First-Order Methods: In large convex programs, diagonal scaling (e.g., via the graph projection splitting variant of ADMM) substantially improves primal-dual residual balancing and accelerates convergence across synthetic and real-world problem instances (Takapoui et al., 2016).
  • Eigenvalue Problems and Electronic Structure: In electronic structure calculations with ill-conditioned generalized eigenproblems, hybrid global/local preconditioning strategies are used in which a global diagonal or block preconditioner (e.g., an $LDL^T$ factorization of $H - \epsilon S$) is amortized and refined via local iterative solves (Cai et al., 2013).
  • Large-Scale and Matrix-Free Environments: Recent algorithms utilizing SIP/cutting-plane and matrix-dictionary strategies achieve scalable, near-optimal diagonal preconditioning for matrices with dimension and number of nonzeros in the $10^7$ range, using only black-box access to matrix-vector products (Gao et al., 2023, Jambulapati et al., 2023).
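
The PCG use case from the first bullet can be exercised with SciPy's conjugate gradient solver by passing a diagonal preconditioner through the `M` argument (which should apply an approximation of $A^{-1}$). The test matrix, scaling, and iteration limits below are arbitrary illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# An arbitrary badly scaled sparse SPD test matrix: 2-D Laplacian with rescaled rows/columns.
n = 60
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
L = sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))
s = np.logspace(0, 3, n * n)
A = (sp.diags(s) @ L @ sp.diags(s)).tocsr()
b = np.ones(n * n)

# Diagonal (Jacobi) preconditioner: apply 1/diag(A) as an approximation of A^{-1}.
M = sp.diags(1.0 / A.diagonal())

def make_counter():
    count = {"it": 0}
    def cb(xk):
        count["it"] += 1
    return count, cb

# The unpreconditioned run may hit the iteration cap; the Jacobi run converges quickly.
for prec, label in [(None, "no preconditioner"), (M, "Jacobi diagonal")]:
    count, cb = make_counter()
    x, info = spla.cg(A, b, M=prec, maxiter=5000, callback=cb)
    print(f"{label}: converged={info == 0}, iterations={count['it']}")
```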

4. Theoretical Guarantees and Optimality Conditions

Optimal diagonal preconditioning is possible due to the following facts:

  • For every SPD $A$, there exists a diagonal $D > 0$ achieving the minimum condition number for $D^{1/2} A D^{1/2}$ (Ghadimi et al., 27 Sep 2025, Qu et al., 2022, Jambulapati et al., 2020).
  • Under pseudoconvexity of the reparameterized objective, any stationary point is a global optimum, enabling subgradient methods to avoid local minima traps (Ghadimi et al., 27 Sep 2025).
  • For average-case criteria (e.g., ω\omega-condition number), convexity facilitates tractable optimization.
  • For the classical Jacobi scaling, $\kappa\bigl(\operatorname{diag}(A)^{-1/2} A\, \operatorname{diag}(A)^{-1/2}\bigr) \leq (\kappa^*)^2$, where $\kappa^*$ is the optimal condition number achievable by any diagonal scaling (Jambulapati et al., 2020, Jambulapati et al., 2023). There exist matrices for which this bound is tight.
  • Application of an $\omega$-optimal preconditioner to a matrix already $\kappa$-optimally scaled yields further, sometimes dramatic, convergence gains (Ghadimi et al., 27 Sep 2025).

5. Extensions, Limitations, and Future Directions

Structured Preconditioners:

Diagonal preconditioning is a particular case of structured preconditioning; generalizations include block-diagonal, Kronecker-product, and variable-wise diagonal schemes. Recent unified analyses of adaptive optimization show that more structured (e.g., layerwise, block, or diagonal) preconditioners are not only computationally preferable but can also be competitive with, or superior to, full-matrix schemes in actual convergence (Xie et al., 13 Mar 2025).
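
As one structured generalization, the sketch below assembles a block-Jacobi preconditioner from contiguous index blocks and exposes it as a SciPy LinearOperator suitable for the `M` argument of an iterative solver; the contiguous block partition is an arbitrary illustrative choice.

```python
import numpy as np
import scipy.sparse.linalg as spla

def block_jacobi_preconditioner(A, block_size):
    """Block-diagonal preconditioner: invert each contiguous diagonal block of A.

    A is a dense SPD ndarray; returns a LinearOperator that applies the inverse
    of the block-diagonal part of A.
    """
    A = np.asarray(A)
    n = A.shape[0]
    blocks = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        blocks.append((slice(start, stop), np.linalg.inv(A[start:stop, start:stop])))

    def apply(v):
        out = np.empty_like(v, dtype=float)
        for sl, Binv in blocks:
            out[sl] = Binv @ v[sl]   # apply the inverse of one diagonal block
        return out

    return spla.LinearOperator((n, n), matvec=apply)

# Usage: pass the returned operator as M to scipy.sparse.linalg.cg(A, b, M=...).
```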

Limitations:

  • Off-diagonal couplings are entirely neglected in pure diagonal schemes; in the presence of strong correlations, block or full-matrix preconditioners may perform substantially better.
  • In some settings, further improvements require incorporating off-diagonal structure, e.g., via incomplete Cholesky or block Jacobi.
  • The quality of diagonal scaling is fundamentally limited if the matrix is nearly singular or has global coupling structure invisible to coordinate-wise rescaling.

Scalability:

Modern developments leverage randomized, black-box algorithms to enable application to matrices of extreme size, including highly sparse and streaming contexts (Gao et al., 2023, Jambulapati et al., 2023).

Research Directions:

Key avenues include plugin approaches for adaptive selection of candidate preconditioner subspaces, extensions to indefinite matrices, hybrid (variable/blockwise) scaling, integration with higher-order, deterministic and stochastic optimization algorithms, and statistical risk-aware preconditioning.

6. Comparative Summary of Approaches

| Approach | Storage/Compute | Structure Used | Optimality |
|---|---|---|---|
| Jacobi (diag) | $O(n)$ | $\operatorname{diag}(A)$ | Within a quadratic factor of optimal (Jambulapati et al., 2020) |
| SDP-based optimal diagonal | $O(n^3)$ | Full matrix | Optimal |
| Subgradient affine, SIP, CG | $O(n)$ | Matrix-vector products | Near-optimal; scalable (Gao et al., 2023, Ghadimi et al., 27 Sep 2025) |
| Block/variable-wise (OVDP) | Low | Operator norms | Variable-level optimal (Naganuma et al., 2023) |
| Randomized/dictionary CG | $O(k)$ | Random subspaces | Constant-factor approximation with $k \ll n$ (Gao et al., 2023) |

These approaches serve different computational and application regimes.

7. Impact and Empirical Findings

Empirical results across diverse contexts—large-scale linear systems, graph Laplacian solvers, first-order optimization, and machine learning—support the efficacy of optimal and near-optimal diagonal preconditioners in reducing both iteration count and wall-clock time. Notably:

  • Random sampling and column-generation strategies rapidly achieve up to $2\times$ improvement over Jacobi preconditioning for real-world sparse matrices (Gao et al., 2023).
  • In PCG for Hessian systems arising from logistic regression or interior-point methods, optimized diagonal scaling drastically reduces the number of matrix-vector products required for convergence (Ghadimi et al., 27 Sep 2025).
  • Diagonal AdaGrad matches or outperforms more sophisticated preconditioners experimentally, especially in high dimensions (Xie et al., 13 Mar 2025).
  • Iterative schemes that sequentially combine $\kappa$- and $\omega$-optimal preconditioners yield order-of-magnitude gains for conjugate gradient approaches (Ghadimi et al., 27 Sep 2025).

Diagonal preconditioners, due to their minimal storage and computational overhead, intrinsic parallelizability, and amenability to rigorous theoretical analysis, have become a mainstay tool in scientific computing, optimization, and data science for enhancing the numerical stability and computational efficiency of iterative algorithms.
