
Pivoted-Cholesky Preconditioner

Updated 14 March 2026
  • The pivoted-Cholesky preconditioner is a deterministic, low-rank approximation method that clusters the spectrum of SPD kernel matrices to improve iterative solver convergence.
  • It employs a greedy, diagonal pivoting strategy to adaptively capture the dominant eigenvalues, reducing the condition number and accelerating PCG iterations.
  • This technique enables scalable and numerically stable computations in Gaussian process models and spatial statistics with proven convergence guarantees.

The pivoted-Cholesky preconditioner is a deterministic, low-rank approximation method for preconditioning large symmetric positive definite (SPD) kernel matrices. It targets acceleration of iterative solvers—primarily preconditioned conjugate gradients (PCG)—by clustering the spectrum of the preconditioned system near 1, thereby improving the convergence behavior. The pivoted-Cholesky construction exploits a greedy, column-wise, diagonal pivoting strategy to adaptively approximate the dominant eigenspace of the kernel matrix, enabling scalable, numerically stable solvers for kernel linear systems and log-determinant estimation in Gaussian process (GP) models and spatial statistics.

1. Mathematical Formulation and Algorithmic Structure

Given a noisy kernel (covariance) matrix $\tilde{K} = K + \sigma^2 I_n$, the pivoted-Cholesky preconditioner of rank $k \ll n$ constructs a factor $L_k \in \mathbb{R}^{n \times k}$ such that

$$\tilde{K} \approx P = L_k L_k^\top + \sigma^2 I.$$

The algorithm proceeds as follows:

  1. Initialization: Set $R^{(0)} \leftarrow \tilde{K}$.
  2. Iterative updates for $i = 1, \dots, k$:
    • Pivot selection: $p_i = \arg\max_j R^{(i-1)}_{jj}$
    • Column computation: $L_k[p_i, i] = \sqrt{R^{(i-1)}_{p_i p_i}}$, and for all $j$, $L_k[j, i] = R^{(i-1)}_{j p_i} / L_k[p_i, i]$
    • Residual update: $R^{(i)} = R^{(i-1)} - L_k[:, i]\, L_k[:, i]^\top$
  3. Termination: After $k$ steps, $L_k$ forms the low-rank factor. The preconditioner matrix is $P = L_k L_k^\top + \sigma^2 I_n$.

The complexity to build $L_k$ is $O(nk^2 + k^3)$, with $O(nk)$ storage. Each application of $P^{-1}$ to a vector during iterative solution requires $O(nk)$ computation via two triangular solves and one diagonal solve (Gyger et al., 2024, Gardner et al., 2018, Liu et al., 2015, Roos et al., 28 Jul 2025).
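As a minimal NumPy sketch of the steps above (the function name `pivoted_cholesky` and the dense-residual update are illustrative; a practical implementation tracks only the residual diagonal and queries kernel columns on demand to reach the $O(nk^2)$ cost quoted above):

```python
import numpy as np

def pivoted_cholesky(K_tilde, k, tol=0.0):
    """Sketch of the rank-k pivoted Cholesky factor of an SPD matrix K_tilde."""
    n = K_tilde.shape[0]
    R = np.array(K_tilde, dtype=float)            # residual R^{(0)} = K_tilde
    L = np.zeros((n, k))
    for i in range(k):
        p = int(np.argmax(np.diag(R)))            # greedy diagonal pivot p_i
        if R[p, p] <= tol:                        # optional early stop (Section 6)
            return L[:, :i]
        L[:, i] = R[:, p] / np.sqrt(R[p, p])      # column computation
        R -= np.outer(L[:, i], L[:, i])           # residual update R^{(i)}
    return L
```

The returned factor, together with the noise variance $\sigma^2$, defines the preconditioner $P = L_k L_k^\top + \sigma^2 I_n$ exactly as above.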

2. Incorporation in Iterative Solvers

In the PCG method, one solves systems of the form

$$\tilde{K} x = b,$$

using $P$ as a left preconditioner, yielding $P^{-1} \tilde{K} x = P^{-1} b$. Each CG iteration requires:

  • A matrix-vector product (MVM) with $\tilde{K}$: $O(n^2)$ for dense matrices, or $O(nm + n n_\gamma)$ in structured settings.
  • Application of $P^{-1}$: $O(nk)$, via two triangular solves involving $L_k$ and a diagonal solve with $\sigma^2 I$.

The preconditioner can also be efficiently inverted via the Woodbury identity, enabling direct application to vectors:

$$P^{-1} = \sigma^{-2} I - \sigma^{-4} L_k \left(I + \sigma^{-2} L_k^\top L_k\right)^{-1} L_k^\top.$$
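A minimal sketch of applying this identity (function and variable names are illustrative): the $k \times k$ inner matrix is factored once, after which each application of $P^{-1}$ costs $O(nk)$.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_precond_solve(L, sigma2):
    """Return a function that applies P^{-1} via the Woodbury identity above."""
    k = L.shape[1]
    inner = np.eye(k) + (L.T @ L) / sigma2        # I + sigma^{-2} L^T L  (k x k)
    inner_cf = cho_factor(inner)                  # factored once, O(k^3)
    def apply_pinv(v):
        # sigma^{-2} v - sigma^{-4} L (I + sigma^{-2} L^T L)^{-1} L^T v
        return v / sigma2 - (L @ cho_solve(inner_cf, L.T @ v)) / sigma2**2
    return apply_pinv
```

Wrapped in a `scipy.sparse.linalg.LinearOperator`, such a routine can be supplied as the preconditioner `M` to `scipy.sparse.linalg.cg`.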

Sampling and log-determinant calculations in stochastic Lanczos quadrature (SLQ) exploit this efficient structure (Gardner et al., 2018).

3. Theoretical Convergence and Preconditioning Guarantees

A rank-$k$ pivoted-Cholesky approximation captures the leading $k$ eigenvalues of $\tilde{K}$. The eigenvalues $\hat{\lambda}_j$ of $P^{-1} \tilde{K}$ satisfy:

  • $\hat{\lambda}_1, \dots, \hat{\lambda}_k \approx 1$
  • The remaining $\hat{\lambda}_{k+1}, \dots, \hat{\lambda}_n$ lie in $[\sigma^2 / (\lambda_{k+1} + \sigma^2),\, 1]$

Thus, the condition number

$$\kappa(P^{-1} \tilde{K}) \approx \frac{\lambda_{k+1} + \sigma^2}{\sigma^2}$$

decreases as $k$ increases, improving CG convergence. For exponentially decaying eigenvalues (e.g., analytic kernels), this clustering ensures rapid convergence, often with small $k$ (Gyger et al., 2024, Gardner et al., 2018, Liu et al., 2015).
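For context, the classical CG error bound (a standard result, stated here for the preconditioned system) makes the role of the condition number explicit: after $j$ iterations,

$$\|x_j - x_\ast\|_{\tilde{K}} \le 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{j} \|x_0 - x_\ast\|_{\tilde{K}}, \qquad \kappa = \kappa(P^{-1} \tilde{K}),$$

so driving $\kappa$ toward 1 by increasing $k$ directly shrinks the number of iterations needed to reach a fixed tolerance.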

Convergence rates for the maximum-norm residual have been made precise for kernels of minimal regularity. For SPD, Lipschitz-continuous kernels on a compact domain $\Omega \subset \mathbb{R}^d$ and complete pivoting, the residual satisfies $\|K - L_k L_k^\top\|_\infty = O(k^{-1/d})$, improving to $O(k^{-2/d})$ for kernels with $C^{1,1}$ regularity (Jeong et al., 16 Sep 2025).

4. Implementation Variants and Pivot Selection Strategies

The canonical variant uses greedy diagonal pivoting: at each step, select the pivot maximizing the diagonal of the current residual ("uncertainty sampling", "maximum entropy" in GP terms) (Roos et al., 28 Jul 2025). Recent work has introduced alternative, data-adaptive selection strategies:

  • PCov ("projected covariance"): greedily maximizes the reduction in $\mathrm{tr}(K - L L^\top)$, targeting the A-optimal objective.
  • WPCov ("weighted PCov"): incorporates data fit by monitoring $y$-weighted projections.

Both alternatives preserve the $O(nk^2)$ build cost, and empirical results demonstrate substantial reductions in required CG iterations versus standard pivoting, particularly on GP regression problems (Roos et al., 28 Jul 2025).
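Under the residual update of Section 1, eliminating column $j$ reduces $\mathrm{tr}(R)$ by $\|R[:, j]\|_2^2 / R_{jj}$, so a trace-greedy score ranks candidate pivots by that ratio rather than by the diagonal alone. The sketch below contrasts the two scores on an explicit residual matrix; it illustrates the idea only and is not claimed to reproduce the exact PCov or WPCov rules of Roos et al.

```python
import numpy as np

def diagonal_pivot(R):
    """Canonical rule: largest residual diagonal entry (max variance)."""
    return int(np.argmax(np.diag(R)))

def trace_greedy_pivot(R, eps=1e-12):
    """Trace-greedy rule: largest drop in tr(R) from the rank-1 update."""
    d = np.maximum(np.diag(R), eps)               # guard against tiny pivots
    scores = (R ** 2).sum(axis=0) / d             # ||R[:, j]||^2 / R_jj
    return int(np.argmax(scores))
```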

5. Computational Complexity and Empirical Performance

| Step | Time complexity | Storage |
| --- | --- | --- |
| Build $L_k$ | $O(nk^2 + k^3)$ | $O(nk)$ |
| Apply $P^{-1}$ (per iteration) | $O(nk)$ | $O(nk)$ |
| Apply $\tilde{K}$ (per iteration) | $O(n^2)$ dense, or less in structured settings | — |

Empirical benchmarks consistently show that for $n$ up to $10^5$, $k$ in the range $200$–$500$ is typical. For example, on a dense FSA system of size $n = 10^5$ (Gyger et al., 2024):

  • No preconditioner: 190 iterations, 54 s
  • Pivoted-Cholesky ($k = 200, 500, 1000$): iteration counts 91, 52, 32; wall times 52, 83, 266 s.
  • Custom FITC preconditioner (as a comparator): 14 iterations, $\sim 10$ s.

Increasing $k$ reduces iteration counts but raises overhead as $O(nk^2)$; above $k \approx 500$, diminishing returns are observed. In GPyTorch (Gardner et al., 2018), even $m = 5$ suffices for an order-of-magnitude reduction in CG steps in some kernel regimes.

6. Stopping Criteria, Rank Selection, and Best Practices

Practical stopping criteria involve monitoring the maximum diagonal element of the residual:

  • Stop when $\max_i d_i \le \varepsilon_{\text{tol}}$, where $d_i$ are the diagonal entries of the residual, or when the number of iterations reaches $k_{\max}$ (a usage sketch follows this list)
  • In preconditioning, select $k$ such that $\|R_k\|_\infty \le \delta\, \lambda_{\min}(K)$ for a target $\kappa$
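As a usage sketch, the two criteria combine naturally with the hypothetical `pivoted_cholesky` routine from Section 1, which already accepts a rank cap and a diagonal tolerance:

```python
# Stop at k_max columns, or earlier once max_i d_i <= eps_tol (names illustrative).
L_k = pivoted_cholesky(K_tilde, k=k_max, tol=eps_tol)
print("attained rank:", L_k.shape[1])
```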

For kernels with regularity below $C^2$, complete pivoting guarantees deterministic decay rates for the residual, outperforming unpivoted incomplete Cholesky or random Nyström methods in uniform-norm error (Jeong et al., 16 Sep 2025). Rank selection can be guided by the required target condition number and domain dimension.

7. Practical Impact and Scope of Application

The pivoted-Cholesky preconditioner is broadly used in scalable GP inference, covariance approximation, kernel interpolation, and log-determinant approximation. Its rapid, deterministic convergence in the uniform norm, with provable guarantees under weak regularity, makes it state-of-the-art for kernel methods in scientific computing and machine learning. Nevertheless, for very large ranks or when the low-rank spectrum decays slowly (e.g., high-dimensional or rough kernels), overhead can become prohibitive; alternatives such as structure-exploiting or inducing-point preconditioners (e.g., FITC) can outperform in those regimes (Gyger et al., 2024). Recent improvements in pivot selection and adaptive rank determination further enhance its efficiency and effectiveness in practical settings (Roos et al., 28 Jul 2025, Liu et al., 2015, Gardner et al., 2018).
