Block-Orthonormal Lanczos Quadrature (BOLT)

Updated 22 June 2026

BOLT is a Krylov-based method that generalizes scalar Lanczos quadrature to block settings, enabling efficient approximation of spectral functionals.
It utilizes block Gauss and Gauss-Radau quadrature with proven monotonic error bounds for robust trace and transfer function estimation.
The method supports stochastic trace estimation and adaptive low-rank corrections, making it ideal for applications in PDEs, kernel methods, and covariance analysis.

Block-Orthonormal Lanczos Quadrature (BOLT) is a Krylov-based methodology for approximating quantities of the form $B^T \phi(A) B$ and related spectral functionals, where $A$ is a large symmetric positive definite (s.p.d.) matrix, $B$ is a tall matrix of orthonormal columns, and $\phi$ is a matrix function analytic near the spectrum of $A$ . BOLT generalizes the classical (scalar) orthonormal Lanczos quadrature to the block setting, enabling efficient, robust, and highly accurate computation of trace estimates, transfer functions, and related quantities in large-scale settings such as MIMO PDE discretizations, kernel methods, and covariance estimation. The method unifies block Krylov Lanczos recurrences, block Gauss- and Gauss-Radau quadrature, and Stieltjes/Hermite–Padé approximation theory, supporting error analysis, extrapolation, and extensions for partial-matrix access and stochastic estimation.

1. Block Lanczos Framework and S-Fraction Representation

BOLT leverages the construction of a block Krylov subspace

$\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$

where $A \in \mathbb{R}^{n\times n}$ is s.p.d. and $B \in \mathbb{R}^{n\times p}$ has full column rank ( $p \ll n$ ) and $B^TB = I_p$ by assumption. An orthonormal block basis $A$ 0 ( $A$ 1) is generated by a three-term block Lanczos recurrence:

$A$ 2

with symmetric block tridiagonal matrix $A$ 3 capturing the projected action of $A$ 4:

$A$ 5

This yields a block-tridiagonal pencil $A$ 6 for shifted operators.

The block-Lanczos process provides a surrogate for quantities such as

$A$ 7

via a matrix S-fraction (Stieltjes continued fraction) in terms of $A$ 8:

$A$ 9

where $B$ 0 selects the first $B$ 1 rows. This S-fraction encapsulates the exactness of the first $B$ 2 block moments of the spectral measure associated with $B$ 3 (Zimmerling et al., 2024).

2. Block Gauss and Gauss-Radau Quadrature, Monotonicity, and Error Bounds

BOLT achieves high fidelity through the use of block Gauss and block Gauss-Radau quadrature rules. The block Gauss-Lanczos estimate is

$B$ 4

while block Gauss-Radau augments the last diagonal block to interpolate at a prescribed endpoint (typically zero), yielding a modified block

$B$ 5

and corresponding $B$ 6. The rule is

$B$ 7

A fundamental property is the Loewner monotonicity:

$B$ 8

for $B$ 9, allowing a two-sided, fully computable a-posteriori error bound:

$\phi$ 0

A norm bound,

$\phi$ 1

is available, with additional bounds in terms of Ritz values and extremal eigenvalues (Zimmerling et al., 2024, Xu et al., 2022).

3. Hermite–Padé and Kreĭn–Nudelman Extensions for Dense Spectra

In regimes where $\phi$ 2 approximates a continuous-spectrum operator (e.g., PDEs on unbounded domains), classical block Gaussian quadrature may exhibit only linear convergence in $\phi$ 3 due to discrete approximation stagnation. BOLT introduces a Kreĭn–Nudelman-type adaptive low-rank modification to the block-Lanczos matrix, parameterized by $\phi$ 4, in the spirit of Hermite–Padé approximants for branch-cut functions:

$\phi$ 5

modifies the final denominator of the S-fraction. The resulting quadrature

$\phi$ 6

with

$\phi$ 7

restores exponential convergence under analyticity away from the branch cut; i.e.,

$\phi$ 8

with $\phi$ 9 depending on $A$ 0 and the spectral interval (Druskin et al., 9 Apr 2025). Limits recover Gauss and Gauss–Radau as $A$ 1 and $A$ 2 respectively. The method thereby emulates an “absorbing” boundary, pushing spurious poles away and dramatically reducing transfer-function errors for dense spectral problems.

4. Stochastic BOLT, Trace Estimation, and Access-Constrained Variants

BOLT generalizes to unbiased stochastic estimation of traces of matrix functions via block-orthonormal random probes:

Draw $A$ 3 (e.g., Rademacher or Gaussian), thin-QR to obtain $A$ 4, then execute $A$ 5 block Lanczos steps and form $A$ 6.
The estimator

$A$ 7

where $A$ 8, is unbiased for $A$ 9, with variance $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 0 over $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 1 independent probes (Yeon et al., 18 May 2025).

Block-orthonormality (“self-averaging”) yields $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 2 convergence in matvecs, robust even in flat-spectrum regimes where methods like Hutch++ degrade to $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 3. The method extends to Subblock SLQ for situations with only partial (submatrix) access to $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 4, guaranteeing unbiasedness and enabling localization for polynomial filters (Yeon et al., 18 May 2025).

Applications include:

Log-determinant and matrix norm estimation,
Distributional divergence proxies (KL, Wasserstein-2) between possibly singular or partially observed covariances,
Regularization in statistical learning.

5. Implementation, FLOP Counts, and Practical Considerations

Each BOLT iteration requires $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 5 time for the sparse matrix–block product and $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 6 for orthogonalization. Total cost for $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 7 Lanczos steps is $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 8 and storage cost is $\mathcal{K}_m(A, B) = \mathrm{span}\{B, AB, A^2B, \ldots, A^{m-1}B\}$ 9 (or just $A \in \mathbb{R}^{n\times n}$ 0 for the tridiagonal pencil under short-recurrence strategies).

On the small block-tridiagonal matrix $A \in \mathbb{R}^{n\times n}$ 1 (or shifted variant), evaluation of matrix functions, spectral decompositions, or block system solves is $A \in \mathbb{R}^{n\times n}$ 2 per shift, but $A \in \mathbb{R}^{n\times n}$ 3, and structure can be exploited. For trace estimation, diagonalization of the $A \in \mathbb{R}^{n\times n}$ 4 tridiagonal is dominant, but block size accelerates convergence and enables tradeoffs between memory and matvecs (Yeon et al., 18 May 2025).

Practical guidance includes $A \in \mathbb{R}^{n\times n}$ 5 for block size, monitoring $A \in \mathbb{R}^{n\times n}$ 6 for breakdown/deflation, and balancing spectral clustering with per-iteration cost (Xu et al., 2022, Zimmerling et al., 2024). Averaging of Gauss and Gauss–Radau approximants, or log-averaging in MIMO settings, further accelerates error decay at no additional matrix–vector product cost (Zimmerling et al., 2024).

6. Numerical Validation and Applications

BOLT demonstrates strong empirical performance on canonical large-scale discretizations:

Heat-diffusion PDEs: e.g., for $A \in \mathbb{R}^{n\times n}$ 7 (block size $A \in \mathbb{R}^{n\times n}$ 8) on a $A \in \mathbb{R}^{n\times n}$ 9 grid, BOLT achieves error $B \in \mathbb{R}^{n\times p}$ 0 at $B \in \mathbb{R}^{n\times p}$ 1 where block-Gauss plateaus at $B \in \mathbb{R}^{n\times p}$ 2 (Druskin et al., 9 Apr 2025, Zimmerling et al., 2024).
Maxwell’s equations: for $B \in \mathbb{R}^{n\times p}$ 3, $B \in \mathbb{R}^{n\times p}$ 4, BOLT yields $B \in \mathbb{R}^{n\times p}$ 5 smaller error than Gauss or Gauss–Radau, with preserved monotonicity and faster error decay (up to two orders of magnitude improvement) (Druskin et al., 9 Apr 2025).
Graph Laplacians: accurate estimation is validated in SNAP network experiments.

In all benchmarks, the a-posteriori error bounds are sharp, extrapolation via averaging yields acceleration by factors of $B \in \mathbb{R}^{n\times p}$ 6– $B \in \mathbb{R}^{n\times p}$ 7, and block enrichment further reduces iterations at minor extra cost (Zimmerling et al., 2024, Druskin et al., 9 Apr 2025).

7. Theoretical and Methodological Significance

BOLT unifies and generalizes block Krylov methods, numerical quadrature, and stochastic trace estimation, with rigorous error control, provable unbiasedness, and robust performance in both clustered and continuous spectra regimes. By incorporating Hermite–Padé corrections and block-orthonormal random probing, the method avoids the convergence pathologies of traditional Krylov, Hutchinson, and sketching methods, and flexibly adapts to hardware and access constraints.

BOLT is supported by theoretical analysis of convergence rates, monotonicity, a-posteriori error, and by potential-theoretic acceleration, with applications ranging from sparse PDEs and operator discretizations to high-dimensional statistics and machine learning (Druskin et al., 9 Apr 2025, Zimmerling et al., 2024, Yeon et al., 18 May 2025, Xu et al., 2022).