
Exact Fast Kernel MVM Algorithm

Updated 6 August 2025
  • The paper introduces an algorithm that computes exact kernel matrix-vector multiplications via analytical decompositions, reducing time from O(N²) to O(N(log N)^(d-1)).
  • The method leverages a divide-and-conquer approach to evaluate multivariate weighted empirical CDFs, enabling scalable computation in low-dimensional spaces.
  • Incorporating linear fixed effects and iterative solvers, the algorithm supports fast Gaussian process regression on large datasets while preserving numerical exactness.

An exact fast kernel matrix–vector multiplication (MVM) algorithm is a method that computes products of the form $y = Kx$, where $K$ is a dense kernel matrix (e.g., Matérn or Gaussian), in subquadratic time while preserving exactness for a specified class of kernels. This approach, motivated by the computational bottleneck of kernel methods in large-scale Gaussian process inference, leverages explicit analytical decompositions of certain kernels and algorithmic techniques from empirical cumulative distribution function (CDF) evaluation. A prototypical instance targets Matérn kernels with half-integer smoothness, enabling efficient Gaussian process regression on data sets with hundreds of thousands of points in low dimensions by reducing the dominant cost from $O(N^2)$ to $O(N(\log N)^{d-1})$ for dimension $d$.

1. Analytical Kernel Decomposition and Algorithm Principle

This algorithm is predicated on the existence of an explicit, finite decomposition of the kernel function into a sum of separable terms, $k(u-v) = \sum_{p=1}^{P} \phi_{1,p}(u)\,\phi_{2,p}(v)$, where $\{\phi_{1,p}\}$ and $\{\phi_{2,p}\}$ are univariate functions and $P$ depends on the kernel parameters. Kernels compatible with this include the one-dimensional Matérn-$1/2$ kernel, $k_{1/2}(u-v) = \exp(-|u-v|)$, which factorizes as $\exp(-|u-v|) = \exp(-u)\exp(v)\,\mathbb{I}(u \geq v) + \exp(u)\exp(-v)\,\mathbb{I}(v > u)$. Similar decompositions exist for the Matérn-$3/2$ and $5/2$ kernels, using larger values of $P$ and more complex $\phi_{1,p}, \phi_{2,p}$.
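The Matérn-$1/2$ factorization can be checked numerically; the following is a minimal Python sketch (illustrative only, not the paper's C++ implementation):

```python
# Numerically verify exp(-|u - v|) = e^{-u} e^{v} 1{u >= v} + e^{u} e^{-v} 1{v > u}.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(-2.0, 2.0, size=1000)
v = rng.uniform(-2.0, 2.0, size=1000)

lhs = np.exp(-np.abs(u - v))
rhs = np.where(u >= v, np.exp(-u) * np.exp(v), np.exp(u) * np.exp(-v))
assert np.allclose(lhs, rhs)  # agreement up to floating-point rounding
```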

For a vector $y \in \mathbb{R}^N$ and points $\{x_i\}_{i=1}^N$, the kernel MVM is recast as a sum of weighted empirical CDFs: $[Ky]_j = \sum_{p=1}^{P} \big( \phi_{1,p}(x_j)\, S_p^{(\leq)}(x_j) + \phi_{1,p}(-x_j)\, S_p^{(>)}(x_j) \big)$, where $S_p^{(\leq)}(x_j) = \sum_{i:\, x_i \leq x_j} y_i\, \phi_{2,p}(x_i)$ and $S_p^{(>)}(x_j) = \sum_{i:\, x_i > x_j} y_i\, \phi_{2,p}(-x_i)$. This representation reduces the full kernel MVM to weighted sums over data restricted by orderings on the real line (univariate) or, in the multivariate setting, to quadrant-restricted sums.
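For the Matérn-$1/2$ kernel, substituting the factorization above gives the concrete instance $[Ky]_j = e^{-x_j} \sum_{i:\, x_i \leq x_j} y_i\, e^{x_i} + e^{x_j} \sum_{i:\, x_i > x_j} y_i\, e^{-x_i}$, i.e., a single pair of weighted cumulative sums (the $P = 1$ row of the table in Section 6).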

2. Divide-and-Conquer Evaluation of Multivariate Empirical CDFs

The main computational bottleneck is the repeated evaluation of weighted empirical CDFs: partial sums of $y_i\, \phi_{2,p}(x_i)$ over data satisfying certain multivariate order constraints (for example, $x_{i,k} \leq z_k$ for $k = 1, \dots, d$). In one dimension, after sorting $\{x_i\}$, all $S_p^{(\leq)}(x_j)$ and $S_p^{(>)}(x_j)$ are computed in $O(N)$ time via prefix and suffix sums. The initial sorting costs $O(N \log N)$.
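A minimal NumPy sketch of this one-dimensional procedure for the Matérn-$1/2$ kernel (unit lengthscale assumed; illustrative only, not the paper's C++ implementation):

```python
# Minimal sketch: exact K @ y for K_ij = exp(-|x_i - x_j|) in O(N log N).
# Assumes unit lengthscale and inputs in a modest range so exp(x) does not overflow;
# a robust implementation would rescale/shift the inputs.
import numpy as np

def matern12_mvm(x, y):
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    # Inclusive prefix sums: S^{<=}(x_j) = sum_{x_i <= x_j} y_i exp(x_i)
    s_le = np.cumsum(ys * np.exp(xs))
    # Exclusive suffix sums: S^{>}(x_j) = sum_{x_i > x_j} y_i exp(-x_i)
    rev = np.cumsum((ys * np.exp(-xs))[::-1])[::-1]
    s_gt = np.concatenate([rev[1:], [0.0]])
    out_sorted = np.exp(-xs) * s_le + np.exp(xs) * s_gt
    out = np.empty_like(out_sorted)
    out[order] = out_sorted  # undo the sort
    return out

# Check against the dense O(N^2) product on a small problem.
rng = np.random.default_rng(1)
x, y = rng.uniform(0.0, 1.0, 2000), rng.standard_normal(2000)
K = np.exp(-np.abs(x[:, None] - x[None, :]))
assert np.allclose(matern12_mvm(x, y), K @ y)
```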

In higher dimensions, efficient CDF computation cannot be done by simple sorting. The algorithm employs a divide-and-conquer technique, following the recursive subdivision strategies described in Bentley (1980), Bouchard (2012), and related works. This approach evaluates all necessary quadrant sums simultaneously in $O\big(N\,(\log N)^{d-1}\big)$ by recursively partitioning the $d$-dimensional input space and, at each stage, constructing data structures (often balanced trees) that store appropriately sorted subsets of the data. The outputs of internal nodes are reused across multiple queries, amortizing the cost and providing speed-ups by a factor of at least 10 in practical implementations for low $d$.
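A compact illustration of the $d = 2$ case (a sketch only, not the paper's implementation): the single-quadrant weighted CDF $F(x_j) = \sum_{i:\, a_i \leq a_j,\ b_i \leq b_j} w_i$ is computed for all $j$ by divide and conquer on the first coordinate. For brevity this version re-sorts the second coordinate at every merge, costing $O(N(\log N)^2)$; a mergesort-style merge recovers the $O(N\log N)$ bound quoted above, and the other quadrants follow by negating coordinates.

```python
# Minimal sketch: all-pairs weighted empirical CDF in 2-D by divide and conquer.
# Assumes distinct values in the first coordinate (ties would need a tie-breaking rule).
import numpy as np

def weighted_cdf_2d(a, b, w):
    n = len(a)
    out = np.zeros(n)
    idx = np.argsort(a, kind="stable")  # process points in order of the first coordinate

    def recurse(ids):
        if len(ids) <= 1:
            return
        mid = len(ids) // 2
        left, right = ids[:mid], ids[mid:]
        recurse(left)
        recurse(right)
        # Every left point has a smaller first coordinate than every right point,
        # so left points contribute to right points whenever b_left <= b_right.
        b_order = np.argsort(b[left])
        b_sorted = b[left][b_order]
        prefix = np.concatenate([[0.0], np.cumsum(w[left][b_order])])
        out[right] += prefix[np.searchsorted(b_sorted, b[right], side="right")]

    recurse(idx)
    return out + w  # each point also dominates itself

# Check against the brute-force O(N^2) evaluation.
rng = np.random.default_rng(2)
a, b, w = rng.random(500), rng.random(500), rng.standard_normal(500)
brute = np.array([w[(a <= a[j]) & (b <= b[j])].sum() for j in range(500)])
assert np.allclose(weighted_cdf_2d(a, b, w), brute)
```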

3. Incorporation of Linear Fixed Effects and Mean Functions

Many Gaussian process regression models assume a mean function that is linear or affine in the inputs, $m(x) = \beta_0 + \beta^\top x$ or $m(x) = \beta^\top h(x)$. This algorithm integrates mean estimation within the fast MVM framework:

  • The ordinary least squares (OLS) estimate of $\beta$ is computed from the data, and the fitted mean is subtracted from the observed responses, yielding mean-centered targets.
  • For joint estimation with kernel hyperparameters (including nugget/variance parameters), an iterative block coordinate descent is employed: the kernel and noise parameters are updated (using, e.g., ADAM for maximum likelihood), alternating with recalibration of the generalized least squares (GLS) linear coefficients; a minimal sketch of this alternation appears after this list.
  • Each CG or Lanczos iteration for the residual kernel system relies on the fast MVM, ensuring that the total computational cost remains $O(N(\log N)^{d-1})$ per kernel–vector product.
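A minimal sketch of one pass of this scheme (assumptions: 1-D inputs, Matérn-$1/2$ kernel, fixed hyperparameters, and a dense $O(N^2)$ kernel product standing in for the fast MVM that the paper's algorithm would supply):

```python
# Minimal sketch: linear fixed effects inside CG-based GP solves.
# The dense kernel_mvm below is a stand-in for the fast CDF-based MVM,
# which is the only place the kernel matrix is ever "touched".
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(3)
n, sigma2 = 1500, 0.1
x = np.sort(rng.uniform(0.0, 1.0, n))
H = np.column_stack([np.ones(n), x])                 # affine mean basis h(x) = (1, x)
y = H @ np.array([2.0, -1.0]) + 0.3 * rng.standard_normal(n)

K = np.exp(-np.abs(x[:, None] - x[None, :]))
kernel_mvm = lambda v: K @ v                         # stand-in for the exact fast MVM
A = LinearOperator((n, n), matvec=lambda v: kernel_mvm(v) + sigma2 * v)

# 1) OLS fixed effects, subtract the fitted mean from the responses.
beta_ols, *_ = np.linalg.lstsq(H, y, rcond=None)
r = y - H @ beta_ols

# 2) Solve (K + sigma^2 I) alpha = r with CG; only MVMs are required.
alpha, info = cg(A, r)

# 3) One GLS recalibration: beta = (H^T A^{-1} H)^{-1} H^T A^{-1} y,
#    with each A^{-1}-solve again done by CG.
AinvH = np.column_stack([cg(A, H[:, k])[0] for k in range(H.shape[1])])
beta_gls = np.linalg.solve(H.T @ AinvH, AinvH.T @ y)
```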

4. Performance and Scalability Results

Empirical evaluation in the paper confirms that the method is effective for inference tasks involving hundreds of thousands of points in one, two, and three spatial dimensions. Example results:

| Dimension $d$ | Dataset size $N$ | Iterations (GP fit) | Wall time (sec) |
|---|---|---|---|
| 1 | 200,000 | 7,500 | $\sim$35,000 |
| 2 | 400,000 | not listed | comparable |
  • For $d = 1$, all necessary weighted cumulative sums are computed in linear time after sorting.
  • For $d = 2$ and $3$, the $O(N(\log N)^{d-1})$ cost remains practical for moderate $N$ (e.g., $N \sim 4 \times 10^5$).
  • Inclusion of mean effects does not materially increase overall runtime.

This suggests that, within "large $N$, small $d$" regimes typical in spatial statistics and geostatistics, kernel inference can be performed exactly at unprecedented scale.

5. Implementation and Integration into Gaussian Process Regression

The algorithm is implemented in C++ and is available as open-source code at https://gitlab.com/warin/fastgaussiankernelregression.git. Critical aspects include:

  • Iterative solvers: The fast MVM is embedded within CG for solving $(K + \sigma^2 I)\alpha = y$ and within Lanczos tridiagonalization for log-determinant and gradient calculations.
  • Fast gradient-based hyperparameter estimation is facilitated by estimating the gradient of the log-marginal likelihood using stochastic trace estimators, with repeated MVMs accelerated by the same CDF-based algorithm (a sketch of this estimator follows the list).
  • Numerical stability is addressed by incorporating an incomplete Cholesky or pivoted Cholesky preconditioner for the kernel matrix.
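A minimal sketch of the stochastic trace step (Hutchinson probes plus CG solves; dense matrices stand in for the fast MVM, which the real implementation uses instead):

```python
# Minimal sketch: Hutchinson estimate of the trace term in the log-marginal-likelihood
# gradient, tr(Khat^{-1} dK/d ell) ~ (1/m) sum_l z_l^T Khat^{-1} (dK/d ell) z_l,
# with each Khat^{-1} z_l obtained by CG so that only MVMs are ever required.
# Assumptions: 1-D inputs, Matérn-1/2 kernel with lengthscale ell, dense stand-ins.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(4)
n, ell, sigma2, m = 1000, 0.5, 0.1, 30
x = rng.uniform(0.0, 1.0, n)
D = np.abs(x[:, None] - x[None, :])
K = np.exp(-D / ell)
dK = K * D / ell**2                                  # d/d ell of exp(-D / ell)

Ahat = LinearOperator((n, n), matvec=lambda v: K @ v + sigma2 * v)

est = 0.0
for _ in range(m):
    z = rng.choice([-1.0, 1.0], size=n)              # Rademacher probe
    u, _ = cg(Ahat, z)                               # Khat^{-1} z via CG
    est += u @ (dK @ z)
est /= m

exact = np.trace(np.linalg.solve(K + sigma2 * np.eye(n), dK))
print(f"stochastic estimate: {est:.1f}   exact trace: {exact:.1f}")
```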

For challenging nonlinear optimization, the ADAM optimizer is favored over L-BFGS due to occasional convergence issues in the latter.

6. Mathematical and Algorithmic Foundations

Key mathematical aspects:

  • The method targets kernels for which the decomposition $k(u-v) = \sum_{p=1}^{P} \phi_{1,p}(u)\,\phi_{2,p}(v)$ is exact and finite; this holds for all Matérn kernels with half-integer smoothness ($\nu = 1/2, 3/2, 5/2$).
  • In the multivariate setting, both product and $L_1$-norm-based kernels can be expressed as sums over $2^d$ quadrant-weighted empirical CDFs, each weighted by precomputed kernel coefficients (a worked $d = 2$ instance follows this list).
  • The CDF evaluation strategy is generalizable and connects directly to classical computational geometry techniques.
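For instance, in $d = 2$ the $L_1$-norm Matérn-$1/2$ kernel factorizes coordinate-wise, $\exp(-|u_1 - v_1| - |u_2 - v_2|) = \exp(-|u_1 - v_1|)\exp(-|u_2 - v_2|)$, and substituting the univariate factorization in each coordinate yields $2^2 = 4$ quadrant terms; the quadrant $\{v_1 \leq u_1,\ v_2 \leq u_2\}$, for example, contributes $e^{-(u_1 + u_2)}\, e^{v_1 + v_2}\, \mathbb{I}(v_1 \leq u_1)\, \mathbb{I}(v_2 \leq u_2)$, so the corresponding MVM term is exactly a two-dimensional weighted empirical CDF of the kind evaluated in Section 2.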

A table summarizing kernel decomposition for common Matérn kernels follows:

| Kernel (Matérn, $\nu$) | $k(u-v)$ expression | $P$ |
|---|---|---|
| $1/2$ (Laplacian) | $e^{-\lvert u-v\rvert}$ | 1 |
| $3/2$ | $\left(1 + \sqrt{3}\,\lvert u-v\rvert\right)e^{-\sqrt{3}\,\lvert u-v\rvert}$ | 2 |
| $5/2$ | $\left(1 + \sqrt{5}\,\lvert u-v\rvert + (5/3)(u-v)^2\right)e^{-\sqrt{5}\,\lvert u-v\rvert}$ | 3 |

7. Impact, Limitations, and Availability

This approach establishes that for compatible kernels (especially Matérn with half-integer smoothness), it is possible to perform exact Gaussian process inference at scales previously accessible only to approximate methods. The critical restriction is to low-dimensional domains ($d < 5$), owing to the $2^d$ quadrant sums and the $(\log N)^{d-1}$ factor in the multivariate CDF computation.

By enabling fast, scalable, and exact matrix–vector kernel products, this method broadens applicability in spatial statistics, geostatistics, and related fields requiring massive kernel inference, as evidenced by empirical results in the referenced paper (Langrené et al., 3 Aug 2025). The availability of open-source implementation facilitates adoption and further methodological developments.
