Exact Fast Kernel MVM Algorithm
- The paper introduces an algorithm that computes exact kernel matrix–vector multiplications via analytical decompositions, reducing time from $O(N^2)$ to $O(N \log^{d-1} N)$.
- The method leverages a divide-and-conquer approach to evaluate multivariate weighted empirical CDFs, enabling scalable computation in low-dimensional spaces.
- Incorporating linear fixed effects and iterative solvers, the algorithm supports fast Gaussian process regression on large datasets while preserving numerical exactness.
An exact fast kernel matrix–vector multiplication (MVM) algorithm is a method that computes products of the form $Kv$, where $K \in \mathbb{R}^{N \times N}$ is a dense kernel matrix (e.g., Matérn or Gaussian) and $v \in \mathbb{R}^N$, in subquadratic time while preserving exactness for a specified class of kernels. This approach, motivated by the computational bottleneck of kernel methods in large-scale Gaussian process inference, leverages explicit analytical decompositions of certain kernels and algorithmic techniques from empirical cumulative distribution function (CDF) evaluation. A prototypical instance targets Matérn kernels with half-integer smoothness, enabling efficient Gaussian process regression on datasets with hundreds of thousands of points in low dimensions by reducing the dominant cost from $O(N^2)$ to $O(N \log^{d-1} N)$ for dimension $d$.
1. Analytical Kernel Decomposition and Algorithm Principle
This algorithm is predicated on the existence of an explicit, finite decomposition of the kernel function into a sum of separable terms: for $y \le x$, $k(x,y) = \sum_{p=1}^{P} \phi_p(x)\,\psi_p(y)$, where $\phi_p$ and $\psi_p$ are univariate functions and $P$ depends on the kernel (for Matérn kernels, on the smoothness $\nu$). Kernels compatible with this include the one-dimensional Matérn–$1/2$ kernel, $k(x,y) = \exp(-|x-y|/\ell)$, which factorizes for $y \le x$ as $\exp(-|x-y|/\ell) = e^{-x/\ell}\, e^{y/\ell}$ (with the symmetric factorization for $y > x$). Similar decompositions exist for Matérn–$3/2$ and $5/2$ kernels using larger values of $P$ and more complex $\phi_p$, $\psi_p$.
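As a concrete illustration (a worked example consistent with the table in Section 6, not quoted from the paper), the Matérn–$3/2$ kernel $k(x,y) = (1 + \sqrt{3}\,|x-y|/\ell)\, e^{-\sqrt{3}\,|x-y|/\ell}$ admits a $P = 2$ decomposition for $y \le x$:

$$
k(x,y) \;=\; \underbrace{\Big(1 + \tfrac{\sqrt{3}}{\ell}x\Big) e^{-\sqrt{3}x/\ell}}_{\phi_1(x)}\, \underbrace{e^{\sqrt{3}y/\ell}}_{\psi_1(y)} \;+\; \underbrace{e^{-\sqrt{3}x/\ell}}_{\phi_2(x)}\, \underbrace{\Big(-\tfrac{\sqrt{3}}{\ell}y\Big) e^{\sqrt{3}y/\ell}}_{\psi_2(y)},
$$

which expands back to $\big(1 + \tfrac{\sqrt{3}}{\ell}(x-y)\big)\, e^{-\sqrt{3}(x-y)/\ell}$, as required.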
For a vector $v \in \mathbb{R}^N$ and points $x_1, \dots, x_N$, the kernel MVM is recast as a sum of weighted empirical CDFs:

$$
(Kv)_i \;=\; \sum_{j=1}^{N} k(x_i, x_j)\, v_j \;=\; \sum_{p=1}^{P} \phi_p(x_i) \sum_{j:\, x_j \le x_i} \psi_p(x_j)\, v_j \;+\; \text{(symmetric term over } x_j > x_i\text{)},
$$

where each inner sum is a weighted empirical CDF evaluated at $x_i$ with weights $w_j = \psi_p(x_j)\, v_j$. This representation enables evaluating the full kernel MVM by evaluating weighted sums over data restricted to orderings on the real line (univariate) or, in the multivariate setting, quadrant-restricted sums.
2. Divide-and-Conquer Evaluation of Multivariate Empirical CDFs
The main computational bottleneck is the repeated evaluation of weighted empirical CDFs: partial sums of the weights $w_j = \psi_p(x_j)\, v_j$ over data that satisfy certain multivariate order constraints (for example, $x_j \le x_i$ coordinatewise for $d > 1$). In one dimension, after sorting the $x_j$, all prefix sums $\sum_{j:\, x_j \le x_i} w_j$ and suffix sums $\sum_{j:\, x_j > x_i} w_j$ are computed in $O(N)$ time via prefix and suffix passes. The initial sorting costs $O(N \log N)$.
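To make the univariate case concrete, here is a minimal sketch (our own illustration, not the paper's reference implementation; the function name `matern12_mvm`, unit variance $\sigma^2 = 1$, and pre-sorted inputs are assumptions) of the exact $O(N)$ Matérn–$1/2$ MVM using decay-factor recurrences equivalent to the prefix/suffix sums described above:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Illustrative sketch: exact O(N) MVM with the 1-D Matern-1/2 (Laplacian)
// kernel k(x,y) = exp(-|x-y|/ell), for inputs sorted in increasing order.
// Forward/backward recurrences play the role of the prefix/suffix sums, and
// no exponential of a raw coordinate is ever formed (numerically stable).
std::vector<double> matern12_mvm(const std::vector<double>& x,
                                 const std::vector<double>& v, double ell) {
  const std::size_t n = x.size();
  std::vector<double> A(n), B(n), out(n);
  // Prefix pass: A[i] = sum_{j<=i} exp(-(x_i - x_j)/ell) * v[j]
  A[0] = v[0];
  for (std::size_t i = 1; i < n; ++i)
    A[i] = std::exp(-(x[i] - x[i - 1]) / ell) * A[i - 1] + v[i];
  // Suffix pass: B[i] = sum_{j>i} exp(-(x_j - x_i)/ell) * v[j]
  B[n - 1] = 0.0;
  for (std::size_t i = n - 1; i-- > 0;)
    B[i] = std::exp(-(x[i + 1] - x[i]) / ell) * (B[i + 1] + v[i + 1]);
  for (std::size_t i = 0; i < n; ++i) out[i] = A[i] + B[i];
  return out;
}

int main() {
  std::vector<double> x = {0.0, 0.5, 1.5, 2.0}, v = {1.0, -2.0, 0.5, 1.0};
  auto Kv = matern12_mvm(x, v, /*ell=*/1.0);  // x must be sorted
  for (double t : Kv) std::printf("%.6f\n", t);
}
```

The recurrences $A_i = e^{-(x_i - x_{i-1})/\ell} A_{i-1} + v_i$ and $B_i = e^{-(x_{i+1} - x_i)/\ell}(B_{i+1} + v_{i+1})$ avoid forming $e^{\pm x_j/\ell}$ directly, which would overflow for widely spread coordinates.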
In higher dimensions, efficient CDF computation cannot be done by simple sorting. The algorithm employs a divide-and-conquer technique, following the recursive subdivision strategies described in Bentley (1980), Bouchard (2012), and related works. This approach evaluates all necessary quadrant sums simultaneously in $O(N \log^{d-1} N)$ time by recursively partitioning the $d$-dimensional input space and, at each stage, constructing data structures (often balanced trees) that store appropriately sorted subsets of the data. The outputs of internal nodes are reused across multiple queries, amortizing the cost and providing speed-ups by a factor of at least 10 in practical implementations for low $d$.
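To make the recursion concrete in two dimensions, the following minimal sketch (our illustration; the paper's data structures and tie-handling conventions may differ) computes all weighted quadrant sums $S_i = \sum_{j \ne i:\, x_j \le x_i,\, y_j \le y_i} w_j$ in $O(N \log N)$ by splitting on the $x$-median and resolving cross contributions during a merge by $y$:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Divide-and-conquer for 2-D weighted dominance (quadrant) sums, assuming
// distinct x-coordinates (ties would need an explicit tie-breaking rule).
struct Pt { double x, y, w; int id; };

// Recursively resolves contributions across the x-median split while
// merge-sorting the segment [lo, hi) by y.
void quadrant_sums(std::vector<Pt>& a, std::vector<Pt>& buf, int lo, int hi,
                   std::vector<double>& S) {
  if (hi - lo <= 1) return;
  int mid = lo + (hi - lo) / 2;
  quadrant_sums(a, buf, lo, mid, S);   // left half:  smaller x
  quadrant_sums(a, buf, mid, hi, S);   // right half: larger x
  // Merge by y. Every left point already dominates every right point in x,
  // so a running prefix sum of left weights gives the cross contribution.
  double acc = 0.0;
  int i = lo, j = mid, k = lo;
  while (i < mid && j < hi) {
    if (a[i].y <= a[j].y) { acc += a[i].w; buf[k++] = a[i++]; }
    else                  { S[a[j].id] += acc; buf[k++] = a[j++]; }
  }
  while (i < mid) buf[k++] = a[i++];
  while (j < hi)  { S[a[j].id] += acc; buf[k++] = a[j++]; }
  std::copy(buf.begin() + lo, buf.begin() + hi, a.begin() + lo);
}

int main() {
  // Toy data: 5 points with unit weights, ids assigned in x-order.
  std::vector<Pt> pts = {{0.1, 0.9, 1, 0}, {0.4, 0.2, 1, 1}, {0.5, 0.7, 1, 2},
                         {0.8, 0.4, 1, 3}, {0.9, 0.95, 1, 4}};
  std::sort(pts.begin(), pts.end(),
            [](const Pt& p, const Pt& q) { return p.x < q.x; });
  std::vector<double> S(pts.size(), 0.0);
  std::vector<Pt> buf(pts.size());
  quadrant_sums(pts, buf, 0, (int)pts.size(), S);
  // Add each point's own weight (self-inclusion conventions depend on the
  // kernel decomposition being evaluated).
  for (std::size_t i = 0; i < pts.size(); ++i)
    std::printf("S[%zu] = %.1f\n", i, S[i] + 1.0);
}
```

The merge step is where the $\log^{d-1} N$ structure appears: each level of the recursion does linear work, and each additional dimension nests one more divide-and-conquer (or tree) layer.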
3. Incorporation of Linear Fixed Effects and Mean Functions
Many Gaussian process regression models assume a mean function that is linear or affine in the inputs: $m(x) = \beta^\top x$ or $m(x) = \beta_0 + \beta^\top x$. This algorithm integrates mean estimation within the fast MVM framework:
- The ordinary least squares (OLS) estimate of $\beta$ is computed from the data, and the fitted mean is subtracted from the observed responses, yielding mean-centered targets.
- For joint estimation with kernel hyperparameters (including nugget/variance parameters), an iterative block coordinate descent is employed: the kernel and noise parameters are updated (using, e.g., ADAM for maximum-likelihood updates), alternating with recalibration of the generalized least squares (GLS) linear coefficients.
- Each CG or Lanczos iteration for the residual kernel system relies on the fast MVM, keeping the total computational cost at $O(N \log^{d-1} N)$ per kernel–vector product; a minimal matrix-free CG sketch follows this list.
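The sketch below (our own illustration, not the paper's code) shows how the fast MVM plugs into an iterative solver: the operator is passed as a callback, so any subquadratic MVM, such as `matern12_mvm` above plus a nugget term, can be dropped in. The toy diagonal operator in `main` is hypothetical.

```cpp
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using LinOp = std::function<Vec(const Vec&)>;  // black-box MVM callback

static double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Conjugate gradient for (K + s2*I) alpha = y, touching K only through `mvm`.
Vec cg_solve(const LinOp& mvm, double s2, const Vec& y,
             int max_iter = 1000, double tol = 1e-10) {
  const std::size_t n = y.size();
  Vec alpha(n, 0.0), r = y, p = y, Ap(n);
  double rs = dot(r, r);
  for (int it = 0; it < max_iter && std::sqrt(rs) > tol; ++it) {
    Ap = mvm(p);                                             // one fast MVM
    for (std::size_t i = 0; i < n; ++i) Ap[i] += s2 * p[i];  // nugget term
    double a = rs / dot(p, Ap);
    for (std::size_t i = 0; i < n; ++i) { alpha[i] += a * p[i]; r[i] -= a * Ap[i]; }
    double rs_new = dot(r, r);
    for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + (rs_new / rs) * p[i];
    rs = rs_new;
  }
  return alpha;
}

int main() {
  // Toy SPD operator standing in for the fast kernel MVM (hypothetical).
  LinOp diag_op = [](const Vec& v) {
    Vec out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = (2.0 + i) * v[i];
    return out;
  };
  Vec y = {1.0, 2.0, 3.0};
  Vec alpha = cg_solve(diag_op, /*s2=*/0.1, y);
  for (double t : alpha) std::printf("%.6f\n", t);
}
```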
4. Performance and Scalability Results
Empirical evaluation in the paper confirms that the method is effective for inference tasks involving hundreds of thousands of points in one, two, and three spatial dimensions. Example results:
| Dimension | Dataset size $N$ | Iterations (GP fit) | Wall time (sec) |
|---|---|---|---|
| 1 | 200,000 | 7,500 | 35,000 |
| 2 | 400,000 | not listed | comparable |
- For $d = 1$, all necessary weighted cumulative sums are computed in linear time after sorting.
- For $d = 2$ and $3$, the cost remains practical for moderate $N$ (e.g., the hundreds of thousands of points reported above).
- Inclusion of mean effects does not materially increase overall runtime.
This suggests that, within "large $N$, small $d$" regimes, typical in spatial statistics and geostatistics, kernel inference can be performed at unprecedented scale without sacrificing exactness.
5. Implementation and Integration into Gaussian Process Regression
The algorithm is implemented in C++ and is available as open-source code at https://gitlab.com/warin/fastgaussiankernelregression.git. Critical aspects include:
- Iterative solvers: The fast MVM is embedded within CG for solving the linear system $(K + \sigma^2 I)\alpha = y$ and within Lanczos tridiagonalization for log-determinant and gradient calculations.
- Fast gradient-based hyperparameter estimation: The gradient of the log-marginal likelihood is estimated using stochastic trace estimators, with the repeated MVMs accelerated by the same CDF-based algorithm (a minimal trace-estimation sketch closes this section).
- Numerical stability is addressed by incorporating an incomplete Cholesky or pivoted Cholesky preconditioner for the kernel matrix.
For challenging nonlinear optimization, the ADAM optimizer is favored over L-BFGS due to occasional convergence issues in the latter.
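The following minimal sketch illustrates Hutchinson-style stochastic trace estimation (our illustration; names such as `hutchinson_trace` are hypothetical). The estimator of $\operatorname{tr}(A)$ uses Rademacher probes and touches $A$ only through matrix–vector products, so a fast kernel MVM applies directly:

```cpp
#include <cstdio>
#include <functional>
#include <random>
#include <vector>

using Vec = std::vector<double>;
using LinOp = std::function<Vec(const Vec&)>;

// Hutchinson estimator: tr(A) ~ (1/M) * sum_m z_m^T A z_m, z_m Rademacher.
double hutchinson_trace(const LinOp& A, std::size_t n, int num_probes,
                        std::mt19937& rng) {
  std::bernoulli_distribution coin(0.5);
  double acc = 0.0;
  for (int m = 0; m < num_probes; ++m) {
    Vec z(n);
    for (auto& zi : z) zi = coin(rng) ? 1.0 : -1.0;  // Rademacher probe
    Vec Az = A(z);                                   // one MVM per probe
    double quad = 0.0;
    for (std::size_t i = 0; i < n; ++i) quad += z[i] * Az[i];
    acc += quad;
  }
  return acc / num_probes;
}

int main() {
  // Toy diagonal operator with known trace 1+2+...+5 = 15 (hypothetical).
  // For a diagonal operator every probe is exact; general kernel matrices
  // exhibit variance that shrinks as the probe count grows.
  LinOp A = [](const Vec& v) {
    Vec out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = (i + 1.0) * v[i];
    return out;
  };
  std::mt19937 rng(42);
  std::printf("estimated trace: %.3f (exact 15)\n",
              hutchinson_trace(A, 5, 10000, rng));
}
```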
6. Mathematical and Algorithmic Foundations
Key mathematical aspects:
- The method targets kernels for which the decomposition is exact and finite; this holds for all Matérn kernels with half-integer smoothness ($\nu \in \{1/2, 3/2, 5/2, \dots\}$).
- In the multivariate setting, both product-form and $\ell_1$-norm-based kernels can be expressed as sums over quadrant-weighted empirical CDFs, each weighted by precomputed kernel coefficients.
- The CDF evaluation strategy is generalizable and connects directly to classical computational geometry techniques.
A table summarizing kernel decomposition for common Matérn kernels follows:
| Kernel (Matérn, $\nu$) | Expression ($r = |x-y|$) | Separable terms $P$ |
|---|---|---|
| $1/2$ (Laplacian) | $\exp(-r/\ell)$ | $1$ |
| $3/2$ | $(1 + \sqrt{3}\,r/\ell)\exp(-\sqrt{3}\,r/\ell)$ | $2$ |
| $5/2$ | $\big(1 + \sqrt{5}\,r/\ell + 5r^2/(3\ell^2)\big)\exp(-\sqrt{5}\,r/\ell)$ | $3$ |
7. Impact, Limitations, and Availability
This approach establishes that for compatible kernels (especially Matérn with half-integer smoothness), it is possible to perform exact Gaussian process inference at scales previously accessible only to approximate methods. The critical restriction is to low-dimensional domains ($d \le 3$ in the reported experiments), owing to the $\log^{d-1} N$ factor in the multivariate CDF computation, which grows exponentially with $d$.
By enabling fast, scalable, and exact matrix–vector kernel products, this method broadens applicability in spatial statistics, geostatistics, and related fields requiring massive kernel inference, as evidenced by empirical results in the referenced paper (Langrené et al., 3 Aug 2025). The availability of open-source implementation facilitates adoption and further methodological developments.