Fast kernel methods: Sobolev, physics-informed, and additive models

Published 2 Sep 2025 in stat.ML, cs.LG, math.ST, stat.ME, and stat.TH | (2509.02649v1)

Abstract: Kernel methods are powerful tools in statistical learning, but their cubic complexity in the sample size n limits their use on large-scale datasets. In this work, we introduce a scalable framework for kernel regression with O(n log n) complexity, fully leveraging GPU acceleration. The approach is based on a Fourier representation of kernels combined with non-uniform fast Fourier transforms (NUFFT), enabling exact, fast, and memory-efficient computations. We instantiate our framework in three settings: Sobolev kernel regression, physics-informed regression, and additive models. When known, the proposed estimators are shown to achieve minimax convergence rates, consistent with classical kernel theory. Empirical results demonstrate that our methods can process up to tens of billions of samples within minutes, providing both statistical accuracy and computational scalability. These contributions establish a flexible approach, paving the way for the routine application of kernel methods in large-scale learning tasks.


Authors (4)

Summary

The paper introduces a novel Fourier-based kernel regression method using NUFFT operations to achieve O(n log n) complexity on GPUs.
It shows that the proposed estimators achieve minimax convergence rates, where these rates are known, for Sobolev, physics-informed, and additive models.
Empirical results validate the method's scalability and efficiency on massive datasets, outperforming traditional approaches in both runtime and memory usage.

Fast Kernel Methods via Fourier-NUFFT: Sobolev, Physics-Informed, and Additive Models

Introduction and Motivation

This paper presents a unified framework for scalable kernel regression, leveraging Fourier representations and non-uniform fast Fourier transforms (NUFFT) to achieve $\mathcal{O}(n \log n)$ complexity for both time and memory. The approach is instantiated for three classes of models: Sobolev kernel regression, physics-informed regression, and additive models. The framework is designed to fully exploit GPU acceleration, enabling exact kernel learning on datasets with up to tens of billions of samples, a regime previously inaccessible to classical kernel methods due to their cubic complexity in $n$.

Fourier-Based Kernel Regression and Computational Acceleration

The core innovation is the use of truncated Fourier bases to represent functions in the RKHS, reducing the kernel regression problem to a finite-dimensional linear system. The design matrix $\Phi$ is constructed from the truncated Fourier basis, and the empirical risk minimization is performed over the parameter vector $\theta$:

$\hat{\theta} = \arg\min_{\theta \in \mathbb{C}^D} \left( n^{-1} \|\Phi \theta - \mathbb{Y}\|_2^2 + \lambda \|M \theta\|_2^2 \right)$

where $M$ encodes the kernel norm. The key computational bottlenecks, namely the computation of $\Phi^*\Phi$ and $\Phi^*\mathbb{Y}$, are recast as NUFFT operations, which are highly parallelizable and efficiently implemented on GPUs. The covariance matrix $\Phi^*\Phi$ exhibits a block Toeplitz structure, which further enables fast matrix-vector products via FFTs.
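The minimization above has a closed-form solution: setting the gradient to zero gives $(n^{-1}\Phi^*\Phi + \lambda M^*M)\,\theta = n^{-1}\Phi^*\mathbb{Y}$. The following is a minimal one-dimensional sketch of that system, not the paper's implementation. It assumes the domain $[-\pi, \pi]$, features $e^{ikx}$ for $k = -m, \dots, m$, and a diagonal $M$ with Sobolev-type entries $(1+k^2)^{s/2}$; these choices, and the names `fourier_design` and `fit_fourier_ridge`, are illustrative. The products $\Phi^*\Phi$ and $\Phi^*\mathbb{Y}$ are formed densely here, whereas the paper replaces them with NUFFT calls.

```python
import numpy as np

def fourier_design(x, m):
    """Design matrix with columns exp(i*k*x) for k = -m..m (x assumed in [-pi, pi])."""
    k = np.arange(-m, m + 1)
    return np.exp(1j * np.outer(x, k))

def fit_fourier_ridge(x, y, m, lam, s=1.0):
    """Solve (Phi^* Phi / n + lam * M^* M) theta = Phi^* y / n."""
    n = x.shape[0]
    k = np.arange(-m, m + 1)
    Phi = fourier_design(x, m)
    # Assumption: M is diagonal with entries (1 + k^2)^(s/2), so M^* M = diag((1 + k^2)^s).
    M2 = np.diag((1.0 + k.astype(float) ** 2) ** s)
    gram = Phi.conj().T @ Phi      # dense product; stands in for the NUFFT step
    rhs = Phi.conj().T @ y
    theta = np.linalg.solve(gram / n + lam * M2, rhs / n)
    return theta, gram

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=2000)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(x.shape[0])
theta, gram = fit_fourier_ridge(x, y, m=8, lam=1e-6)

# Evaluate the fitted function on a grid and compare with the noiseless target.
x_test = np.linspace(-np.pi, np.pi, 200)
pred = (fourier_design(x_test, 8) @ theta).real
max_err = np.max(np.abs(pred - np.sin(2 * x_test)))
```

Entry $(j, l)$ of `gram` equals $\sum_i e^{i(k_l - k_j)x_i}$, which depends only on the difference $k_l - k_j$. That is the one-dimensional case of the Toeplitz structure mentioned in the text.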

This approach yields exact solutions (not approximations) for a broad class of kernels, in contrast to Nyström or random feature methods, which introduce additional approximation error and complicate theoretical analysis.
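To illustrate why a Toeplitz covariance matrix allows fast matrix-vector products, the sketch below multiplies a Toeplitz matrix by a vector using circulant embedding and FFTs, then checks the result against the dense product. It covers only the scalar one-dimensional case with features $e^{ikx}$ on $[-\pi, \pi]$ (an assumption; the paper works with block Toeplitz matrices), and `toeplitz_matvec` is a hypothetical helper name.

```python
import numpy as np

def toeplitz_matvec(col, row, u):
    """Multiply the Toeplitz matrix with first column `col` and first row `row` by u."""
    D = len(u)
    # Embed the D x D Toeplitz matrix in a 2D x 2D circulant matrix (first column v).
    v = np.concatenate([col, [0.0], row[:0:-1]])
    u_pad = np.concatenate([u, np.zeros(D, dtype=complex)])
    # Circulant products are diagonalized by the FFT: O(D log D) instead of O(D^2).
    return np.fft.ifft(np.fft.fft(v) * np.fft.fft(u_pad))[:D]

rng = np.random.default_rng(4)
m = 6
k = np.arange(-m, m + 1)
x = rng.uniform(-np.pi, np.pi, size=500)
Phi = np.exp(1j * np.outer(x, k))
gram = Phi.conj().T @ Phi          # dense reference, used only for the check

# g[q] = sum_i exp(i*q*x_i) for q = 0..2m; these values determine the whole matrix.
q = np.arange(0, 2 * m + 1)
g = np.exp(1j * np.outer(q, x)).sum(axis=1)
row = g                            # gram[0, l] = g(l)
col = g.conj()                     # gram[j, 0] = g(-j) = conj(g(j)) for real x

u = rng.standard_normal(2 * m + 1) + 1j * rng.standard_normal(2 * m + 1)
fast = toeplitz_matvec(col, row, u)
dense = gram @ u
```

Only the vector `g` needs to be stored, not the full matrix, which is one reason the structure helps with memory as well as time.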

Statistical Guarantees and Minimax Rates

The framework is rigorously analyzed under standard nonparametric regression assumptions, with the target function $f^\star$ in a Sobolev space $H^s(\Omega)$. The estimator achieves the minimax optimal rate:

$\mathbb{E}\left[\|f_{\hat{\theta}} - f^\star\|_{L^2(\mathbb{P}_X)}^2\right] = \mathcal{O}\left(n^{-2s/(2s+d)}\right)$

for appropriate choices of the truncation level $m$ and regularization parameter $\lambda$. The analysis covers both standard Sobolev kernel regression and a low-bias variant that penalizes the unweighted $\ell_2$ norm of the coefficients, mitigating the regularization bias associated with high-order Sobolev norms.
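As a quick numerical reading of the rate $n^{-2s/(2s+d)}$, the snippet below evaluates the exponent and the resulting bound (up to constants) for a few hypothetical values of $s$, $d$, and $n$. The function names and the chosen values are illustrative only.

```python
def minimax_exponent(s, d):
    """Exponent a such that the squared L2 error decays like n**(-a)."""
    return 2 * s / (2 * s + d)

def rate(n, s, d):
    """Value of n**(-2s/(2s+d)), ignoring multiplicative constants."""
    return n ** (-minimax_exponent(s, d))

low_dim = rate(1e10, s=2, d=2)     # exponent 2/3
high_dim = rate(1e10, s=2, d=10)   # exponent 2/7: much slower decay
```

With the same smoothness $s$, raising $d$ shrinks the exponent, which is the curse of dimensionality in rate form.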

Figure 1: Sobolev regression (Left), and low-bias regression (Right) in dimension $d=2$.

Empirical results confirm that the observed test error closely matches the theoretical minimax rate, even for $n=10^{10}$, with GPU runtimes on the order of one minute.

Low-Bias Sobolev Regression: Bias-Variance Trade-off

The paper highlights a critical trade-off in the choice of the smoothness parameter $s$. While higher $s$ yields faster asymptotic rates, the associated Sobolev norm can grow rapidly, increasing regularization bias. The low-bias variant, which uses an unweighted $\ell_2$ penalty, consistently outperforms the standard Sobolev kernel in finite-sample regimes, especially for large $s$ and moderate $n$. The optimal $s$ in practice is often lower than the theoretical smoothness of $f^\star$, and increases with $n$.
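One way to see where this regularization bias can come from is an idealized calculation, which is an assumption of this sketch and not a result from the paper. If the features are close to orthonormal, so that $n^{-1}\Phi^*\Phi \approx I$, and the penalty is diagonal with weights $w_k$, then each coefficient is shrunk by the factor $1/(1+\lambda w_k)$. The snippet compares assumed Sobolev-type weights $w_k = (1+k^2)^s$ with the unweighted penalty $w_k = 1$.

```python
import numpy as np

k = np.arange(0, 11)          # frequencies 0..10
lam = 1e-4                    # hypothetical regularization strength

shrink = {}
for s in (1, 2, 4):
    weights = (1.0 + k ** 2) ** s               # assumed Sobolev-type weights
    shrink[s] = 1.0 / (1.0 + lam * weights)     # factor applied to coefficient k
shrink_l2 = 1.0 / (1.0 + lam)                   # unweighted penalty: same for every k
```

For $s = 4$ the coefficient at $k = 10$ is shrunk almost to zero, while the unweighted penalty leaves every frequency nearly untouched at the same $\lambda$.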

Physics-Informed Kernel Regression

The framework naturally extends to physics-informed machine learning (PIML), where prior knowledge is encoded as linear PDE constraints. The penalty term $\int_\Omega \mathcal{D}(f_\theta, x)^2 dx$ is efficiently incorporated in the Fourier domain, as differential operators act diagonally on Fourier coefficients. The resulting estimator remains amenable to NUFFT acceleration and GPU implementation, with no increase in asymptotic complexity.
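The claim that differential operators act diagonally on Fourier coefficients can be checked numerically. The sketch below uses a hypothetical one-dimensional operator $\mathcal{D}(f) = f'' + \omega^2 f$ on $[-\pi, \pi]$ with $f_\theta(x) = \sum_k \theta_k e^{ikx}$; the operator, the domain, and the value of $\omega$ are assumptions chosen for illustration. Differentiation multiplies $\theta_k$ by $ik$, so $\mathcal{D}$ multiplies it by $\omega^2 - k^2$, and by Parseval the penalty equals $2\pi \sum_k |(\omega^2 - k^2)\,\theta_k|^2$.

```python
import numpy as np

m, omega = 5, 2.0
k = np.arange(-m, m + 1)
rng = np.random.default_rng(1)
theta = rng.standard_normal(2 * m + 1) + 1j * rng.standard_normal(2 * m + 1)
theta = (theta + theta[::-1].conj()) / 2   # enforce theta_{-k} = conj(theta_k): real f

# D = d^2/dx^2 + omega^2 acts on coefficient k as multiplication by (ik)^2 + omega^2.
symbol = (1j * k) ** 2 + omega ** 2
penalty_fourier = 2 * np.pi * np.sum(np.abs(symbol * theta) ** 2)

# Check against direct quadrature of (f'' + omega^2 f)^2 on a uniform periodic grid.
N = 4096
x = -np.pi + 2 * np.pi * np.arange(N) / N
Df_complex = np.exp(1j * np.outer(x, k)) @ (symbol * theta)
Df = Df_complex.real
penalty_quad = np.sum(Df ** 2) * (2 * np.pi / N)
```

In this setting the penalty adds only a diagonal term to the linear system, so the cost of solving it does not change.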

Figure 2: Physics-informed kernel regression (averaged over 20 samples).

Empirical results demonstrate that incorporating physics-based constraints improves predictive accuracy without increasing computational cost.

Additive Models and High-Dimensional Scalability

To address the curse of dimensionality, the framework supports additive models, where $f^\star(x) = \sum_{\ell=1}^d g^\star_\ell(x_\ell)$. The additive structure allows the estimator to achieve the univariate minimax rate $n^{-2s/(2s+1)}$, independent of $d$, provided the component functions are sufficiently smooth. The block structure of the design and covariance matrices is exploited for efficient computation, and the method remains fully GPU-compatible.
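A small sketch of the additive design may help make the block structure concrete. It assumes each coordinate lies in $[-\pi, \pi]$, uses one shared intercept plus univariate features $e^{ikx_\ell}$ with $k \neq 0$ for each coordinate, and applies an unweighted $\ell_2$ penalty; the helper `additive_design` and all numerical values are hypothetical, and the dense solve stands in for the paper's NUFFT-based computation.

```python
import numpy as np

def additive_design(X, m):
    """Columns: one intercept, then exp(i*k*x_l) for k = -m..m, k != 0, per coordinate l."""
    n, d = X.shape
    k = np.concatenate([np.arange(-m, 0), np.arange(1, m + 1)])
    blocks = [np.ones((n, 1), dtype=complex)]
    blocks += [np.exp(1j * np.outer(X[:, l], k)) for l in range(d)]
    return np.hstack(blocks)

rng = np.random.default_rng(2)
n, d, m = 5000, 3, 4
X = rng.uniform(-np.pi, np.pi, size=(n, d))
f_star = np.sin(X[:, 0]) + np.cos(2 * X[:, 1]) + 0.5 * np.sin(3 * X[:, 2])
y = f_star + 0.1 * rng.standard_normal(n)

Phi = additive_design(X, m)        # shape (n, 1 + 2*m*d): grows linearly in d
lam = 1e-6
A = Phi.conj().T @ Phi / n + lam * np.eye(Phi.shape[1])
theta = np.linalg.solve(A, Phi.conj().T @ y / n)
mse = np.mean(((Phi @ theta).real - f_star) ** 2)
```

The number of columns is $1 + 2md$, compared with $(2m+1)^d$ for a full tensor-product Fourier basis at the same truncation level.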

Figure 3: Additive model with $d=5$.

The additive kernel estimator outperforms spline-based GAMs (e.g., PyGAM) in both runtime and memory usage, especially as $n$ increases. The GPU implementation remains tractable for $n=10^8$, while PyGAM exceeds available memory at $n=10^7$.

Hyperparameter Selection and Grid Search

A notable practical advantage is that the most expensive step is the NUFFT computation, which is independent of the regularization parameter $\lambda$. This enables efficient grid search over $\lambda$ for model selection, as the matrix inversion step is comparatively cheap.
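The reuse pattern described above can be sketched as follows: the data-dependent quantities $\Phi^*\Phi$ and $\Phi^*\mathbb{Y}$ are computed once, and each value of $\lambda$ then costs only one small linear solve. This is an illustrative one-dimensional example; the Sobolev-type weights, the held-out validation split used to pick $\lambda$, and the 30-point grid are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4000, 10
k = np.arange(-m, m + 1)
x = rng.uniform(-np.pi, np.pi, size=n)
y = np.sin(x) + 0.3 * rng.standard_normal(n)
x_val = rng.uniform(-np.pi, np.pi, size=1000)
y_val = np.sin(x_val) + 0.3 * rng.standard_normal(1000)

Phi = np.exp(1j * np.outer(x, k))
Phi_val = np.exp(1j * np.outer(x_val, k))

# Data-dependent quantities: computed once (dense here, NUFFT-based in the paper).
gram = Phi.conj().T @ Phi / n
rhs = Phi.conj().T @ y / n
M2 = np.diag((1.0 + k.astype(float) ** 2) ** 2)   # assumed Sobolev weights, s = 2

lambdas = np.logspace(-8, 0, 30)
val_errors = []
for lam in lambdas:
    # Only a (2m+1) x (2m+1) solve per lambda; no pass over the n samples.
    theta = np.linalg.solve(gram + lam * M2, rhs)
    val_errors.append(np.mean(((Phi_val @ theta).real - y_val) ** 2))
best_lam = lambdas[int(np.argmin(val_errors))]
```

The loop body never touches the training samples, so its cost depends on the number of Fourier coefficients and not on $n$.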

Figure 4: Grid search with 300 hyperparameters for the additive model with $d=5$.

The GPU-based kernel implementation completes a 300-point grid search in under 30 seconds for $n=10^8$, while PyGAM becomes infeasible for $n \geq 10^5$.

Implications and Future Directions

This work demonstrates that exact kernel methods can be made practical for massive datasets and high-dimensional problems, provided the kernel admits a Fourier representation. The approach is flexible, supporting a range of structural constraints (smoothness, PDEs, additivity) and is compatible with modern hardware. Theoretical guarantees are preserved, and empirical results confirm both statistical and computational efficiency.

Potential future directions include extending the framework to non-Euclidean domains, exploring non-stationary or data-adaptive bases, and integrating with deep kernel learning architectures. The ability to efficiently incorporate complex prior knowledge (e.g., via PDEs or shape constraints) opens new avenues for scientific machine learning and large-scale structured regression.

Conclusion

The paper establishes a unified, scalable, and theoretically sound framework for kernel regression, leveraging Fourier-NUFFT acceleration and GPU hardware. Where minimax rates are known, the estimators for Sobolev, physics-informed, and additive models achieve them, and the methods are empirically validated on datasets with up to tens of billions of samples. This work broadens the applicability of kernel methods to modern large-scale learning tasks while maintaining rigorous statistical guarantees.
