
Fast kernel methods: Sobolev, physics-informed, and additive models

Published 2 Sep 2025 in stat.ML, cs.LG, math.ST, stat.ME, and stat.TH | (2509.02649v1)

Abstract: Kernel methods are powerful tools in statistical learning, but their cubic complexity in the sample size n limits their use on large-scale datasets. In this work, we introduce a scalable framework for kernel regression with O(n log n) complexity, fully leveraging GPU acceleration. The approach is based on a Fourier representation of kernels combined with non-uniform fast Fourier transforms (NUFFT), enabling exact, fast, and memory-efficient computations. We instantiate our framework in three settings: Sobolev kernel regression, physics-informed regression, and additive models. When known, the proposed estimators are shown to achieve minimax convergence rates, consistent with classical kernel theory. Empirical results demonstrate that our methods can process up to tens of billions of samples within minutes, providing both statistical accuracy and computational scalability. These contributions establish a flexible approach, paving the way for the routine application of kernel methods in large-scale learning tasks.

Summary

  • The paper introduces a novel Fourier-based kernel regression method using NUFFT operations to achieve O(n log n) complexity on GPUs.
  • It shows that the estimators attain minimax optimal convergence rates, where such rates are known, in the Sobolev, physics-informed, and additive settings.
  • Empirical results validate the method's scalability on massive datasets, including lower runtime and memory usage than spline-based GAMs (PyGAM) for additive models.

Fast Kernel Methods via Fourier-NUFFT: Sobolev, Physics-Informed, and Additive Models

Introduction and Motivation

This paper presents a unified framework for scalable kernel regression, leveraging Fourier representations and non-uniform fast Fourier transforms (NUFFT) to achieve $\mathcal{O}(n \log n)$ complexity for both time and memory. The approach is instantiated for three classes of models: Sobolev kernel regression, physics-informed regression, and additive models. The framework is designed to fully exploit GPU acceleration, enabling exact kernel learning on datasets with up to tens of billions of samples, a regime previously inaccessible to classical kernel methods due to their cubic complexity in $n$.

Fourier-Based Kernel Regression and Computational Acceleration

The core innovation is the use of truncated Fourier bases to represent functions in the RKHS, reducing the kernel regression problem to a finite-dimensional linear system. The design matrix $\Phi$ is constructed from the truncated Fourier basis, and the empirical risk minimization is performed over the parameter vector $\theta$:

$$\hat{\theta} = \arg\min_{\theta \in \mathbb{C}^D} \left( n^{-1} \|\Phi \theta - \mathbb{Y}\|_2^2 + \lambda \|M \theta\|_2^2 \right)$$

where $M$ encodes the kernel norm. The key computational bottlenecks, namely the computation of $\Phi^*\Phi$ and $\Phi^*\mathbb{Y}$, are recast as NUFFT operations, which are highly parallelizable and efficiently implemented on GPUs. The covariance matrix exhibits a block Toeplitz structure, further enabling fast matrix-vector products via FFTs.
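The structure described above can be illustrated with a minimal one-dimensional sketch. It assumes the basis $e^{ikx}$ with $|k| \le m$ on $[-\pi, \pi)$ and diagonal Sobolev-type weights $(1+k^2)^s$ for $M^*M$; both are illustrative assumptions rather than the paper's exact setup. Direct $\mathcal{O}(nm)$ sums stand in for the NUFFT calls, and the function names are hypothetical.

```python
import numpy as np

def fourier_normal_equations(x, y, m):
    """Return G = Phi^* Phi / n and b = Phi^* Y / n for phi_k(x) = exp(i k x), k = -m..m.

    Direct sums are used as a stand-in for type-1 NUFFT calls.
    """
    n = x.shape[0]
    ks = np.arange(-m, m + 1)
    # b_k = (1/n) sum_j y_j exp(-i k x_j): a nonuniform DFT of the responses.
    b = np.exp(-1j * np.outer(ks, x)) @ y / n
    # (Phi^* Phi)_{k,k'} = sum_j exp(-i (k - k') x_j) depends only on k - k',
    # so the 4m + 1 numbers c_l, l = -2m..2m, determine the whole matrix.
    ls = np.arange(-2 * m, 2 * m + 1)
    c = np.exp(-1j * np.outer(ls, x)).sum(axis=1) / n
    # Toeplitz assembly: G[p, q] = c_{k_p - k_q}; c_l is stored at index l + 2m.
    idx = ks[:, None] - ks[None, :] + 2 * m
    return c[idx], b

def fit(x, y, m, lam, s=1.0):
    """Solve (G + lam * M^* M) theta = b with assumed weights (1 + k^2)^s."""
    G, b = fourier_normal_equations(x, y, m)
    ks = np.arange(-m, m + 1).astype(float)
    weights = (1.0 + ks ** 2) ** s
    return np.linalg.solve(G + lam * np.diag(weights), b)

def predict(theta, x):
    """Evaluate the real part of sum_k theta_k exp(i k x)."""
    m = (theta.shape[0] - 1) // 2
    ks = np.arange(-m, m + 1)
    return (np.exp(1j * np.outer(x, ks)) @ theta).real

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, m = 2000, 8
x = rng.uniform(-np.pi, np.pi, size=n)
y = np.sin(2 * x) + 0.5 * np.cos(x) + 0.1 * rng.standard_normal(n)

theta = fit(x, y, m, lam=1e-4)
x_test = np.linspace(-np.pi, np.pi, 200)
truth = np.sin(2 * x_test) + 0.5 * np.cos(x_test)
rmse = np.sqrt(np.mean((predict(theta, x_test) - truth) ** 2))
```

Only the vectors `b` and `c` depend on the data. In this sketch they are the two quantities a NUFFT would compute, and the remaining linear algebra has size $2m+1$ regardless of $n$.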

This approach yields exact solutions (not approximations) for a broad class of kernels, in contrast to Nyström or random feature methods, which introduce additional approximation error and complicate theoretical analysis.

Statistical Guarantees and Minimax Rates

The framework is rigorously analyzed under standard nonparametric regression assumptions, with the target function $f^\star$ in a Sobolev space $H^s(\Omega)$. The estimator achieves the minimax optimal rate:

$$\mathbb{E}\left[\|f_{\hat{\theta}} - f^\star\|_{L^2(\mathbb{P}_X)}^2\right] = \mathcal{O}\left(n^{-2s/(2s+d)}\right)$$

for appropriate choices of the truncation level $m$ and regularization parameter $\lambda$. The analysis covers both standard Sobolev kernel regression and a low-bias variant that penalizes the unweighted $\ell_2$ norm of the coefficients, mitigating the regularization bias associated with high-order Sobolev norms.

Figure 1: Sobolev regression (left) and low-bias regression (right) in dimension $d = 2$.

Empirical results confirm that the observed test error closely matches the theoretical minimax rate, even for $n = 10^{10}$, with GPU runtimes on the order of one minute.

Low-Bias Sobolev Regression: Bias-Variance Trade-off

The paper highlights a critical trade-off in the choice of the smoothness parameter $s$. While higher $s$ yields faster asymptotic rates, the associated Sobolev norm can grow rapidly, increasing regularization bias. The low-bias variant, which uses an unweighted $\ell_2$ penalty, consistently outperforms the standard Sobolev kernel in finite-sample regimes, especially for large $s$ and moderate $n$. The optimal $s$ in practice is often lower than the theoretical smoothness of $f^\star$, and increases with $n$.
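To make the contrast between the two penalties concrete, they can be written side by side in Fourier coordinates. The weights $(1+\|k\|_2^2)^s$ are the usual Fourier form of a squared $H^s$ norm and are an assumption here; the paper's exact normalization of $M$ may differ.

```latex
\text{Sobolev penalty:}\quad
\|M\theta\|_2^2 = \sum_{\|k\|_\infty \le m} \bigl(1+\|k\|_2^2\bigr)^{s}\,|\theta_k|^2,
\qquad
\text{low-bias penalty:}\quad
\|M\theta\|_2^2 = \sum_{\|k\|_\infty \le m} |\theta_k|^2 .
```

Under the first form, the weight on a coefficient grows like $\|k\|_2^{2s}$, so a large $s$ strongly shrinks any high-frequency content of $f^\star$. The second form applies the same weight at every retained frequency, which is one way to read the reduced regularization bias described above.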

Physics-Informed Kernel Regression

The framework naturally extends to physics-informed machine learning (PIML), where prior knowledge is encoded as linear PDE constraints. The penalty term $\int_\Omega \mathcal{D}(f_\theta, x)^2 \, dx$ is efficiently incorporated in the Fourier domain, as differential operators act diagonally on Fourier coefficients. The resulting estimator remains amenable to NUFFT acceleration and GPU implementation, with no increase in asymptotic complexity.

Figure 2: Physics-informed kernel regression (averaged over 20 samples).

Empirical results demonstrate that incorporating physics-based constraints improves predictive accuracy without increasing computational cost.
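The diagonal action of differential operators on Fourier coefficients can be checked directly. The sketch below assumes a constant-coefficient operator $\mathcal{D} = \sum_\alpha a_\alpha \partial^\alpha$, the basis $e^{i\langle k, x\rangle}$, and integration over the full periodic cell $[-\pi,\pi]^d$. These are simplifying assumptions for illustration, and the paper's domain $\Omega$ and normalization may differ.

```latex
f_\theta(x) = \sum_{k} \theta_k\, e^{i\langle k, x\rangle}
\quad\Longrightarrow\quad
\mathcal{D} f_\theta(x) = \sum_{k} P(ik)\,\theta_k\, e^{i\langle k, x\rangle},
\qquad
P(ik) = \sum_{\alpha} a_\alpha\,(ik)^{\alpha},

\int_{[-\pi,\pi]^d} \bigl|\mathcal{D} f_\theta(x)\bigr|^2\, dx
= (2\pi)^d \sum_{k} |P(ik)|^2\,|\theta_k|^2 .
```

For example, in one dimension with $\mathcal{D} = \partial_{xx} + \omega^2$, the symbol is $P(ik) = \omega^2 - k^2$. Under these assumptions the physics penalty adds only a diagonal term to the linear system.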

Additive Models and High-Dimensional Scalability

To address the curse of dimensionality, the framework supports additive models, where $f^\star(x) = \sum_{\ell=1}^d g^\star_\ell(x_\ell)$. The additive structure allows the estimator to achieve the univariate minimax rate $n^{-2s/(2s+1)}$, independent of $d$, provided the component functions are sufficiently smooth. The block structure of the design and covariance matrices is exploited for efficient computation, and the method remains fully GPU-compatible.

Figure 3: Additive model with $d = 5$.
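As a quick arithmetic check of what the additive structure buys, the snippet below compares the exponent of the general rate $n^{-2s/(2s+d)}$ with the univariate exponent that applies under additivity. The values $s = 2$ and $n = 10^8$ are illustrative choices, and constants (which may depend on $d$) are ignored.

```python
def rate_exponent(s: float, d: int) -> float:
    """Exponent r such that the minimax squared L2 error scales as n**(-r)."""
    return 2.0 * s / (2.0 * s + d)

s, n = 2.0, 10**8  # illustrative smoothness and sample size

for d in (1, 2, 5):
    r = rate_exponent(s, d)
    print(f"d={d}: exponent {r:.3f}, n**(-r) = {n ** (-r):.2e}")

# In dimension 5, the general rate uses d = 5, while an additive model
# attains the univariate exponent (d = 1 in the formula).
full_5d = n ** (-rate_exponent(s, 5))
additive_5d = n ** (-rate_exponent(s, 1))
print(f"ratio full / additive at d=5: {full_5d / additive_5d:.1f}")
```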

The additive kernel estimator outperforms spline-based GAMs (e.g., PyGAM) in both runtime and memory usage, especially as $n$ increases. The GPU implementation remains tractable for $n = 10^8$, while PyGAM exceeds available memory at $n = 10^7$.

A notable practical advantage is that the most expensive step is the NUFFT computation, which is independent of the regularization parameter $\lambda$. This enables efficient grid search over $\lambda$ for model selection, as the matrix inversion step is comparatively cheap.

Figure 4: Grid search with 300 hyperparameters for the additive model with $d = 5$.

The GPU-based kernel implementation completes a 300-point grid search in under 30 seconds for $n = 10^8$, while PyGAM becomes infeasible for $n \geq 10^5$.
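The reuse of $\lambda$-independent quantities during grid search can be sketched as follows. This is a one-dimensional stand-in rather than the additive model itself: the synthetic data, the assumed weights $(1+k^2)^2$, and the hold-out validation split are all illustrative assumptions, and an explicit design matrix replaces the NUFFT because $n$ is small.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_val, m = 5000, 1000, 10
ks = np.arange(-m, m + 1)

def f_true(t):
    # Hypothetical target function used only to generate data.
    return np.sin(3 * t) + 0.3 * np.cos(t)

x_train = rng.uniform(-np.pi, np.pi, n)
y_train = f_true(x_train) + 0.2 * rng.standard_normal(n)
x_val = rng.uniform(-np.pi, np.pi, n_val)
y_val = f_true(x_val) + 0.2 * rng.standard_normal(n_val)

# Expensive step, independent of lambda, done once. Its cost grows with n.
Phi = np.exp(1j * np.outer(x_train, ks))
G = Phi.conj().T @ Phi / n
b = Phi.conj().T @ y_train / n

Phi_val = np.exp(1j * np.outer(x_val, ks))
weights = (1.0 + ks.astype(float) ** 2) ** 2  # assumed diagonal of M^* M

# Cheap step, repeated per lambda: a (2m+1) x (2m+1) solve, independent of n.
lambdas = np.logspace(-8, 2, 300)
val_errors = np.empty(lambdas.shape[0])
for i, lam in enumerate(lambdas):
    theta = np.linalg.solve(G + lam * np.diag(weights), b)
    val_errors[i] = np.mean(((Phi_val @ theta).real - y_val) ** 2)

best_lam = lambdas[val_errors.argmin()]
```

Each of the 300 solves touches only `G` and `b`, so the per-$\lambda$ cost does not grow with the number of training samples.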

Implications and Future Directions

This work demonstrates that exact kernel methods can be made practical for massive datasets and high-dimensional problems, provided the kernel admits a Fourier representation. The approach is flexible, supporting a range of structural constraints (smoothness, PDEs, additivity) and is compatible with modern hardware. Theoretical guarantees are preserved, and empirical results confirm both statistical and computational efficiency.

Potential future directions include extending the framework to non-Euclidean domains, exploring non-stationary or data-adaptive bases, and integrating with deep kernel learning architectures. The ability to efficiently incorporate complex prior knowledge (e.g., via PDEs or shape constraints) opens new avenues for scientific machine learning and large-scale structured regression.

Conclusion

The paper establishes a unified, scalable, and theoretically sound framework for kernel regression, leveraging Fourier-NUFFT acceleration and GPU hardware. The methods achieve minimax optimal rates, where these are known, for Sobolev, physics-informed, and additive models, and are empirically validated on datasets with up to tens of billions of samples. This work broadens the applicability of kernel methods to modern large-scale learning tasks while maintaining rigorous statistical guarantees.
