Gaussian and Polynomial Kernels
- Gaussian and Polynomial Kernels are fundamental positive definite functions that enable regression, classification, and surrogate modeling through induced feature maps.
- They leverage finite (polynomial) or infinite-dimensional (Gaussian) feature spaces to capture complex data structures and provide universal or exact approximations.
- Recent advances such as Taylor expansions and sketching techniques improve computational efficiency for scalable kernel methods in high-dimensional settings.
Gaussian and polynomial kernels are foundational classes of positive definite kernels central to machine learning, approximation theory, and computational physics. These kernels induce concrete feature maps into (potentially infinite-dimensional) Hilbert spaces, yielding flexible hypothesis classes for regression, classification, and surrogate modeling. Their algebraic structure, universal approximation properties, connections via polynomial expansions, asymptotic regimes, and efficient computation underlie much of the recent progress in scalable kernel methods, functional data analysis, and surrogate modeling for complex scientific problems.
1. Definitions, Feature Maps, and Algebraic Structure
The most widely used kernels on $\mathbb{R}^d$ are:
- Gaussian (RBF) kernel (universal):
$k_\sigma(x, x') = \exp\!\left(-\dfrac{\|x - x'\|^2}{2\sigma^2}\right),$
where $\|\cdot\|$ is the Euclidean norm and $\sigma > 0$ is the length-scale (bandwidth).
- Polynomial kernel (order $q$; not universal):
$k_{\mathrm{poly}}(x, x') = \bigl(\langle x, x'\rangle + c\bigr)^{q},$
with $c \ge 0$ (often $c = 1$) controlling the offset, and $q \in \mathbb{N}$ the degree.
Both are positive-definite; the induced feature map $\phi$ satisfies $k(x, x') = \langle \phi(x), \phi(x')\rangle$. For the polynomial kernel, the feature map enumerates all monomials of total degree at most $q$, and the feature-space dimension is $\binom{d+q}{q}$. The Gaussian kernel, by contrast, admits an infinite-dimensional feature space, often represented via the Hermite expansion or random Fourier features (Klus et al., 2021, Cotter et al., 2011, Gorodetsky et al., 2015).
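The following is a minimal sketch (assuming NumPy and illustrative data) that builds the explicit monomial feature map of the polynomial kernel and verifies it reproduces the kernel value; the Gaussian kernel is evaluated directly, since its feature space cannot be enumerated this way:

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb, factorial, sqrt

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def polynomial_kernel(x, y, q=3, c=1.0):
    """Inhomogeneous polynomial kernel (<x, y> + c)^q."""
    return (x @ y + c) ** q

def poly_features(x, q=3, c=1.0):
    """Explicit feature map of (<x, y> + c)^q: all monomials of total degree <= q,
    scaled by square roots of multinomial coefficients."""
    z = np.append(x, sqrt(c))  # absorb the offset c as an extra constant coordinate
    feats = []
    for idx in combinations_with_replacement(range(len(z)), q):
        counts = np.bincount(idx, minlength=len(z))          # exponent multi-index
        multinom = factorial(q) / np.prod([factorial(k) for k in counts])
        feats.append(sqrt(multinom) * np.prod(z ** counts))
    return np.array(feats)

rng = np.random.default_rng(0)
d, q = 4, 3
x, y = rng.standard_normal(d), rng.standard_normal(d)
assert len(poly_features(x, q)) == comb(d + q, q)            # binom(d+q, q) features
print(polynomial_kernel(x, y, q), poly_features(x, q) @ poly_features(y, q))  # equal
print(gaussian_kernel(x, y, sigma=1.5))
```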
2. Taylor Expansions and the Gaussian-Polynomial Connection
The Gaussian kernel’s analytic structure allows for polynomial approximations via its Taylor (Mercer) expansion in the inner product:
$k_\sigma(x, x') = \exp\!\left(-\dfrac{\|x\|^2 + \|x'\|^2}{2\sigma^2}\right) \sum_{j=0}^{\infty} \dfrac{1}{j!} \left(\dfrac{\langle x, x'\rangle}{\sigma^2}\right)^{j}.$
Truncating the series at $j = q$ yields a finite-dimensional polynomial kernel approximation. Each term is an inner product of degree-$j$ monomial features, giving a hierarchy of polynomial spaces, and for $|\langle x, x'\rangle|/\sigma^2 \le r$ the truncation error is bounded by the Taylor remainder $\frac{r^{q+1}}{(q+1)!}e^{r}$, which decays factorially in $q$ (Cotter et al., 2011). For data of bounded radius, only $q = O(\log(n/\epsilon))$ terms are required to achieve error $\epsilon$ uniformly over a dataset of size $n$, yielding efficient polynomial approximations to the Gaussian kernel (Song et al., 2021, Ahle et al., 2019).
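A short numerical sketch (NumPy; the bandwidth and truncation orders are illustrative choices) comparing the truncated Taylor expansion with the exact Gaussian kernel:

```python
import numpy as np
from math import factorial

def gaussian_kernel(x, y, sigma):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def taylor_gaussian(x, y, sigma, q):
    """Degree-q Taylor truncation of the Gaussian kernel in the inner product."""
    prefactor = np.exp(-(x @ x + y @ y) / (2 * sigma ** 2))
    t = (x @ y) / sigma ** 2
    return prefactor * sum(t ** j / factorial(j) for j in range(q + 1))

rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)
sigma = 2.0
exact = gaussian_kernel(x, y, sigma)
for q in (1, 3, 6, 10):
    print(q, abs(exact - taylor_gaussian(x, y, sigma, q)))  # error decays factorially in q
```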
3. Approximation Power and Limiting Regimes
Polynomial kernels are non-universal: the associated RKHS comprises polynomials up to degree $q$, with exact interpolation once $\binom{d+q}{q}$ samples in general position are available. In contrast, the Gaussian kernel’s RKHS is dense in $C(\mathcal{X})$ for compact $\mathcal{X} \subset \mathbb{R}^d$, achieving super-exponential approximation rates, with worst-case error for smooth targets decaying factorially in the number of samples $n$ for fixed width parameter $\sigma$ (Karvonen et al., 2022). However, in high-dimensional, sparse-data regimes the optimal Gaussian bandwidth diverges, and the kernel regresses to low-degree polynomials: when $\sigma$ greatly exceeds the data diameter, the Gaussian kernel’s expansion is dominated by its first few terms, and kernel regression effectively reduces to low-degree polynomial regression (Manzhos et al., 2023). This regime transition is rigorously characterized: when the optimal bandwidth diverges for $d$-dimensional data, only a small number of low polynomial degrees contribute, regardless of $d$ (Manzhos et al., 2023).
The minimal polynomial degree required for uniform approximation of the exponential $e^{-t}$ (and hence the Gaussian kernel) to error $\epsilon$ on $[0, B]$ scales as $\Theta\!\bigl(\sqrt{B\log(1/\epsilon)}\bigr)$ when $B \gtrsim \log(1/\epsilon)$ (Aggarwal et al., 2022).
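The flat-kernel regime can be checked numerically. The sketch below (NumPy; sample size, dimension, and bandwidths are arbitrary choices) shows that as $\sigma$ grows far beyond the data diameter, the Gaussian Gram matrix is captured by a low-degree truncation of its Taylor expansion, i.e., by a low-degree polynomial kernel:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(20, 5))     # data diameter is O(1)

def gauss_gram(X, sigma):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def taylor_gram(X, sigma, q):
    """Degree-q Taylor truncation of the Gaussian Gram matrix."""
    pre = np.exp(-np.sum(X ** 2, axis=1) / (2 * sigma ** 2))
    G = X @ X.T / sigma ** 2
    series = sum(G ** j / factorial(j) for j in range(q + 1))   # elementwise powers
    return pre[:, None] * series * pre[None, :]

for sigma in (1.0, 10.0, 100.0):
    err = np.max(np.abs(gauss_gram(X, sigma) - taylor_gram(X, sigma, q=2)))
    print(sigma, err)   # error of the degree-2 truncation shrinks as sigma grows
```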
4. Symmetric and Antisymmetric Kernel Lifting
For tasks requiring invariance or equivariance to permutations (e.g., indistinguishable particles, point-cloud data in quantum chemistry), standard kernels can be lifted to symmetric (invariant) or antisymmetric (sign-changing) forms via averaging over the symmetric group $S_D$:
- Symmetrized kernel:
$k_s(x, x') = \frac{1}{D!} \sum_{\pi \in S_D} k\!\bigl(\pi(x), x'\bigr)$
- Antisymmetrized kernel:
$k_a(x, x') = \frac{1}{D!} \sum_{\pi \in S_D} \operatorname{sgn}(\pi)\,k\!\bigl(\pi(x), x'\bigr)$
These remain positive-definite. Symmetrized polynomial kernels have feature spaces indexed by integer partitions, reducing dimension and thus sample complexity compared to the conventional tensor basis. Antisymmetric polynomial kernels require all monomial exponents to be distinct; for the Gaussian kernel, which factorizes over coordinates, the antisymmetrization yields the Slater-determinant representation
$k_a(x, x') = \frac{1}{D!} \det\!\Bigl[\exp\!\Bigl(-\tfrac{(x_i - x'_j)^2}{2\sigma^2}\Bigr)\Bigr]_{i,j=1}^{D},$
which is standard in many-fermion wavefunction modeling and can be evaluated in $O(D^3)$ time (Klus et al., 2021).
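A brute-force sketch (NumPy and itertools; feasible only for small $D$) that antisymmetrizes the Gaussian kernel by explicit permutation averaging and checks it against the $O(D^3)$ determinant form:

```python
import numpy as np
from itertools import permutations
from math import factorial

def gauss_1d(a, b, sigma):
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))

def antisym_bruteforce(x, y, sigma):
    """Antisymmetrized Gaussian kernel via explicit sum over S_D (cost O(D!))."""
    D = len(x)
    total = 0.0
    for perm in permutations(range(D)):
        # parity of the permutation via counting inversions
        sign = (-1) ** sum(perm[i] > perm[j] for i in range(D) for j in range(i + 1, D))
        total += sign * np.prod(gauss_1d(x[list(perm)], y, sigma))
    return total / factorial(D)

def antisym_determinant(x, y, sigma):
    """Same kernel via the Slater-determinant form (cost O(D^3))."""
    M = gauss_1d(x[:, None], y[None, :], sigma)
    return np.linalg.det(M) / factorial(len(x))

rng = np.random.default_rng(3)
x, y = rng.standard_normal(4), rng.standard_normal(4)
print(antisym_bruteforce(x, y, 1.0), antisym_determinant(x, y, 1.0))  # agree
```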
Imposing (anti)symmetry reduces the hypothesis space dimension and can halve or further reduce the necessary training data, improving data efficiency and conditioning in physics and chemistry applications.
5. Efficient Computation: Oblivious Sketching and Feature Approximations
Naïve kernel methods scale poorly with sample size and feature dimension, especially for high-degree polynomial or infinite-dimensional Gaussian kernels. Recent advances use oblivious sketching, combining fast Johnson–Lindenstrauss transforms (SRHT) with TensorSketch, to embed the lifted polynomial feature space into $\mathbb{R}^m$ with a target dimension $m$ that is only polynomial in the degree $q$ and the dataset size, independent of the exponentially large feature dimension $d^q$ (Song et al., 2021, Ahle et al., 2019).
For the Gaussian kernel, Taylor truncation plus a polynomial sketch achieves kernel inner-product approximation in time near-linear in the input sparsity, eliminating the linear-in-degree slowdown typical of prior approaches. Similarly, random Fourier features and explicit monomial “Taylor features” provide alternative approximations: for sparse data and moderate input dimension, Taylor features often outperform random Fourier features per FLOP for bandwidths not much smaller than the data radius (Cotter et al., 2011). Sketching enables kernel ridge regression, SVMs, and kernel PCA to operate efficiently at large scale.
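As an illustrative sketch (NumPy; the hash size and degree are arbitrary choices), the classic TensorSketch construction for the homogeneous polynomial kernel $\langle x, x'\rangle^{q}$, which the oblivious-sketching results above build on and refine:

```python
import numpy as np

def tensorsketch(X, q, m, seed=0):
    """TensorSketch of the degree-q polynomial feature map x -> x^(tensor q).

    Returns Z with <Z[i], Z[j]> approximately <X[i], X[j]>**q, using q independent
    CountSketches combined via FFT (circular convolution)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    h = rng.integers(0, m, size=(q, d))            # hash buckets
    s = rng.choice([-1.0, 1.0], size=(q, d))       # random signs
    prod = np.ones((n, m), dtype=complex)
    for t in range(q):
        cs = np.zeros((n, m))
        for j in range(d):                         # CountSketch each row of X
            cs[:, h[t, j]] += s[t, j] * X[:, j]
        prod *= np.fft.fft(cs, axis=1)             # convolution theorem
    return np.fft.ifft(prod, axis=1).real

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 30))
Z = tensorsketch(X, q=3, m=4096)
exact, approx = (X @ X.T) ** 3, Z @ Z.T
print(np.max(np.abs(exact - approx)) / np.max(np.abs(exact)))  # relative error shrinks as m grows
```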
6. RKHS, Mercer Expansions, and Operator-Theoretic Perspective
Both Gaussian and polynomial kernels admit Mercer (eigenfunction) expansions:
- Gaussian kernel on compact $\mathcal{X} \subset \mathbb{R}^d$:
$k_\sigma(x, x') = \sum_{i=1}^{\infty} \lambda_i\, \varphi_i(x)\, \varphi_i(x'),$
with super-exponentially decaying eigenvalues $\lambda_i$.
- Polynomial kernel (degree $q$):
$k_{\mathrm{poly}}(x, x') = \sum_{i=1}^{N} \lambda_i\, \varphi_i(x)\, \varphi_i(x'), \qquad N = \binom{d+q}{q},$
with finitely many nonzero eigenvalues, corresponding to the monomial basis of degree at most $q$.
Truncating the Gaussian Mercer series reduces it to polynomial kernels, and in the “flat-kernel” ($\sigma \to \infty$) regime, the induced RKHS converges to the polynomial subspace whose degree is determined by the sample configuration (Karvonen et al., 2019, Karvonen et al., 2022). The Gaussian kernel’s infinite smoothness yields super-exponential convergence for function approximation, while finite-rank polynomial kernels offer algebraic rates and exact interpolation upon sufficient sampling.
Kernel ridge regression with a truncated Gaussian Mercer expansion is equivalent to pseudospectral polynomial projection, provided the data locations form an exact quadrature rule for the polynomial space—establishing direct equivalence between Gaussian-process regression and polynomial least-squares in this regime (Gorodetsky et al., 2015, Karvonen et al., 2019).
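A brief empirical check of the contrasting spectra (NumPy; sample size, dimension, and bandwidth are arbitrary choices): the Gaussian Gram matrix exhibits rapidly decaying eigenvalues, while the polynomial Gram matrix has rank at most $\binom{d+q}{q}$:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(5)
n, d, q, sigma = 200, 3, 2, 1.0
X = rng.uniform(-1, 1, size=(n, d))

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_gauss = np.exp(-sq / (2 * sigma ** 2))        # Gaussian Gram matrix
K_poly = (X @ X.T + 1.0) ** q                   # polynomial Gram matrix

eig_gauss = np.sort(np.linalg.eigvalsh(K_gauss))[::-1]
eig_poly = np.sort(np.linalg.eigvalsh(K_poly))[::-1]

print((eig_gauss / eig_gauss[0])[[0, 5, 10, 20, 40]])   # eigenvalues decay very quickly
rank_poly = int(np.sum(eig_poly > 1e-9 * eig_poly[0]))
print(rank_poly, comb(d + q, q))                # numerical rank equals binom(d+q, q)
```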
7. Summary Table: Key Properties
| Kernel Type | RKHS Dimension | Universality | Approximation Rate | Efficient Sketch/Ftrs |
|---|---|---|---|---|
| Polynomial (degree $q$) | $\binom{d+q}{q}$ (finite) | Non-universal | Exact after $\binom{d+q}{q}$ samples | Oblivious sketching, explicit monomials |
| Gaussian (RBF) | Infinite | Universal | Super-exponential (factorial in $n$) | Taylor features, random Fourier, sketching |
| Sym. Poly. | Indexed by integer partitions | Non-universal | Reduced monomial count | Permutationally invariant polynomial bases |
| Antisym. Poly./Gauss. | Partition constraints / infinite | Universal on antisymmetric subspaces | Reduced monomials / Slater det. | Slater determinant, sketching |
The interplay of analytic structure, spectral decay, permutation invariance, and computational approximation governs the selection and deployment of Gaussian and polynomial kernels across scientific and data-driven domains. Key algorithmic advances have made scalable kernel methods feasible for a range of large-scale and high-dimensional problems, while deep operator-theoretic and algebraic connections clarify regimes in which each kernel class is most effective (Klus et al., 2021, Song et al., 2021, Cotter et al., 2011, Manzhos et al., 2023, Karvonen et al., 2022, Gorodetsky et al., 2015).