
Gaussian and Polynomial Kernels

Updated 21 November 2025
  • Gaussian and Polynomial Kernels are fundamental positive definite functions that enable regression, classification, and surrogate modeling through induced feature maps.
  • They leverage finite-dimensional (polynomial) or infinite-dimensional (Gaussian) feature spaces to capture complex data structure, providing exact (polynomial) or universal (Gaussian) approximation.
  • Recent advances such as Taylor expansions and sketching techniques improve computational efficiency for scalable kernel methods in high-dimensional settings.

Gaussian and polynomial kernels are foundational classes of positive definite kernels central to machine learning, approximation theory, and computational physics. These kernels induce concrete feature maps into (potentially infinite-dimensional) Hilbert spaces, yielding flexible hypothesis classes for regression, classification, and surrogate modeling. Their algebraic structure, universal approximation properties, connections via polynomial expansions, asymptotic regimes, and efficient computation underlie much of the recent progress in scalable kernel methods, functional data analysis, and surrogate modeling for complex scientific problems.

1. Definitions, Feature Maps, and Algebraic Structure

The most widely used kernels on $\mathbb{R}^D$ are:

  • Gaussian (RBF) kernel (universal):

$$k_{\rm G}(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$$

where $\|\cdot\|$ is the Euclidean norm and $\sigma > 0$ is the length-scale (bandwidth).

  • Polynomial kernel (order $p$; not universal):

$$k_{\rm poly}(x, x') = (c + x^\top x')^p$$

with $c \ge 0$ (often $c = 1$) controlling the offset, and $p \in \mathbb{N}$ the degree.

Both are positive-definite; the induced feature map $\phi(\cdot)$ satisfies $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$. For the polynomial kernel, the feature map enumerates all monomials of total degree $\le p$, and the feature-space dimension is $\binom{p+D}{D}$. The Gaussian kernel, by contrast, admits an infinite-dimensional feature space, often represented via the Hermite expansion or random Fourier features (Klus et al., 2021, Cotter et al., 2011, Gorodetsky et al., 2015).
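
As a concrete illustration of these definitions, the NumPy sketch below evaluates both kernels and checks the feature-map claim numerically. The explicit weighted-monomial map `phi_poly` is an illustrative construction (not taken from the cited papers), and the parameter values are arbitrary.

```python
# Minimal sketch: Gaussian and polynomial kernel evaluation, plus an explicit
# weighted-monomial feature map whose inner product reproduces the polynomial
# kernel and whose length equals binom(p + D, D).
import itertools
import math

import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """k_G(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def polynomial_kernel(x, xp, c=1.0, p=3):
    """k_poly(x, x') = (c + x^T x')^p."""
    return (c + x @ xp) ** p

def phi_poly(x, c=1.0, p=3):
    """One weighted monomial per multi-index alpha with |alpha| <= p, chosen so
    that <phi(x), phi(x')> = (c + x^T x')^p (multinomial expansion)."""
    D = len(x)
    feats = []
    for alpha in itertools.product(range(p + 1), repeat=D):
        k = sum(alpha)
        if k > p:
            continue
        # (c + x.x')^p = sum_k C(p,k) c^(p-k) (x.x')^k and
        # (x.x')^k = sum_{|alpha|=k} (k!/alpha!) x^alpha x'^alpha.
        w = math.comb(p, k) * c ** (p - k) * math.factorial(k)
        for a in alpha:
            w /= math.factorial(a)
        feats.append(math.sqrt(w) * np.prod(x ** np.array(alpha)))
    return np.array(feats)

rng = np.random.default_rng(0)
D, p = 4, 3
x, xp = rng.normal(size=D), rng.normal(size=D)

print(gaussian_kernel(x, xp, sigma=1.5))
print(polynomial_kernel(x, xp, p=p))
print(phi_poly(x, p=p) @ phi_poly(xp, p=p))        # matches the kernel value
print(len(phi_poly(x, p=p)), math.comb(p + D, D))  # feature dimension = C(p+D, D)
```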

2. Taylor Expansions and the Gaussian-Polynomial Connection

The Gaussian kernel’s analytic structure allows for polynomial approximations via its Taylor (Mercer) expansion in the inner product:

$$k_{\rm G}(x, x') = \exp\left(-\frac{\|x\|^2 + \|x'\|^2}{2\sigma^2}\right) \exp\left(\frac{x^\top x'}{\sigma^2}\right) = e^{-\frac{\|x\|^2 + \|x'\|^2}{2\sigma^2}} \sum_{m=0}^{\infty} \frac{1}{m!}\left(\frac{x^\top x'}{\sigma^2}\right)^m$$

Truncating the series at degree $Q$ yields a finite-dimensional polynomial kernel approximation. Each term is an inner product of degree-$m$ monomials, giving a hierarchy of polynomial spaces, and the truncation error for $\|x\|, \|x'\| \le R$ is bounded by $(R^2/\sigma^2)^{Q+1}/(Q+1)!$ (Cotter et al., 2011). For $R^2/\sigma^2 = O(1)$, only $O(\log n)$ terms are required to achieve error $\epsilon$ on datasets of size $n$, yielding efficient polynomial approximations to the Gaussian kernel (Song et al., 2021, Ahle et al., 2019).
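
A minimal numerical sketch of this truncation follows: it keeps the radial prefactor exactly, truncates the exponential series at degree $Q$, and compares the resulting error against the quoted $(R^2/\sigma^2)^{Q+1}/(Q+1)!$ bound. The point sampler and parameter values are illustrative assumptions, not taken from the cited papers.

```python
# Truncated Taylor (polynomial) approximation of the Gaussian kernel and a
# numerical check of the (R^2/sigma^2)^(Q+1)/(Q+1)! truncation-error bound.
import math

import numpy as np

def gaussian_kernel(x, xp, sigma):
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def taylor_gaussian_kernel(x, xp, sigma, Q):
    """Keep the radial prefactor exactly; truncate exp(x.x'/sigma^2) at degree Q."""
    prefactor = np.exp(-(x @ x + xp @ xp) / (2.0 * sigma ** 2))
    t = (x @ xp) / sigma ** 2
    return prefactor * sum(t ** m / math.factorial(m) for m in range(Q + 1))

def random_point_in_ball(rng, dim, R):
    """Random point with norm at most R (simple illustrative sampler)."""
    v = rng.normal(size=dim)
    return v * (R * rng.uniform() / np.linalg.norm(v))

rng = np.random.default_rng(1)
sigma, R, Q, dim = 2.0, 1.0, 6, 5
bound = (R ** 2 / sigma ** 2) ** (Q + 1) / math.factorial(Q + 1)
for _ in range(3):
    x = random_point_in_ball(rng, dim, R)
    xp = random_point_in_ball(rng, dim, R)
    err = abs(gaussian_kernel(x, xp, sigma) - taylor_gaussian_kernel(x, xp, sigma, Q))
    print(f"error = {err:.2e}   bound = {bound:.2e}")
```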

3. Approximation Power and Limiting Regimes

Polynomial kernels are non-universal: the associated RKHS comprises polynomials up to degree $p$, with rapid convergence to zero error after $p+1$ samples in general position. In contrast, the Gaussian kernel’s RKHS is dense in $C(\mathcal{X})$ (for compact $\mathcal{X}$), achieving super-exponential approximation rates:

$$e_n^{\min} \sim \left(\frac{\varepsilon}{2}\right)^n (n!)^{-1/2}$$

for $n$ samples and width parameter $\varepsilon$ (Karvonen et al., 2022). However, in high-dimensional, sparse-data regimes, the optimal Gaussian bandwidth diverges and the kernel regresses to low-degree polynomials: formally, when the length-scale $\ell$ greatly exceeds the data diameter, the Gaussian kernel’s expansion truncates after finitely many terms, and kernel regression reduces to polynomial regression (Manzhos et al., 2023). This regime transition is rigorously characterized: if the optimal $\ell^* \gtrsim D N^{1/d}$ for $d$-dimensional data, only $O(1)$ degrees matter regardless of $N$ (Manzhos et al., 2023).
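
This flat-bandwidth regime can be illustrated with a small experiment, sketched below under arbitrary illustrative choices of $n$, $D$, $\sigma$, and tolerance: when $\sigma$ greatly exceeds the data diameter, the Gaussian Gram matrix becomes numerically low-rank, and its numerical rank sits at the dimension $\binom{q+D}{D}$ of a low-degree polynomial space for some small $q$.

```python
# Flat-bandwidth sketch: the Gaussian Gram matrix with sigma >> data diameter
# is numerically low-rank, with rank at a cumulative monomial count C(q+D, D).
import math

import numpy as np

rng = np.random.default_rng(3)
n, D, sigma = 40, 2, 50.0
X = rng.uniform(-1.0, 1.0, size=(n, D))     # data diameter ~ 2, sigma >> diameter

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * sigma ** 2))

eigvals = np.linalg.eigvalsh(K)[::-1]        # descending
num_rank = int(np.sum(eigvals > 1e-10 * eigvals[0]))

print("numerical rank of K:", num_rank)
print("dim of degree <= q polynomials:", [math.comb(q + D, D) for q in range(5)])
# Typically the numerical rank coincides with C(q + D, D) for a small q
# (degree <= 2 here), i.e. only O(1) polynomial degrees matter.
```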

The minimal polynomial degree required for uniform approximation of $e^{-x}$ to error $\delta$ on $[0, B]$ is
$$d_{B;\delta}(e^{-x}) = \Theta\left(\max\left\{\sqrt{B\log(1/\delta)},\ \frac{\log(1/\delta)}{\log\left(B^{-1}\log(1/\delta)\right)}\right\}\right)$$
(Aggarwal et al., 2022).
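
As an empirical companion to this bound (a sketch, not the construction of Aggarwal et al.), one can estimate the smallest degree at which a near-minimax Chebyshev interpolant of $e^{-x}$ on $[0, B]$ reaches uniform error $\delta$:

```python
# Estimate the smallest Chebyshev-interpolant degree achieving uniform error
# delta for exp(-x) on [0, B]. Chebyshev interpolation is near-minimax, so the
# reported degree tracks the minimal degree d_{B;delta} up to small factors.
import numpy as np
from numpy.polynomial import chebyshev as C

def min_cheb_degree(B, delta, max_deg=200):
    t = np.linspace(-1.0, 1.0, 2001)          # dense grid on the reference interval
    target = np.exp(-0.5 * B * (t + 1.0))     # exp(-x) with x = B(t+1)/2 in [0, B]
    for deg in range(max_deg + 1):
        coeffs = C.chebinterpolate(lambda s: np.exp(-0.5 * B * (s + 1.0)), deg)
        if np.max(np.abs(C.chebval(t, coeffs) - target)) <= delta:
            return deg
    return None

for B in (1.0, 10.0, 100.0):
    print(B, min_cheb_degree(B, delta=1e-3))  # grows roughly like sqrt(B log(1/delta))
```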

4. Symmetric and Antisymmetric Kernel Lifting

For tasks requiring invariance or equivariance to permutations (e.g., indistinguishable particles, point-cloud data in quantum chemistry), standard kernels can be lifted to symmetric (invariant) or antisymmetric (sign-changing) forms via averaging over the symmetric group $S_D$:

  • Symmetrized kernel:

$$k_s(x, x') = \frac{1}{D!} \sum_{\pi \in S_D} k\bigl(\pi(x), x'\bigr)$$

  • Antisymmetrized kernel:

$$k_a(x, x') = \frac{1}{D!} \sum_{\pi \in S_D} \mathrm{sgn}(\pi)\, k\bigl(\pi(x), x'\bigr)$$

These remain positive-definite. Symmetrized polynomial kernels have feature spaces indexed by integer partitions, reducing dimension and thus sample complexity compared to the conventional tensor basis. Antisymmetric polynomial kernels require all monomial exponents to be distinct; for the Gaussian kernel, the antisymmetrization yields the Slater-determinant representation:

$$k_a(x, x') = \frac{1}{D!}\det\left( \exp\left(-\frac{(x_i - x'_j)^2}{2\sigma^2}\right) \right)_{i,j=1}^{D}$$

which is standard in many-fermion wavefunction modeling and can be evaluated in $O(D^3)$ time (Klus et al., 2021).
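
The identity between the permutation sum and the determinant can be checked directly for small $D$; the sketch below compares a brute-force $O(D!\, D)$ antisymmetrization against the $O(D^3)$ Slater-determinant form (NumPy only, illustrative parameters).

```python
# Numerical check: brute-force antisymmetrization of the Gaussian kernel over
# S_D agrees with the determinant (Slater) form, since the kernel factorizes
# over coordinates.
import itertools
import math

import numpy as np

def gaussian_kernel(x, xp, sigma):
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def antisym_bruteforce(x, xp, sigma):
    """(1/D!) sum over pi in S_D of sgn(pi) * k(pi(x), x')."""
    D = len(x)
    total = 0.0
    for perm in itertools.permutations(range(D)):
        inversions = sum(1 for i in range(D) for j in range(i + 1, D)
                         if perm[i] > perm[j])
        total += (-1) ** inversions * gaussian_kernel(x[list(perm)], xp, sigma)
    return total / math.factorial(D)

def antisym_slater(x, xp, sigma):
    """(1/D!) det[ exp(-(x_i - x'_j)^2 / (2 sigma^2)) ]_{i,j}."""
    M = np.exp(-(x[:, None] - xp[None, :]) ** 2 / (2.0 * sigma ** 2))
    return np.linalg.det(M) / math.factorial(len(x))

rng = np.random.default_rng(4)
D, sigma = 4, 1.0
x, xp = rng.normal(size=D), rng.normal(size=D)
print(antisym_bruteforce(x, xp, sigma), antisym_slater(x, xp, sigma))  # should agree
```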

Imposing (anti)symmetry reduces the hypothesis space dimension and can halve or further reduce the necessary training data, improving data efficiency and conditioning in physics and chemistry applications.

5. Efficient Computation: Oblivious Sketching and Feature Approximations

Naïve kernel methods scale poorly with sample size $n$ and feature dimension, especially for high-degree polynomial or infinite-dimensional Gaussian kernels. Recent advances use oblivious sketching, combining fast Johnson–Lindenstrauss (SRHT) and TensorSketch methods, to embed the lifted polynomial feature space into $\mathbb{R}^m$, with $m$ only polynomial in the degree $q$ and the dataset size, independent of $d^q$ (Song et al., 2021, Ahle et al., 2019).

For the Gaussian kernel, Taylor truncation plus polynomial sketching achieves kernel inner-product approximation in $O(nd + \epsilon^{-2} n^2 \operatorname{polylog} n)$ time, eliminating the linear-in-degree slowdown typical of prior approaches. Similarly, random Fourier features and explicit monomial “Taylor features” provide alternative approximations: for $s$-sparse data and moderate $M$, Taylor features often outperform random Fourier features per FLOP for bandwidths not much smaller than the data radius (Cotter et al., 2011). Sketching enables kernel ridge regression, SVMs, and kernel PCA to operate efficiently at large scale.
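
Of the feature approximations mentioned above, random Fourier features are the simplest to sketch. The snippet below draws frequencies from the Gaussian kernel's spectral density and checks the Gram-matrix error; the number of features $m$ and the synthetic data are arbitrary illustrative choices.

```python
# Random Fourier features for the Gaussian kernel:
# z(x)^T z(x') ~ exp(-||x - x'||^2 / (2 sigma^2)), with error O(1/sqrt(m)).
import numpy as np

rng = np.random.default_rng(5)
n, d, m, sigma = 200, 10, 4000, 1.5
X = rng.normal(size=(n, d))

# Exact Gaussian Gram matrix.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / (2.0 * sigma ** 2))

# Features: w ~ N(0, sigma^{-2} I), b ~ Uniform[0, 2 pi), z(x) = sqrt(2/m) cos(Wx + b).
W = rng.normal(scale=1.0 / sigma, size=(m, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
Z = np.sqrt(2.0 / m) * np.cos(X @ W.T + b)
K_rff = Z @ Z.T

print("max abs error:", np.max(np.abs(K - K_rff)))   # shrinks as m grows
```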

6. RKHS, Mercer Expansions, and Operator-Theoretic Perspective

Both Gaussian and polynomial kernels admit Mercer (eigenfunction) expansions:

  • Gaussian kernel on compact $\mathcal{X}$:

$$k_G(x, x') = \sum_{i=1}^{\infty} \lambda_i^{(G)} \varphi_i(x) \varphi_i(x')$$

with super-exponential eigenvalue decay.

  • Polynomial kernel (degree $p$):

$$k_P(x, x') = \sum_{|\alpha| = p} \lambda_\alpha \psi_\alpha(x) \psi_\alpha(x')$$

with finitely many nonzero eigenvalues, corresponding to the monomial basis of order $p$.

Truncating the Gaussian Mercer series reduces it to polynomial kernels, and in the “flat-kernel” ($\sigma \to \infty$) regime, the induced RKHS converges to the polynomial subspace of degree determined by the sample configuration (Karvonen et al., 2019, Karvonen et al., 2022). The Gaussian kernel’s infinite smoothness yields super-exponential convergence for function approximation, while finite-rank polynomial kernels offer algebraic rates and exact interpolation upon sufficient sampling.
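
The contrast between the two Mercer structures can be seen numerically. Under illustrative parameter choices, the sketch below shows that the polynomial Gram matrix has rank at most $\binom{p+D}{D}$, while the Gaussian Gram matrix is strictly positive definite with a rapidly decaying spectrum.

```python
# Finite-rank polynomial Gram matrix versus full-rank, rapidly decaying
# Gaussian Gram matrix (sizes and parameters are illustrative).
import math

import numpy as np

rng = np.random.default_rng(6)
n, D, p, sigma = 100, 3, 2, 1.0
X = rng.normal(size=(n, D))

K_poly = (1.0 + X @ X.T) ** p
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_gauss = np.exp(-sq / (2.0 * sigma ** 2))

def numerical_rank(K, tol=1e-10):
    ev = np.linalg.eigvalsh(K)
    return int(np.sum(ev > tol * ev.max()))

print("poly rank:", numerical_rank(K_poly), " C(p+D, D) =", math.comb(p + D, D))
print("gauss rank:", numerical_rank(K_gauss), "of", n)
print("gauss leading eigenvalues:", np.sort(np.linalg.eigvalsh(K_gauss))[::-1][:8])
```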

Kernel ridge regression with a truncated Gaussian Mercer expansion is equivalent to pseudospectral polynomial projection, provided the data locations form an exact quadrature rule for the polynomial space—establishing direct equivalence between Gaussian-process regression and polynomial least-squares in this regime (Gorodetsky et al., 2015, Karvonen et al., 2019).

7. Summary Table: Key Properties

| Kernel Type | RKHS Dimension | Universality | Approximation Rate | Efficient Sketch/Features |
|---|---|---|---|---|
| Polynomial (degree $p$) | $\binom{p+D}{D}$ | Non-universal | Exact after $p+1$ samples | Oblivious sketching, explicit monomials |
| Gaussian (RBF) | Infinite | Universal | Super-exponential (factorial in $n$) | Taylor features, random Fourier features, sketching |
| Symmetrized polynomial | Indexed by integer partitions | Non-universal | Reduced monomial count | Permutationally invariant polynomial bases |
| Antisymmetrized polynomial/Gaussian | Constrained by partitions | Universal on antisymmetric subspaces | Reduced monomials / Slater determinant | Slater determinant, sketching |

The interplay of analytic structure, spectral decay, permutation invariance, and computational approximation governs the selection and deployment of Gaussian and polynomial kernels across scientific and data-driven domains. Key algorithmic advances have made scalable kernel methods feasible for a range of large-scale and high-dimensional problems, while deep operator-theoretic and algebraic connections clarify regimes in which each kernel class is most effective (Klus et al., 2021, Song et al., 2021, Cotter et al., 2011, Manzhos et al., 2023, Karvonen et al., 2022, Gorodetsky et al., 2015).
