
Gaussian and Polynomial Kernels

Updated 21 November 2025
  • Gaussian and Polynomial Kernels are fundamental positive definite functions that enable regression, classification, and surrogate modeling through induced feature maps.
  • They leverage finite-dimensional (polynomial) or infinite-dimensional (Gaussian) feature spaces to capture complex data structure, providing exact (polynomial) or universal (Gaussian) approximation.
  • Recent advances such as Taylor expansions and sketching techniques improve computational efficiency for scalable kernel methods in high-dimensional settings.

Gaussian and polynomial kernels are foundational classes of positive definite kernels central to machine learning, approximation theory, and computational physics. These kernels induce concrete feature maps into (potentially infinite-dimensional) Hilbert spaces, yielding flexible hypothesis classes for regression, classification, and surrogate modeling. Their algebraic structure, universal approximation properties, connections via polynomial expansions, asymptotic regimes, and efficient computation underlie much of the recent progress in scalable kernel methods, functional data analysis, and surrogate modeling for complex scientific problems.

1. Definitions, Feature Maps, and Algebraic Structure

The most widely used kernels on $\mathbb{R}^D$ are:

  • Gaussian (RBF) kernel (universal):

$$k_{\rm G}(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$$

where $\|\cdot\|$ is the Euclidean norm and $\sigma > 0$ is the length-scale (bandwidth).

  • Polynomial kernel (order $p$; not universal):

$$k_{\rm poly}(x, x') = (c + x^\top x')^p$$

with $c \ge 0$ (often $c = 1$) controlling the offset, and $p \in \mathbb{N}$ the degree.

Both are positive-definite; the induced feature map $\phi(\cdot)$ satisfies $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$. For the polynomial kernel, the feature map enumerates all monomials of total degree $\le p$, and the feature-space dimension is $\binom{p+D}{D}$. The Gaussian kernel, by contrast, admits an infinite-dimensional feature space, often represented via the Hermite expansion or random Fourier features (Klus et al., 2021, Cotter et al., 2011, Gorodetsky et al., 2015).
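
As a concrete illustration of these definitions, the NumPy sketch below evaluates both kernels and checks the feature-map claim numerically. The explicit weighted-monomial map `phi_poly` is an illustrative construction (not taken from the cited papers), and the parameter values are arbitrary.

```python
# Minimal sketch: Gaussian and polynomial kernel evaluation, plus an explicit
# weighted-monomial feature map whose inner product reproduces the polynomial
# kernel and whose length equals binom(p + D, D).
import itertools
import math

import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """k_G(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def polynomial_kernel(x, xp, c=1.0, p=3):
    """k_poly(x, x') = (c + x^T x')^p."""
    return (c + x @ xp) ** p

def phi_poly(x, c=1.0, p=3):
    """One weighted monomial per multi-index alpha with |alpha| <= p, chosen so
    that <phi(x), phi(x')> = (c + x^T x')^p (multinomial expansion)."""
    D = len(x)
    feats = []
    for alpha in itertools.product(range(p + 1), repeat=D):
        k = sum(alpha)
        if k > p:
            continue
        # (c + x.x')^p = sum_k C(p,k) c^(p-k) (x.x')^k and
        # (x.x')^k = sum_{|alpha|=k} (k!/alpha!) x^alpha x'^alpha.
        w = math.comb(p, k) * c ** (p - k) * math.factorial(k)
        for a in alpha:
            w /= math.factorial(a)
        feats.append(math.sqrt(w) * np.prod(x ** np.array(alpha)))
    return np.array(feats)

rng = np.random.default_rng(0)
D, p = 4, 3
x, xp = rng.normal(size=D), rng.normal(size=D)

print(gaussian_kernel(x, xp, sigma=1.5))
print(polynomial_kernel(x, xp, p=p))
print(phi_poly(x, p=p) @ phi_poly(xp, p=p))        # matches the kernel value
print(len(phi_poly(x, p=p)), math.comb(p + D, D))  # feature dimension = C(p+D, D)
```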

2. Taylor Expansions and the Gaussian-Polynomial Connection

The Gaussian kernel’s analytic structure allows for polynomial approximations via its Taylor (Mercer) expansion in the inner product:

$$k_{\rm G}(x, x') = \exp\left(-\frac{\|x\|^2 + \|x'\|^2}{2\sigma^2}\right) \exp\left(\frac{x^\top x'}{\sigma^2}\right) = e^{-\frac{\|x\|^2 + \|x'\|^2}{2\sigma^2}} \sum_{m=0}^{\infty} \frac{1}{m!}\left(\frac{x^\top x'}{\sigma^2}\right)^m$$

Truncating the series at degree $Q$ yields a finite-dimensional polynomial kernel approximation. Each term is an inner product of degree-$m$ monomials, giving a hierarchy of polynomial spaces, and the truncation error for $\|x\|, \|x'\| \le R$ is bounded by $(R^2/\sigma^2)^{Q+1}/(Q+1)!$ (Cotter et al., 2011). For $R^2/\sigma^2 = O(1)$, only $O(\log n)$ terms are required to achieve error $\epsilon$ on datasets of size $n$, yielding efficient polynomial approximations to the Gaussian kernel (Song et al., 2021, Ahle et al., 2019).
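
A minimal numerical sketch of this truncation follows: it keeps the radial prefactor exactly, truncates the exponential series at degree $Q$, and compares the resulting error against the quoted $(R^2/\sigma^2)^{Q+1}/(Q+1)!$ bound. The point sampler and parameter values are illustrative assumptions, not taken from the cited papers.

```python
# Truncated Taylor (polynomial) approximation of the Gaussian kernel and a
# numerical check of the (R^2/sigma^2)^(Q+1)/(Q+1)! truncation-error bound.
import math

import numpy as np

def gaussian_kernel(x, xp, sigma):
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def taylor_gaussian_kernel(x, xp, sigma, Q):
    """Keep the radial prefactor exactly; truncate exp(x.x'/sigma^2) at degree Q."""
    prefactor = np.exp(-(x @ x + xp @ xp) / (2.0 * sigma ** 2))
    t = (x @ xp) / sigma ** 2
    return prefactor * sum(t ** m / math.factorial(m) for m in range(Q + 1))

def random_point_in_ball(rng, dim, R):
    """Random point with norm at most R (simple illustrative sampler)."""
    v = rng.normal(size=dim)
    return v * (R * rng.uniform() / np.linalg.norm(v))

rng = np.random.default_rng(1)
sigma, R, Q, dim = 2.0, 1.0, 6, 5
bound = (R ** 2 / sigma ** 2) ** (Q + 1) / math.factorial(Q + 1)
for _ in range(3):
    x = random_point_in_ball(rng, dim, R)
    xp = random_point_in_ball(rng, dim, R)
    err = abs(gaussian_kernel(x, xp, sigma) - taylor_gaussian_kernel(x, xp, sigma, Q))
    print(f"error = {err:.2e}   bound = {bound:.2e}")
```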

3. Approximation Power and Limiting Regimes

Polynomial kernels are non-universal: the associated RKHS comprises polynomials up to degree $p$, with rapid convergence to zero error after $p+1$ samples in general position. In contrast, the Gaussian kernel’s RKHS is dense in $C(\mathcal{X})$ (for compact $\mathcal{X}$), achieving super-exponential approximation rates:

$$e_n^{\min} \sim \left(\frac{\varepsilon}{2}\right)^n (n!)^{-1/2}$$

for $n$ samples and width parameter $\varepsilon$ (Karvonen et al., 2022). However, in high-dimensional, sparse-data regimes, the optimal Gaussian bandwidth diverges and the kernel regresses to low-degree polynomials: formally, when the length-scale $\ell$ greatly exceeds the data diameter, the Gaussian kernel’s expansion truncates after finitely many terms, and kernel regression reduces to polynomial regression (Manzhos et al., 2023). This regime transition is rigorously characterized: if the optimal $\ell^* \gtrsim D N^{1/d}$ for $d$-dimensional data, only $O(1)$ degrees matter regardless of $N$ (Manzhos et al., 2023).
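
This flat-bandwidth regime can be illustrated with a small experiment, sketched below under arbitrary illustrative choices of $n$, $D$, $\sigma$, and tolerance: when $\sigma$ greatly exceeds the data diameter, the Gaussian Gram matrix becomes numerically low-rank, and its numerical rank sits at the dimension $\binom{q+D}{D}$ of a low-degree polynomial space for some small $q$.

```python
# Flat-bandwidth sketch: the Gaussian Gram matrix with sigma >> data diameter
# is numerically low-rank, with rank at a cumulative monomial count C(q+D, D).
import math

import numpy as np

rng = np.random.default_rng(3)
n, D, sigma = 40, 2, 50.0
X = rng.uniform(-1.0, 1.0, size=(n, D))     # data diameter ~ 2, sigma >> diameter

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * sigma ** 2))

eigvals = np.linalg.eigvalsh(K)[::-1]        # descending
num_rank = int(np.sum(eigvals > 1e-10 * eigvals[0]))

print("numerical rank of K:", num_rank)
print("dim of degree <= q polynomials:", [math.comb(q + D, D) for q in range(5)])
# Typically the numerical rank coincides with C(q + D, D) for a small q
# (degree <= 2 here), i.e. only O(1) polynomial degrees matter.
```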

The minimal polynomial degree required for uniform approximation of $e^{-x}$ to error $\delta$ on $[0, B]$ is
$$d_{B;\delta}(e^{-x}) = \Theta\left(\max\left\{\sqrt{B\log(1/\delta)},\ \frac{\log(1/\delta)}{\log\left(B^{-1}\log(1/\delta)\right)}\right\}\right)$$
(Aggarwal et al., 2022).
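
As an empirical companion to this bound (a sketch, not the construction of Aggarwal et al.), one can estimate the smallest degree at which a near-minimax Chebyshev interpolant of $e^{-x}$ on $[0, B]$ reaches uniform error $\delta$:

```python
# Estimate the smallest Chebyshev-interpolant degree achieving uniform error
# delta for exp(-x) on [0, B]. Chebyshev interpolation is near-minimax, so the
# reported degree tracks the minimal degree d_{B;delta} up to small factors.
import numpy as np
from numpy.polynomial import chebyshev as C

def min_cheb_degree(B, delta, max_deg=200):
    t = np.linspace(-1.0, 1.0, 2001)          # dense grid on the reference interval
    target = np.exp(-0.5 * B * (t + 1.0))     # exp(-x) with x = B(t+1)/2 in [0, B]
    for deg in range(max_deg + 1):
        coeffs = C.chebinterpolate(lambda s: np.exp(-0.5 * B * (s + 1.0)), deg)
        if np.max(np.abs(C.chebval(t, coeffs) - target)) <= delta:
            return deg
    return None

for B in (1.0, 10.0, 100.0):
    print(B, min_cheb_degree(B, delta=1e-3))  # grows roughly like sqrt(B log(1/delta))
```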

4. Symmetric and Antisymmetric Kernel Lifting

For tasks requiring invariance or equivariance to permutations (e.g., indistinguishable particles, point-cloud data in quantum chemistry), standard kernels can be lifted to symmetric (invariant) or antisymmetric (sign-changing) forms via averaging over the symmetric group $S_D$:

  • Symmetrized kernel:

$$k_s(x, x') = \frac{1}{D!} \sum_{\pi \in S_D} k\bigl(\pi(x), x'\bigr)$$

  • Antisymmetrized kernel:

$$k_a(x, x') = \frac{1}{D!} \sum_{\pi \in S_D} \mathrm{sgn}(\pi)\, k\bigl(\pi(x), x'\bigr)$$

These remain positive-definite. Symmetrized polynomial kernels have feature spaces indexed by integer partitions, reducing dimension and thus sample complexity compared to the conventional tensor basis. Antisymmetric polynomial kernels require all monomial exponents to be distinct; for the Gaussian kernel, the antisymmetrization yields the Slater-determinant representation:

$$k_a(x, x') = \frac{1}{D!}\det\left( \exp\left(-\frac{(x_i - x'_j)^2}{2\sigma^2}\right) \right)_{i,j=1}^{D}$$

which is standard in many-fermion wavefunction modeling and can be evaluated in $O(D^3)$ time (Klus et al., 2021).
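
The identity between the permutation sum and the determinant can be checked directly for small $D$; the sketch below compares a brute-force $O(D!\, D)$ antisymmetrization against the $O(D^3)$ Slater-determinant form (NumPy only, illustrative parameters).

```python
# Numerical check: brute-force antisymmetrization of the Gaussian kernel over
# S_D agrees with the determinant (Slater) form, since the kernel factorizes
# over coordinates.
import itertools
import math

import numpy as np

def gaussian_kernel(x, xp, sigma):
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def antisym_bruteforce(x, xp, sigma):
    """(1/D!) sum over pi in S_D of sgn(pi) * k(pi(x), x')."""
    D = len(x)
    total = 0.0
    for perm in itertools.permutations(range(D)):
        inversions = sum(1 for i in range(D) for j in range(i + 1, D)
                         if perm[i] > perm[j])
        total += (-1) ** inversions * gaussian_kernel(x[list(perm)], xp, sigma)
    return total / math.factorial(D)

def antisym_slater(x, xp, sigma):
    """(1/D!) det[ exp(-(x_i - x'_j)^2 / (2 sigma^2)) ]_{i,j}."""
    M = np.exp(-(x[:, None] - xp[None, :]) ** 2 / (2.0 * sigma ** 2))
    return np.linalg.det(M) / math.factorial(len(x))

rng = np.random.default_rng(4)
D, sigma = 4, 1.0
x, xp = rng.normal(size=D), rng.normal(size=D)
print(antisym_bruteforce(x, xp, sigma), antisym_slater(x, xp, sigma))  # should agree
```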

Imposing (anti)symmetry reduces the hypothesis space dimension and can halve or further reduce the necessary training data, improving data efficiency and conditioning in physics and chemistry applications.

5. Efficient Computation: Oblivious Sketching and Feature Approximations

Naïve kernel methods scale poorly with sample size $n$ and feature dimension, especially for high-degree polynomial or infinite-dimensional Gaussian kernels. Recent advances use oblivious sketching, combining fast Johnson–Lindenstrauss (SRHT) and TensorSketch methods, to embed the lifted polynomial feature space into $\mathbb{R}^m$, with $m$ only polynomial in the degree $q$ and the dataset size, independent of $d^q$ (Song et al., 2021, Ahle et al., 2019).

For the Gaussian kernel, Taylor truncation plus polynomial sketching achieves kernel inner-product approximation in $O(nd + \epsilon^{-2} n^2 \operatorname{polylog} n)$ time, eliminating the linear-in-degree slowdown typical of prior approaches. Similarly, random Fourier features and explicit monomial “Taylor features” provide alternative approximations: for $s$-sparse data and moderate $M$, Taylor features often outperform random Fourier features per FLOP for bandwidths not much smaller than the data radius (Cotter et al., 2011). Sketching enables kernel ridge regression, SVMs, and kernel PCA to operate efficiently at large scale.
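
Of the feature approximations mentioned above, random Fourier features are the simplest to sketch. The snippet below draws frequencies from the Gaussian kernel's spectral density and checks the Gram-matrix error; the number of features $m$ and the synthetic data are arbitrary illustrative choices.

```python
# Random Fourier features for the Gaussian kernel:
# z(x)^T z(x') ~ exp(-||x - x'||^2 / (2 sigma^2)), with error O(1/sqrt(m)).
import numpy as np

rng = np.random.default_rng(5)
n, d, m, sigma = 200, 10, 4000, 1.5
X = rng.normal(size=(n, d))

# Exact Gaussian Gram matrix.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / (2.0 * sigma ** 2))

# Features: w ~ N(0, sigma^{-2} I), b ~ Uniform[0, 2 pi), z(x) = sqrt(2/m) cos(Wx + b).
W = rng.normal(scale=1.0 / sigma, size=(m, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
Z = np.sqrt(2.0 / m) * np.cos(X @ W.T + b)
K_rff = Z @ Z.T

print("max abs error:", np.max(np.abs(K - K_rff)))   # shrinks as m grows
```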

6. RKHS, Mercer Expansions, and Operator-Theoretic Perspective

Both Gaussian and polynomial kernels admit Mercer (eigenfunction) expansions:

  • Gaussian kernel on compact $\mathcal{X}$:

$$k_G(x, x') = \sum_{i=1}^{\infty} \lambda_i^{(G)} \varphi_i(x) \varphi_i(x')$$

with super-exponential eigenvalue decay.

  • Polynomial kernel (degree $p$):

$$k_P(x, x') = \sum_{|\alpha| = p} \lambda_\alpha \psi_\alpha(x) \psi_\alpha(x')$$

with finitely many nonzero eigenvalues, corresponding to the monomial basis of order $p$.

Truncating the Gaussian Mercer series reduces it to polynomial kernels, and in the “flat-kernel” ($\sigma \to \infty$) regime, the induced RKHS converges to the polynomial subspace of degree determined by the sample configuration (Karvonen et al., 2019, Karvonen et al., 2022). The Gaussian kernel’s infinite smoothness yields super-exponential convergence for function approximation, while finite-rank polynomial kernels offer algebraic rates and exact interpolation upon sufficient sampling.
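
The contrast between the two Mercer structures can be seen numerically. Under illustrative parameter choices, the sketch below shows that the polynomial Gram matrix has rank at most $\binom{p+D}{D}$, while the Gaussian Gram matrix is strictly positive definite with a rapidly decaying spectrum.

```python
# Finite-rank polynomial Gram matrix versus full-rank, rapidly decaying
# Gaussian Gram matrix (sizes and parameters are illustrative).
import math

import numpy as np

rng = np.random.default_rng(6)
n, D, p, sigma = 100, 3, 2, 1.0
X = rng.normal(size=(n, D))

K_poly = (1.0 + X @ X.T) ** p
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_gauss = np.exp(-sq / (2.0 * sigma ** 2))

def numerical_rank(K, tol=1e-10):
    ev = np.linalg.eigvalsh(K)
    return int(np.sum(ev > tol * ev.max()))

print("poly rank:", numerical_rank(K_poly), " C(p+D, D) =", math.comb(p + D, D))
print("gauss rank:", numerical_rank(K_gauss), "of", n)
print("gauss leading eigenvalues:", np.sort(np.linalg.eigvalsh(K_gauss))[::-1][:8])
```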

Kernel ridge regression with a truncated Gaussian Mercer expansion is equivalent to pseudospectral polynomial projection, provided the data locations form an exact quadrature rule for the polynomial space—establishing direct equivalence between Gaussian-process regression and polynomial least-squares in this regime (Gorodetsky et al., 2015, Karvonen et al., 2019).

7. Summary Table: Key Properties

| Kernel Type | RKHS Dimension | Universality | Approximation Rate | Efficient Sketch/Features |
|---|---|---|---|---|
| Polynomial (degree $p$) | $\binom{p+D}{D}$ | Non-universal | Exact after $p+1$ samples | Oblivious sketching, explicit monomials |
| Gaussian (RBF) | Infinite | Universal | Super-exponential (factorial in $n$) | Taylor features, random Fourier features, sketching |
| Symmetrized polynomial | Indexed by integer partitions | Non-universal | Reduced monomial count | Permutationally invariant polynomial bases |
| Antisymmetrized polynomial/Gaussian | Constrained by partitions | Universal on antisymmetric subspaces | Reduced monomials / Slater determinant | Slater determinant, sketching |

The interplay of analytic structure, spectral decay, permutation invariance, and computational approximation governs the selection and deployment of Gaussian and polynomial kernels across scientific and data-driven domains. Key algorithmic advances have made scalable kernel methods feasible for a range of large-scale and high-dimensional problems, while deep operator-theoretic and algebraic connections clarify regimes in which each kernel class is most effective (Klus et al., 2021, Song et al., 2021, Cotter et al., 2011, Manzhos et al., 2023, Karvonen et al., 2022, Gorodetsky et al., 2015).
