Mercer's Theorem: Spectral Decomposition
- Mercer's theorem is a fundamental result that expresses continuous, symmetric, positive-definite kernels as absolutely and uniformly convergent series built from the eigenvalues and eigenfunctions of an associated integral operator.
- The proof rests on the spectral theory of compact, self-adjoint integral operators; quantitative refinements supply explicit convergence rates, and the expansion is central to the construction of reproducing kernel Hilbert spaces.
- Extensions to operator- and matrix-valued kernels broaden its applications in spectral theory, probability, and machine learning, supporting numerical and analytical methods.
Mercer's theorem provides a canonical spectral decomposition for continuous, symmetric, positive-definite kernels on compact domains, establishing that such kernels admit an absolutely and uniformly convergent expansion in terms of orthonormal eigenfunctions of the associated integral operator. The result generalizes to operator- and matrix-valued kernels, and connects deeply with the theory of reproducing kernel Hilbert spaces (RKHS), spectral theory of compact operators, and numerous applications in analysis, probability, optimization, and machine learning. Contemporary research further extends Mercer's expansion to indefinite and asymmetric kernels, and to operator-theoretic frameworks in von Neumann algebras.
1. Classical Formulation and Spectral Foundations
Let $X$ be a compact metric space with a finite Borel measure $\mu$, and let $K : X \times X \to \mathbb{R}$ be a continuous, symmetric, positive-definite kernel—that is, for every finite collection $x_1, \dots, x_n \in X$ and $c_1, \dots, c_n \in \mathbb{R}$,
$$\sum_{i,j=1}^{n} c_i c_j K(x_i, x_j) \ge 0.$$
The integral operator $T_K : L^2(X, \mu) \to L^2(X, \mu)$ is defined by
$$(T_K f)(x) = \int_X K(x, y) f(y)\, d\mu(y).$$
$T_K$ is compact, self-adjoint, and positive. By the spectral theorem, its spectrum consists of a (possibly finite or infinite) sequence of non-negative eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$, with $\lambda_k \to 0$, and associated orthonormal eigenfunctions $\{\varphi_k\}$ in $L^2(X, \mu)$. Mercer's theorem asserts the expansion
$$K(x, y) = \sum_{k=1}^{\infty} \lambda_k\, \varphi_k(x)\, \varphi_k(y),$$
where the series converges absolutely and uniformly on $X \times X$ (Gheondea, 6 Dec 2025, Bagchi, 2020, Ghojogh et al., 2021).
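The following minimal numerical sketch (not taken from the cited papers; the grid size, kernel, and lengthscale are arbitrary choices) discretizes $T_K$ by uniform quadrature on $[0,1]$, diagonalizes the resulting symmetric matrix, and checks that the truncated expansion converges to $K$ in sup norm on the grid:

```python
import numpy as np

# Nyström-style discretization of the integral operator T_K on [0,1]:
# eigenpairs of the weighted kernel matrix approximate (λ_k, φ_k(x_i)).
n = 400
x = np.linspace(0.0, 1.0, n)
w = 1.0 / n                                                  # uniform quadrature weight
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2)   # Gaussian kernel

evals, evecs = np.linalg.eigh(K * w)                  # symmetric since w is scalar
evals, evecs = evals[::-1], evecs[:, ::-1]            # sort eigenvalues descending
phi = evecs / np.sqrt(w)                              # L²(μ)-orthonormal samples of φ_k

# Truncated Mercer expansion Σ_{k≤m} λ_k φ_k(x_i) φ_k(x_j) versus the kernel.
for m in (5, 10, 20, 40):
    K_m = (phi[:, :m] * evals[:m]) @ phi[:, :m].T
    print(f"m={m:2d}  sup-norm error = {np.abs(K - K_m).max():.2e}")
```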
2. Modes of Convergence and Quantitative Bounds
Mercer's expansion is remarkable for its convergence properties. Not only does it converge in $L^2(X \times X, \mu \otimes \mu)$, but absolute and uniform convergence on $X \times X$ holds—this follows from diagonal bounds and Dini's theorem applied to the monotonic sequence of positive-definite remainders
$$R_n(x, y) = K(x, y) - \sum_{k=1}^{n} \lambda_k\, \varphi_k(x)\, \varphi_k(y),$$
with $R_n(x, x) \downarrow 0$ pointwise and, by Dini's theorem, uniformly in $x$; since each $R_n$ is itself positive-definite, $|R_n(x, y)| \le \sqrt{R_n(x, x)\, R_n(y, y)}$, so uniform convergence on the diagonal controls the whole remainder.
Takhanov (Takhanov, 2022) refines the classical theorem by providing explicit rates for the uniform (sup-norm) convergence of truncated Mercer expansions: the remainder $\sup_{x,y} |R_n(x, y)|$ is bounded in terms of the eigenvalue tail, the ambient dimension, and a parameter measuring the smoothness class of the kernel. These rates quantify how rapidly the truncated expansion converges as a function of eigenvalue decay and regularity, and they underpin approximation schemes in numerical and statistical contexts; the sketch below illustrates the tail-controlled behavior numerically.
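A hedged numerical check (same uniform-grid discretization as above; this does not reproduce the explicit constants of (Takhanov, 2022)): for the Brownian covariance $K(s, t) = \min(s, t)$, whose eigenvalues decay like $k^{-2}$, the observed sup-norm truncation error tracks the eigenvalue tail $\sum_{k>m} \lambda_k$.

```python
import numpy as np

# Compare sup-norm truncation error with the eigenvalue tail Σ_{k>m} λ_k,
# which equals the integrated diagonal remainder ∫ R_m(x,x) dμ exactly.
n = 300
x = np.linspace(0.0, 1.0, n)
w = 1.0 / n
K = np.minimum(x[:, None], x[None, :])        # Brownian covariance min(s,t)

evals, evecs = np.linalg.eigh(K * w)
evals, evecs = evals[::-1], evecs[:, ::-1]
phi = evecs / np.sqrt(w)

for m in (2, 8, 32, 128):
    R = K - (phi[:, :m] * evals[:m]) @ phi[:, :m].T   # remainder R_m on the grid
    print(f"m={m:3d}  sup|R_m| = {np.abs(R).max():.3e}   tail Σλ = {evals[m:].sum():.3e}")
```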
3. RKHS, Feature Maps, and Operator-Theoretic Perspectives
There is a fundamental relationship between Mercer kernels and RKHS theory. For a Mercer kernel $K$, the associated RKHS $\mathcal{H}_K$ consists of functions for which the evaluation functional $f \mapsto f(x)$ is continuous for every $x \in X$, with the reproducing property $f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}_K}$. In operator-theoretic terms, $\mathcal{H}_K$ can be identified with the operator range of $T_K^{1/2}$, with the kernel serving as the Gram matrix for this Hilbert space structure (Gheondea, 6 Dec 2025, Ghojogh et al., 2021).
The Mercer expansion also induces a canonical Hilbert-space feature map
$$\Phi : X \to \ell^2, \qquad \Phi(x) = \big(\sqrt{\lambda_k}\, \varphi_k(x)\big)_{k \ge 1},$$
so $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_{\ell^2}$. Every $f \in \mathcal{H}_K$ admits the expansion $f = \sum_k a_k \varphi_k$ with $\|f\|_{\mathcal{H}_K}^2 = \sum_k a_k^2 / \lambda_k < \infty$.
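For a case where the eigenpairs are available in closed form, the Brownian covariance $K(s, t) = \min(s, t)$ on $[0, 1]$ has $\lambda_k = \nu_k^{-2}$ and $\varphi_k(t) = \sqrt{2}\, \sin(\nu_k t)$ with $\nu_k = (k - \tfrac{1}{2})\pi$, so the feature map can be written down directly (a minimal sketch; the function name and truncation level are illustrative):

```python
import numpy as np

def features(t, m=200):
    """First m coordinates of the Mercer feature map Φ(t) = (√λ_k φ_k(t))_k."""
    nu = (np.arange(1, m + 1) - 0.5) * np.pi       # ν_k = (k - 1/2)π
    lam = 1.0 / nu**2                              # λ_k = ν_k^{-2}
    phi = np.sqrt(2.0) * np.sin(nu * t)            # φ_k(t) = √2 sin(ν_k t)
    return np.sqrt(lam) * phi

s, t = 0.3, 0.7
print(features(s) @ features(t), "vs", min(s, t))  # ⟨Φ(s), Φ(t)⟩ ≈ min(s,t)
```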
4. Extensions: Operator- and Matrix-Valued Kernels
Mercer’s theorem extends to operator-valued and matrix-valued kernels. For a kernel $K : X \times X \to \mathcal{S}_1(H)$ (trace-class operators on a separable Hilbert space $H$), with $K$ continuous, Hermitian, and positive in the appropriate sense,
$$K(x, y) = \sum_{k=1}^{\infty} \lambda_k\, \varphi_k(x) \otimes \varphi_k(y),$$
where $\{\varphi_k\}$ is an orthonormal basis of $L^2(X, \mu; H)$ consisting of continuous $H$-valued functions, and the sum converges absolutely and uniformly in trace-class norm (Santoro et al., 2023).
In the matrix-valued case, for $K : X \times X \to \mathbb{R}^{d \times d}$ continuous, symmetric, and matrix-valued positive-definite, the Mercer–Young theorem provides a spectral expansion
$$K(x, y) = \sum_{k=1}^{\infty} \lambda_k\, \varphi_k(x)\, \varphi_k(y)^{\top},$$
with $\{\varphi_k\}$ an orthonormal sequence in $L^2(X, \mu; \mathbb{R}^d)$ and strictly positive $\lambda_k$. The equivalence of discrete and integral notions of positive-definiteness is established, and the expansion converges uniformly componentwise (Neuman et al., 27 Mar 2024).
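To make the matrix-valued statement concrete, here is a minimal sketch of a deliberately separable toy case $K(s, t) = k(s, t)\, B$, a scalar Mercer kernel times a fixed PSD matrix (this is not the general construction of the Mercer–Young theorem); discretization yields a Kronecker-structured PSD block matrix whose eigenpairs play the role of $(\lambda_k, \varphi_k)$:

```python
import numpy as np

# Separable toy case of a matrix-valued Mercer expansion: K(s,t) = k(s,t)·B.
n = 200
x = np.linspace(0.0, 1.0, n)
w = 1.0 / n                                          # quadrature weight
k = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)   # scalar exponential kernel
B = np.array([[2.0, 0.5], [0.5, 1.0]])               # fixed PSD 2×2 matrix

Kbig = np.kron(k, B)                                 # (2n)×(2n) block kernel matrix
evals, evecs = np.linalg.eigh(Kbig * w)
evals, evecs = evals[::-1], evecs[:, ::-1]           # sort descending
print("all eigenvalues nonnegative:", bool(evals.min() > -1e-10))

m = 80                                               # truncation level
K_m = (evecs[:, :m] * evals[:m]) @ evecs[:, :m].T / w
print("componentwise sup error:", np.abs(Kbig - K_m).max())
```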
5. Applications in Probability and Machine Learning
In stochastic analysis, the Mercer expansion of the covariance kernel enables Karhunen–Loève expansions of mean-square continuous Hilbert-valued random processes:
$$X_t = m_t + \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, \xi_k\, \varphi_k(t),$$
where the $\xi_k$ are uncorrelated random coefficients with $\mathbb{E}[\xi_k] = 0$ and $\mathbb{E}[\xi_j \xi_k] = \delta_{jk}$. Uniform mean-square convergence in $t$ is ensured under continuity hypotheses (Santoro et al., 2023).
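A minimal scalar-valued sketch (the classical Brownian-motion case, not the Hilbert-valued setting of (Santoro et al., 2023)): sampling the Karhunen–Loève expansion of $W_t$ on $[0, 1]$ using the closed-form eigenpairs given above.

```python
import numpy as np

# Karhunen–Loève sampling of Brownian motion: W_t ≈ Σ_{k≤m} √λ_k ξ_k φ_k(t)
# with i.i.d. standard normal coefficients ξ_k (uncorrelated, unit variance).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
m = 300
nu = (np.arange(1, m + 1) - 0.5) * np.pi       # ν_k = (k - 1/2)π
lam = 1.0 / nu**2                              # λ_k = ν_k^{-2}
phi = np.sqrt(2.0) * np.sin(np.outer(t, nu))   # φ_k sampled on the grid

xi = rng.standard_normal(m)
W = phi @ (np.sqrt(lam) * xi)                  # one approximate sample path

# Mean-square check at t = 1: Var[W_1] = Σ λ_k φ_k(1)² → 1 as m → ∞.
xis = rng.standard_normal((10000, m))
W1 = xis @ (np.sqrt(lam) * np.sqrt(2.0) * np.sin(nu))
print("empirical Var[W_1] ≈", W1.var())
```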
In machine learning, Mercer's theorem is the mathematical foundation for kernel methods including SVMs, kernel ridge regression, and kernel PCA. The RKHS and Mercer’s expansion guarantee that continuous p.d. kernels admit a finite or infinite-dimensional feature mapping, enabling linear methods to be lifted to nonlinear settings without explicit computation of the feature space (Ghojogh et al., 2021, Bagchi, 2020). The uniform convergence property underpins practical spectral and kernel approximation schemes, such as the Nyström method and randomized feature maps.
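As one concrete instance, the Nyström method approximates a large Gram matrix from a random subset of landmark columns, which amounts to an empirical truncated Mercer expansion; the following sketch (the data, lengthscale, and landmark count are arbitrary choices) measures the resulting approximation error.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 2))                     # data in [0,1]²

def gram(A, B, ls=0.3):
    # Gaussian-kernel Gram block between point sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

idx = rng.choice(len(X), size=100, replace=False)   # landmark indices
C = gram(X, X[idx])                                 # n×m cross-Gram block
W = gram(X[idx], X[idx])                            # m×m landmark Gram
K_nys = C @ np.linalg.pinv(W) @ C.T                 # rank-m Nyström estimate

K = gram(X, X)
print("relative Frobenius error:",
      np.linalg.norm(K - K_nys) / np.linalg.norm(K))
```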
6. Generalizations: Indefinite, Asymmetric, and Operator-Theoretic Contexts
Recent work extends Mercer's expansion to continuous, indefinite, and asymmetric kernels of bounded variation in each variable. For such kernels, the singular value expansion (SVE)
$$K(x, y) = \sum_{k=1}^{\infty} \sigma_k\, u_k(x)\, v_k(y)$$
converges pointwise almost everywhere, almost uniformly, and unconditionally almost everywhere, but not necessarily uniformly or absolutely in the absence of positive-definiteness. Explicit decay rates for the singular values $\sigma_k$ are established under smoothness or BV assumptions, and efficient algorithms for practical kernel expansions are provided (Jeong et al., 24 Sep 2024); a discretized sketch follows below.
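A discretized illustration (a sketch under the uniform-grid setup used earlier, not the algorithms of (Jeong et al., 24 Sep 2024)): the SVD of the weighted kernel matrix approximates the SVE, and for the discontinuous Volterra kernel $K(x, y) = \mathbf{1}_{\{x \ge y\}}$ the mean-square error decays while the sup-norm error stalls near the diagonal jump, consistent with the failure of uniform convergence.

```python
import numpy as np

# Discrete singular value expansion: K(x,y) ≈ Σ_{k≤m} σ_k u_k(x) v_k(y).
n = 300
x = np.linspace(0.0, 1.0, n)
w = 1.0 / n
K = np.where(x[:, None] >= x[None, :], 1.0, 0.0)   # Volterra kernel 1_{x≥y}

U, s, Vt = np.linalg.svd(K * w)
u, v = U / np.sqrt(w), Vt.T / np.sqrt(w)           # L²-normalized singular functions

for m in (5, 20, 80):
    err = K - (u[:, :m] * s[:m]) @ v[:, :m].T
    print(f"m={m:2d}  sup error = {np.abs(err).max():.3f}   "
          f"mean-square error = {np.sqrt((err**2).mean()):.4f}")
```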
In the context of von Neumann algebras and operator bimodules, “Mercer’s theorem” refers to extension phenomena for isometric and intertwining maps between Cartan bimodules or bimodules over crossed products. Uniqueness and structural results for such extensions to normal $*$-isomorphisms are obtained, leading to spectral-synthesis properties and parametrization of Bures-closed bimodules in terms of central support projections (Cameron et al., 2012, Cameron et al., 2016).
7. Illustrative Examples and Further Remarks
- For polynomial kernels on a compact subset of $\mathbb{R}^d$, the Mercer expansion is finite, with monomials as eigenfunctions (Gheondea, 6 Dec 2025); see the numerical check after this list.
- The Gaussian RBF kernel $K(x, y) = \exp\big(-\|x - y\|^2 / (2\sigma^2)\big)$ on a compact domain yields super-exponentially decaying eigenvalues and an RKHS of entire functions (Gheondea, 6 Dec 2025, Ghojogh et al., 2021).
- For general continuous, symmetric, positive-definite $K$, the kernel is recovered as a uniformly convergent series of eigenfunctions weighted by positive eigenvalues, forming the basis for spectral methods and functional-analytic approaches throughout pure and applied mathematics (Gheondea, 6 Dec 2025, Bagchi, 2020).
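As a quick sanity check of the polynomial-kernel remark above (a minimal sketch; the kernel and grid are arbitrary choices), the Gram matrix of $k(x, y) = (1 + xy)^2$ has numerical rank 3, matching the span of $\{1, x, x^2\}$:

```python
import numpy as np

# Finite Mercer expansion of a polynomial kernel: (1 + xy)² on [-1,1]
# has a rank-3 Gram matrix, so the expansion terminates after three terms.
x = np.linspace(-1.0, 1.0, 200)
K = (1.0 + np.outer(x, x)) ** 2
print("numerical rank:", np.linalg.matrix_rank(K))    # → 3
```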
Mercer's theorem thus serves as a unifying framework in functional analysis, probability, optimization, and applied mathematics, linking spectral theory, RKHS construction, and positive-definite kernels in a manner that is both theoretically rigorous and of foundational algorithmic importance.