Differential Representer Theorem Overview

Updated 28 November 2025
  • The Differential Representer Theorem is an extension of classical RKHS methods that integrates both function values and derivative observations into its framework.
  • It uses the reproducing properties of kernels and their derivatives to represent solutions as finite linear combinations of kernel sections and derivative representers evaluated at the sample points.
  • The theorem underpins kernel methods for physics-informed and data-efficient learning, making regularized empirical risk minimization with derivative data computationally tractable.

The Differential Representer Theorem extends the classical representer framework in reproducing kernel Hilbert spaces (RKHS) to settings where the observed data include not only function values but also partial derivatives of the underlying function. This generalization is fundamental for kernel methods that must exploit both function and gradient information, such as physics-informed, semi-supervised, or data-efficient learning. Under mild smoothness assumptions on the kernel, every solution to a regularized empirical risk problem with function and derivative data can be represented as a finite linear combination of kernel sections and their derivatives evaluated at the sample points, providing an efficient parameterization and computational tractability (El-Boukkouri et al., 20 Mar 2025).

1. RKHS Framework and Reproducing Properties

Let $X \subset \mathbb{R}^d$ be open, and let $K: X \times X \to \mathbb{R}$ be a continuous, positive-semidefinite kernel. The RKHS $\mathcal{H}$ associated with $K$ consists of real-valued functions with the reproducing property
$$\forall x \in X,\ \forall f \in \mathcal{H}: \quad f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}}.$$
For partial derivatives, the $p$th partial-derivative operator $D_p: \mathcal{H} \to \mathcal{F}(X, \mathbb{R})$ is defined by $(D_p f)(x) = \frac{\partial f}{\partial x_p}(x)$ whenever the derivative exists. The operator $D_p$ admits a reproducing property if for each $x \in X$ there exists $\psi_{x,p} \in \mathcal{H}$ such that
$$\forall f \in \mathcal{H}: \quad \frac{\partial f}{\partial x_p}(x) = \langle f, \psi_{x,p} \rangle_{\mathcal{H}}.$$
This property holds under mild regularity: specifically, the kernel $K$ must admit mixed partial derivatives up to order two that are continuous in a neighborhood of the diagonal (El-Boukkouri et al., 20 Mar 2025).
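
As a concrete illustration (not taken from the cited paper), the sketch below writes out the derivative representer $\psi_{x,p}$ for a Gaussian kernel, where $\psi_{x,p}(z) = \partial K(z, u)/\partial u_p\,|_{u=x}$, and checks it against a central finite difference. The kernel choice and the length-scale `ell` are assumptions for illustration only.

```python
# Minimal sketch (illustrative assumption: Gaussian kernel with length-scale `ell`).
import numpy as np

ell = 0.7  # assumed length-scale

def K(x, z):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 ell^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * ell ** 2))

def psi(x, p, z):
    """Derivative representer psi_{x,p}(z) = d/du_p K(z, u) evaluated at u = x."""
    return K(z, x) * (z[p] - x[p]) / ell ** 2

rng = np.random.default_rng(0)
x, z = rng.normal(size=3), rng.normal(size=3)
p, h = 1, 1e-6
e = np.zeros(3)
e[p] = h
fd = (K(z, x + e) - K(z, x - e)) / (2 * h)  # finite-difference d/dx_p K(z, x)
print(psi(x, p, z), fd)  # the two values should agree closely
```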

2. Statement and Implications of the Differential Representer Theorem

Given data comprising function values $\{(x_i, y_i)\}_{i=1}^n$ and gradient observations $\{(x_j, g_j)\}_{j=1}^m$ with $g_j \approx \nabla f(x_j)$, consider the regularized empirical risk functional
$$J(f) = \sum_{i=1}^n (f(x_i) - y_i)^2 + \sum_{j=1}^m \|\nabla f(x_j) - g_j\|^2 + \lambda \|f\|_{\mathcal{H}}^2,$$
where $\lambda > 0$. Under the appropriate kernel smoothness, any minimizer $f^* \in \mathcal{H}$ takes the explicit finite-dimensional form
$$f^*(x) = \sum_{i=1}^n \alpha_i K(x_i, x) + \sum_{j=1}^m \sum_{p=1}^d \beta_{j,p} \, \frac{\partial K(u, x)}{\partial u_p}\bigg|_{u = x_j} = \sum_{i=1}^n \alpha_i K(x_i, x) + \sum_{j=1}^m \sum_{p=1}^d \beta_{j,p} \, \psi_{x_j,p}(x).$$
The $\psi_{x_j,p}$ are the representers of the partial-derivative functionals. This result indicates that the minimizer is parameterized by coefficients associated with both the observed values and the observed gradient components (El-Boukkouri et al., 20 Mar 2025).
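
A minimal sketch of evaluating this finite expansion at a new point, continuing the illustrative Gaussian-kernel assumption from above; the coefficients here are arbitrary placeholders rather than fitted values, and the function name `f_star` is not from the cited paper.

```python
# Evaluate f*(x) = sum_i alpha_i K(x_i, x) + sum_{j,p} beta_{j,p} psi_{x_j,p}(x).
import numpy as np

ell = 0.7  # assumed length-scale

def K(x, z):
    return np.exp(-np.sum((x - z) ** 2) / (2 * ell ** 2))

def psi(x, p, z):
    # derivative representer: d/du_p K(z, u) at u = x
    return K(z, x) * (z[p] - x[p]) / ell ** 2

def f_star(x, X_val, alpha, X_grad, beta):
    """Finite expansion of the minimizer in terms of kernel sections and representers."""
    value_part = sum(a * K(xi, x) for a, xi in zip(alpha, X_val))
    grad_part = sum(beta[j, p] * psi(X_grad[j], p, x)
                    for j in range(X_grad.shape[0])
                    for p in range(X_grad.shape[1]))
    return value_part + grad_part

rng = np.random.default_rng(1)
X_val, X_grad = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
alpha, beta = rng.normal(size=4), rng.normal(size=(3, 2))
print(f_star(np.zeros(2), X_val, alpha, X_grad, beta))
```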

3. Proof Outline and Functional-Analytic Structure

The proof leverages the structure of RKHSs and their duals. The continuous linear functionals for value and derivative observations, $L_i(f) = f(x_i)$ and $M_{j,p}(f) = \frac{\partial f}{\partial x_p}(x_j)$, are represented by $K(\cdot, x_i)$ and $\psi_{x_j,p}$, respectively. The minimization occurs over $\mathcal{H}$, which decomposes orthogonally as $\mathcal{H} = \mathcal{S} \oplus \mathcal{S}^\perp$, where $\mathcal{S}$ is the span of the $K(\cdot, x_i)$ and $\psi_{x_j,p}$. The regularizer strictly penalizes any component in $\mathcal{S}^\perp$ without improving the data fit, forcing the minimizer to lie in $\mathcal{S}$ (El-Boukkouri et al., 20 Mar 2025).
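
In symbols, this argument runs as follows: write $f = f_{\mathcal{S}} + f_\perp$ with $f_{\mathcal{S}} \in \mathcal{S}$ and $f_\perp \in \mathcal{S}^\perp$. The two reproducing properties give
$$f(x_i) = \langle f, K(\cdot, x_i) \rangle_{\mathcal{H}} = \langle f_{\mathcal{S}}, K(\cdot, x_i) \rangle_{\mathcal{H}}, \qquad \frac{\partial f}{\partial x_p}(x_j) = \langle f, \psi_{x_j,p} \rangle_{\mathcal{H}} = \langle f_{\mathcal{S}}, \psi_{x_j,p} \rangle_{\mathcal{H}},$$
so the data-fit terms of $J$ depend on $f_{\mathcal{S}}$ alone, while
$$\|f\|_{\mathcal{H}}^2 = \|f_{\mathcal{S}}\|_{\mathcal{H}}^2 + \|f_\perp\|_{\mathcal{H}}^2 \;\ge\; \|f_{\mathcal{S}}\|_{\mathcal{H}}^2,$$
with equality only if $f_\perp = 0$. Hence any minimizer must satisfy $f_\perp = 0$ and therefore lies in $\mathcal{S}$.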

4. Corollaries: Pure Derivative and Higher-Order Functionals

  • Pure Gradient Observations: If $n = 0$ (no value data), then $f^*(x) = \sum_{j=1}^{m} \sum_{p=1}^d \beta_{j,p} \, \psi_{x_j,p}(x)$.
  • Higher-Order Operators: If the kernel admits continuous mixed partials up to order $r$ near the diagonal, the same expansion applies with representers for higher-order differential operators, which are linear combinations of kernel mixed partials up to order $r$ (El-Boukkouri et al., 20 Mar 2025); see the display below.
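
In the notation of Section 1, the representer of a pure mixed-partial functional $D^\gamma$, for a multi-index $\gamma$ with $|\gamma| \le r$, can be written as
$$\psi_{x,\gamma}(\cdot) = \frac{\partial^{|\gamma|} K(\cdot, u)}{\partial u^{\gamma}}\bigg|_{u = x}, \qquad D^{\gamma} f(x) = \langle f, \psi_{x,\gamma} \rangle_{\mathcal{H}},$$
so general linear differential functionals of order at most $r$ are represented by the corresponding linear combinations of these kernel mixed partials.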

5. Algorithmic Implementation and Gram Matrix Structure

Inserting the finite expansion of $f^*$ into $J(f)$ and differentiating with respect to the coefficients $\theta = (\alpha, \beta)$ leads to a structured linear system
$$\begin{bmatrix} K_{nn} + \lambda I_n & D K_{nm} \\ (D K_{nm})^\top & D^2 K_{mm} + \lambda I_{md} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} y \\ g \end{bmatrix},$$
where $K_{nn}$, $D K_{nm}$, and $D^2 K_{mm}$ are the value–value, value–derivative, and derivative–derivative Gram blocks, computed from kernel partial derivatives up to the required order. Because every block is explicit in terms of kernel derivatives, the representer expansion translates directly into a computational algorithm (El-Boukkouri et al., 20 Mar 2025).
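
The following is a minimal computational sketch of this assembly and solve, assuming (purely for illustration) a Gaussian kernel and the squared-error losses in $J(f)$ above; the function and variable names are not from the cited paper, and the data are synthetic.

```python
# Assemble the block Gram system and solve for the coefficients (alpha, beta).
import numpy as np

ell, lam = 0.7, 1e-3  # assumed length-scale and regularization weight

def k(u, v):
    """Gaussian kernel."""
    return np.exp(-np.sum((u - v) ** 2) / (2 * ell ** 2))

def dk(u, v):
    """Gradient of k(u, .) in its second argument, evaluated at v; shape (d,)."""
    return k(u, v) * (u - v) / ell ** 2

def ddk(u, v):
    """Mixed second derivatives d^2 k / (du dv) at (u, v); shape (d, d)."""
    diff = u - v
    return k(u, v) * (np.eye(len(u)) / ell ** 2 - np.outer(diff, diff) / ell ** 4)

def fit(X_val, y, X_grad, g):
    """Solve the (n + m d)-dimensional regularized system for (alpha, beta)."""
    n, (m, d) = X_val.shape[0], X_grad.shape
    K_vv = np.array([[k(a, b) for b in X_val] for a in X_val])       # value-value block
    K_vd = np.block([[dk(a, b) for b in X_grad] for a in X_val])     # value-derivative block
    K_dd = np.block([[ddk(a, b) for b in X_grad] for a in X_grad])   # derivative-derivative block
    G = np.block([[K_vv, K_vd], [K_vd.T, K_dd]])
    theta = np.linalg.solve(G + lam * np.eye(n + m * d), np.concatenate([y, g.ravel()]))
    return theta[:n], theta[n:].reshape(m, d)                        # alpha, beta

# Synthetic usage: values and gradients of f(x) = sin(x_0) + 0.5 * x_1^2.
rng = np.random.default_rng(2)
X_val, X_grad = rng.uniform(-1, 1, (20, 2)), rng.uniform(-1, 1, (10, 2))
y = np.sin(X_val[:, 0]) + 0.5 * X_val[:, 1] ** 2
g = np.stack([np.cos(X_grad[:, 0]), X_grad[:, 1]], axis=1)
alpha, beta = fit(X_val, y, X_grad, g)
```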

6. Connections with Classical Representer Theorem and Regularizer Structure

The Differential Representer Theorem is a direct extension of the classical representer theorem, which asserts that minimizers of regularized risk functionals with value observations alone in an RKHS take the form $f^*(\cdot) = \sum_{i=1}^n \alpha_i K(\cdot, x_i)$. The core functional-analytic result is that the regularizer must be a non-decreasing function of the RKHS norm for the representer property to hold. Necessary and sufficient conditions for this phenomenon, namely radial monotonicity of $\Omega(f) = h(\|f\|_{\mathcal{H}})$, are established both in Hilbert spaces and, under further generalization, in uniformly convex and smooth Banach spaces (Dinuzzo et al., 2012, Schlegel, 2018).
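
For comparison, setting $m = 0$ in the block system of Section 5 leaves only $(K_{nn} + \lambda I_n)\alpha = y$, the classical value-only case. A minimal sketch, again under the illustrative Gaussian-kernel assumption:

```python
# Classical (value-only) representer solution: the m = 0 special case above.
import numpy as np

ell, lam = 0.7, 1e-3  # assumed length-scale and regularization weight

def k(u, v):
    return np.exp(-np.sum((u - v) ** 2) / (2 * ell ** 2))

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (30, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2                       # synthetic targets
K_nn = np.array([[k(a, b) for b in X] for a in X])
alpha = np.linalg.solve(K_nn + lam * np.eye(len(X)), y)        # classical coefficients

def f_hat(x):
    """f*(x) = sum_i alpha_i K(x_i, x)."""
    return np.array([k(xi, x) for xi in X]) @ alpha

print(f_hat(np.zeros(2)))
```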

7. Significance and Broader Applications

The differential representer expansion establishes computational tractability for kernel-learning methods that incorporate derivative data, a foundational capability in scenarios such as scientific machine learning, system identification, and data-efficient learning from operator-valued or structured responses. Because the expansion is finite, standard convex optimization solvers apply, with problem size determined by the number and type of observations rather than the possibly infinite dimensionality of $\mathcal{H}$ (El-Boukkouri et al., 20 Mar 2025). Explicit Gram blocks for higher-order operators further enable extensible algorithmic design for a broad variety of empirical risk frameworks and physical constraints.
