Refinement of Operator-Valued Kernels
Last updated: June 11, 2025
Refinement of Operator-Valued Reproducing Kernels
Based exclusively on “Refinement of Operator-Valued Reproducing Kernels” (Xu et al., 2011).
1 Introduction
Updating an operator-valued kernel is often necessary in multi-task learning:
- an RKHS that is too small causes under-fitting;
- an RKHS that is too large causes over-fitting.

Refinement offers a principled mechanism for enlarging the hypothesis space without changing the norms or the values of previously learned functions [§1, (Xu et al., 2011)].
2 Definition and First Properties
Let $X$ be a non-empty set and $\mathcal Y$ a Hilbert space. For an operator-valued kernel $K\colon X\times X\to\mathcal B(\mathcal Y)$, write $\mathcal H_K$ for its RKHS.
Definition 2.1. A kernel $G$ is a refinement of $K$ if
$$\mathcal H_{K}\subseteq\mathcal H_{G},\qquad \|f\|_{\mathcal H_K}=\|f\|_{\mathcal H_G}\quad\text{for all } f\in\mathcal H_{K}.$$
We denote this by $K\preceq G$ [Def. 2.1, (Xu et al., 2011)].
Immediately,
$$K\preceq G \iff G-K\ \text{is a positive-definite kernel and}\ \mathcal H_{K}\cap\mathcal H_{G-K}=\{0\}$$
(Proposition 3.1). Moreover, $\mathcal H_{G}=\mathcal H_{K}\oplus\mathcal H_{G-K}$, an orthogonal direct sum preserving norms.
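To make these conditions concrete, here is a minimal numerical sketch (not from the paper) for the scalar case $\mathcal Y=\mathbb R$ on a finite input set: a kernel is then just its Gram matrix, $\mathcal H_K$ is the column space of $K$, and $\|f\|_{\mathcal H_K}^2=f^{\top}K^{+}f$. All names and the particular matrices are illustrative.

```python
import numpy as np

# Numerical sketch of Definition 2.1 / Proposition 3.1 on a finite set
# X = {x_1, ..., x_n}, scalar case: kernels are n x n PSD Gram matrices.

rng = np.random.default_rng(0)
n = 6

# K: rank-2 Gram matrix built from an explicit feature matrix.
Phi_K = rng.standard_normal((n, 2))
K = Phi_K @ Phi_K.T

# D: rank-1 kernel whose range is (generically) transverse to range(K),
# so that H_K and H_D intersect only in {0}.
psi = rng.standard_normal((n, 1))
D = psi @ psi.T

G = K + D  # candidate refinement of K

# (i) G - K must itself be a positive semi-definite kernel.
assert np.all(np.linalg.eigvalsh(G - K) > -1e-10)

# (ii) H_K and H_{G-K} intersect trivially: a column-space condition here.
rank_sum = np.linalg.matrix_rank(K) + np.linalg.matrix_rank(D)
assert np.linalg.matrix_rank(np.hstack([Phi_K, psi])) == rank_sum

# (iii) Norm preservation: any f in H_K keeps the same norm in H_G.
c = rng.standard_normal(n)
f = K @ c                                 # a generic element of H_K
norm_in_HK = f @ np.linalg.pinv(K) @ f
norm_in_HG = f @ np.linalg.pinv(G) @ f
assert np.isclose(norm_in_HK, norm_in_HG, rtol=1e-6)
```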
3 Characterisations via Feature Maps
Assume $K(x,y)=\Phi_1(x)^{*}\Phi_1(y)$ and $G(x,y)=\Phi_2(x)^{*}\Phi_2(y)$ with feature maps $\Phi_1\colon X\to\mathcal B(\mathcal Y,\mathcal W_1)$ and $\Phi_2\colon X\to\mathcal B(\mathcal Y,\mathcal W_2)$, assumed as usual to have dense ranges in the feature spaces $\mathcal W_1,\mathcal W_2$.
Theorem 3.2. $K\preceq G$ iff there exists a bounded operator $T\colon\mathcal W_1\to\mathcal W_2$ such that $T^{*}\Phi_2(x)=\Phi_1(x)$ for all $x\in X$ and $T$ is an isometry [Thm. 3.2].
Hence refining amounts to embedding the original feature space isometrically into a larger one.
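A small sketch of this picture (finite input set, scalar case, my own construction rather than the paper's): stacking fresh features onto $\Phi_1$ yields a map $\Phi_2$ into which $\mathcal W_1$ embeds by zero-padding, and that embedding plays the role of the isometry $T$ in Theorem 3.2.

```python
import numpy as np

# Feature-map picture of refinement on a finite set, scalar case.
# Phi1 is the feature map of K; Phi2 stacks Phi1 with fresh features Psi;
# T is the canonical isometry W_1 -> W_2 = W_1 (+) W_extra (zero-padding).

rng = np.random.default_rng(1)
n, d1, d2 = 8, 2, 3

Phi1 = rng.standard_normal((n, d1))      # rows are Phi_1(x), x in X
Psi = rng.standard_normal((n, d2))       # extra features
Phi2 = np.hstack([Phi1, Psi])            # rows are Phi_2(x)

K = Phi1 @ Phi1.T
G = Phi2 @ Phi2.T

T = np.vstack([np.eye(d1), np.zeros((d2, d1))])
assert np.allclose(T.T @ T, np.eye(d1))  # T is an isometry
assert np.allclose(Phi2 @ T, Phi1)       # T* Phi_2(x) = Phi_1(x) for all x

# Consequence: H_K sits isometrically inside H_G.
f = K @ rng.standard_normal(n)           # an element of H_K
assert np.isclose(f @ np.linalg.pinv(K) @ f,
                  f @ np.linalg.pinv(G) @ f, rtol=1e-6)
```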
4 Integral Representation
For translation-invariant kernels on $\mathbb R^{d}$,
$$K(x,y)=\int_{\mathbb R^{d}}e^{i(x-y)\cdot t}\,\varphi_1(t)\,d\mu(t),\qquad G(x,y)=\int_{\mathbb R^{d}}e^{i(x-y)\cdot t}\,\varphi_2(t)\,d\mu(t),$$
where $\varphi_1,\varphi_2$ are operator-valued densities with respect to a measure $\mu$.
Proposition 5.6. $K\preceq G$ iff $\varphi_1(t)\preceq\varphi_2(t)$ for $\mu$-a.e. $t\in\mathbb R^{d}$ [Prop. 5.6].
The refinement thus corresponds to a pointwise dominance of the spectral measures.
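As a sanity check of this spectral picture (scalar case, my own discrete example, not the paper's): take $\mu$ to be a finite sum of point masses, let $G$ keep $K$'s weights on $K$'s frequencies, and add mass only at new frequencies. The finite-sample tests from the earlier sketch then confirm the refinement; the frequencies and weights below are arbitrary.

```python
import numpy as np

# Spectral-domain sketch (scalar case): K and G are translation-invariant
# kernels with discrete spectral measures; G agrees with K on K's
# frequencies and adds mass at new ones.  Verified on a finite sample.

rng = np.random.default_rng(2)

def ti_kernel(x, freqs, weights):
    """Gram matrix of k(x, y) = sum_j w_j cos((x - y) t_j) on the sample x."""
    diff = x[:, None] - x[None, :]
    return sum(w * np.cos(diff * t) for t, w in zip(freqs, weights))

x = rng.uniform(0.0, 5.0, size=40)       # finite sample of the real line
freqs_K, w_K = np.array([0.5, 1.3]), np.array([1.0, 0.7])
freqs_new, w_new = np.array([2.9, 4.1]), np.array([0.4, 0.9])

K = ti_kernel(x, freqs_K, w_K)
G = ti_kernel(x, np.concatenate([freqs_K, freqs_new]),
              np.concatenate([w_K, w_new]))  # same weights on supp(K) + new mass

# G - K is a kernel, and H_K retains its norm inside H_G (finite-sample check).
assert np.all(np.linalg.eigvalsh(G - K) > -1e-8)
f = K @ rng.standard_normal(len(x))
assert np.isclose(f @ np.linalg.pinv(K) @ f,
                  f @ np.linalg.pinv(G) @ f, rtol=1e-4)
```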
5 Existence of Non-Trivial Refinements
If the input set $X$ is infinite, every kernel admits a non-trivial refinement except in a degenerate case characterised in Proposition 6.1. For finite $X$, the existence question reduces to the strict positivity of the Gram matrix (Proposition 6.2).
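A constructive sketch of the finite-$X$ case (scalar case, my own illustration): when the Gram matrix of $K$ is singular, adding a rank-one kernel supported on the orthogonal complement of its range enlarges the RKHS while leaving every existing norm unchanged.

```python
import numpy as np

# If the Gram matrix of K on X = {x_1, ..., x_n} is singular, a non-trivial
# refinement is obtained by adding a rank-one kernel supported on
# range(K)^perp.  (A strictly positive Gram matrix already gives H_K = R^X,
# leaving no room for enlargement.)  Scalar case, names illustrative.

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, 3))
K = A @ A.T                               # rank 3 < n: singular Gram matrix

# Pick v in the null space of K (= range(K)^perp, since K is symmetric).
_, s, Vt = np.linalg.svd(K)
v = Vt[-1]                                # singular vector with s ~ 0
G = K + np.outer(v, v)                    # candidate non-trivial refinement

# H_G strictly enlarges H_K ...
assert np.linalg.matrix_rank(G) == np.linalg.matrix_rank(K) + 1
# ... while every f in H_K keeps its norm.
f = K @ rng.standard_normal(n)
assert np.isclose(f @ np.linalg.pinv(K) @ f, f @ np.linalg.pinv(G) @ f)
```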
6 Preserved Properties
Refinement conserves key attributes:
| Property | Preserved? | Reference |
|---|---|---|
| Continuity of the kernel | ✔ | Prop. 6.4 |
| Universality (density in the space of continuous functions) | ✔ | Prop. 6.6 |
7 Numerical Evidence
Two illustrative experiments [§7]:
- Under-fitting scenario (Gaussian kernel on non-smooth target). Refinement with a polynomial component decreased mean squared error significantly (Tables 7.1–7.2).
- Over-fitting scenario (Gaussian + high-degree polynomial kernel). Replacing the refined kernel $G$ by its coarser component $K$ (with $K\preceq G$) reduced test error and variance (Tables 7.3–7.4).
These confirm that controlled enlargement or reduction of the RKHS via refinement effectively balances bias and variance.
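The following is a small experiment in the spirit of the under-fitting scenario, not a reproduction of the paper's tables: kernel ridge regression on a non-smooth target with a wide Gaussian kernel $K$, versus the enlarged kernel $G = K + \text{polynomial}$. The target, bandwidth, degree and regularisation are arbitrary illustrative choices.

```python
import numpy as np

# Under-fitting-style experiment (illustrative only): compare kernel ridge
# regression with a wide Gaussian kernel K against G = K + polynomial.

rng = np.random.default_rng(4)

def gauss(X, Z, s=1.0):
    return np.exp(-(X[:, None] - Z[None, :]) ** 2 / (2 * s ** 2))

def poly(X, Z, degree=3):
    return (1.0 + np.outer(X, Z)) ** degree

def krr_mse(train_gram, cross_gram, y_tr, y_te, lam=1e-3):
    """Test MSE of kernel ridge regression with a precomputed Gram matrix."""
    alpha = np.linalg.solve(train_gram + lam * np.eye(len(y_tr)), y_tr)
    return np.mean((cross_gram @ alpha - y_te) ** 2)

def target(x):                                      # non-smooth target
    return np.abs(x) + 0.5 * np.sign(x)

X_tr = rng.uniform(-1, 1, 60)
y_tr = target(X_tr) + 0.05 * rng.standard_normal(60)
X_te = rng.uniform(-1, 1, 200)
y_te = target(X_te)

mse_K = krr_mse(gauss(X_tr, X_tr, 2.0), gauss(X_te, X_tr, 2.0), y_tr, y_te)
mse_G = krr_mse(gauss(X_tr, X_tr, 2.0) + poly(X_tr, X_tr),
                gauss(X_te, X_tr, 2.0) + poly(X_te, X_tr), y_tr, y_te)
print(f"MSE with K (wide Gaussian):   {mse_K:.4f}")
print(f"MSE with refined G = K + poly: {mse_G:.4f}")
```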
8 Worked Examples
- Finite Hilbert–Schmidt kernels. Refinement is achieved by adding new terms or enlarging coefficients while keeping the existing ones intact (Theorem 5.11); see the sketch after this list.
- Hessian kernels: refinement of the Hessian of a scalar kernel corresponds to refining the underlying scalar kernel (Theorem 5.8).
- Transformation kernels: refinement criteria translate to refinement of each transformed scalar sub-kernel (Propositions 5.9–5.10).
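To illustrate the first bullet with a genuinely operator-valued sketch ($\mathcal Y=\mathbb R^2$, finite $X$): a kernel $K(x,y)=k_1(x,y)A_1$ is enlarged by a new term $k_2(x,y)A_2$ whose scalar factor contributes functions outside $\mathcal H_{k_1}$. The particular $k_1,k_2,A_1,A_2$ are my own choices, not the paper's.

```python
import numpy as np

# Operator-valued illustration (Y = R^2): K(x,y) = k1(x,y) A1, and the
# candidate refinement G = K + k2(x,y) A2 adds a term whose scalar factor
# k2 brings in functions outside H_{k1}.

rng = np.random.default_rng(5)
n = 6

U1 = rng.standard_normal((n, 2))
k1 = U1 @ U1.T                            # scalar kernel, rank 2
U2 = rng.standard_normal((n, 2))
k2 = U2 @ U2.T                            # independent scalar kernel

A1 = np.array([[2.0, 0.5], [0.5, 1.0]])   # PSD output operators
A2 = np.array([[1.0, 0.2], [0.2, 0.5]])

# Block Gram matrices on X x X with 2x2 operator blocks.
K = np.kron(k1, A1)
G = K + np.kron(k2, A2)

# G - K is a kernel, and functions of H_K keep their norm inside H_G.
assert np.all(np.linalg.eigvalsh(G - K) > -1e-10)
f = K @ rng.standard_normal(2 * n)        # a vector-valued element of H_K
assert np.isclose(f @ np.linalg.pinv(K) @ f, f @ np.linalg.pinv(G) @ f)
```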
9 Conclusion
Refinement provides a rigorous, constructive method to adapt operator-valued kernels:
- expands or contracts the hypothesis space without disturbing existing estimators,
- is characterised precisely through difference kernels, feature-map embeddings, and integral spectra,
- preserves desirable analytic properties,
- is empirically effective for mitigating under- and over-fitting in multi-output learning.
These results supply practical tools and theoretical guarantees for dynamic kernel selection in vector-valued machine-learning problems.