Universality, Characteristic Kernels and RKHS Embedding of Measures (1003.0887v1)

Published 3 Mar 2010 in stat.ML, math.ST, and stat.TH

Abstract: A Hilbert space embedding for probability measures has recently been proposed, wherein any probability measure is represented as a mean element in a reproducing kernel Hilbert space (RKHS). Such an embedding has found applications in homogeneity testing, independence testing, dimensionality reduction, etc., with the requirement that the reproducing kernel is characteristic, i.e., the embedding is injective. In this paper, we generalize this embedding to finite signed Borel measures, wherein any finite signed Borel measure is represented as a mean element in an RKHS. We show that the proposed embedding is injective if and only if the kernel is universal. This, therefore, provides a novel characterization of universal kernels, which are proposed in the context of achieving the Bayes risk by kernel-based classification/regression algorithms. By exploiting this relation between universality and the embedding of finite signed Borel measures into an RKHS, we establish the relation between universal and characteristic kernels.

Authors (3)
  1. Bharath K. Sriperumbudur (35 papers)
  2. Kenji Fukumizu (89 papers)
  3. Gert R. G. Lanckriet (6 papers)
Citations (501)

Summary

  • The paper extends RKHS embeddings to finite signed Borel measures, enabling comprehensive analysis for classification and regression tasks.
  • It establishes that a kernel is universal if and only if its RKHS embedding of finite signed measures is injective, ensuring a unique representation of measures.
  • The study clarifies distinctions between universal and characteristic kernels, guiding optimal kernel selection in practical machine learning.

Overview of Universality, Characteristic Kernels, and RKHS Embedding of Measures

This paper tackles the complex topic of embedding finite signed Borel measures into a Reproducing Kernel Hilbert Space (RKHS) and examines the conditions making this embedding injective. It offers new insights into the concepts of universality and characteristic kernels, core elements in the field of machine learning, especially in kernel-based algorithms.
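Concretely, the embedding studied in the paper maps a finite signed Borel measure $\mu$ on a space $X$ to its kernel mean in the RKHS $\mathcal{H}$ induced by a bounded measurable kernel $k$ (the notation here is the standard one, not quoted verbatim from the paper):

$$\mu \;\longmapsto\; \mu_k := \int_X k(\cdot, x)\, d\mu(x) \;\in\; \mathcal{H}.$$

The central question is when this map is injective: injectivity on probability measures is the defining property of a characteristic kernel, and the paper shows that injectivity on all finite signed Borel measures is exactly universality.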

Key Contributions

  1. Measure Embedding Generalization: The authors extend the RKHS embedding from probability measures to finite signed Borel measures. This broader setting ties the embedding directly to universality and, through it, to achieving the Bayes risk in binary classification and regression tasks (a numerical sketch follows this list).
  2. Universality Characterization: The paper introduces a novel way to characterize universal kernels through the injective embedding of Borel measures into an RKHS. Universal kernels are critical for achieving the Bayes risk in classification and regression tasks.
  3. Relation to Characteristic Kernels: The paper explores the relationship between universal and characteristic kernels, shedding light on their distinct yet interconnected properties.
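As a concrete illustration of contribution 1, the following minimal sketch (ours, not code from the paper) embeds the signed measure $\mu = \hat{P} - \hat{Q}$ formed from two empirical measures, using a Gaussian kernel. The RKHS norm of the embedded signed measure is the maximum mean discrepancy (MMD), computable from Gram matrices alone:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gram matrix with entries k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd(X, Y, sigma=1.0):
    """RKHS norm of the embedded signed measure mu = P_hat - Q_hat.

    ||mu_k||_H^2 = E_P E_P k + E_Q E_Q k - 2 E_P E_Q k, estimated here
    with empirical means (the simple biased V-statistic).
    """
    val = (gaussian_kernel(X, X, sigma).mean()
           + gaussian_kernel(Y, Y, sigma).mean()
           - 2 * gaussian_kernel(X, Y, sigma).mean())
    return np.sqrt(max(val, 0.0))  # clip tiny negative rounding errors

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 1))  # sample from P
Y = rng.normal(0.5, 1.0, size=(500, 1))  # sample from Q != P
Z = rng.normal(0.0, 1.0, size=(500, 1))  # fresh sample from P
print(mmd(X, Y))  # clearly positive: the embedding separates P and Q
print(mmd(X, Z))  # near zero: same distribution, only sampling noise
```

Because the Gaussian kernel is $c_0$-universal (and hence characteristic), a population MMD of zero would force the two distributions to coincide.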

Main Findings

  • Injectivity Conditions: The embedding of finite signed Borel measures into an RKHS is injective if and only if the kernel is universal. The paper establishes this equivalence theoretically and through examples, providing a clear criterion for universality.
  • Kernel Classes: Various classes of kernels, such as translation-invariant and radial kernels on Euclidean spaces, are examined. For example, a translation-invariant kernel is shown to be $c_0$-universal if the support of its spectral measure is all of $\mathbb{R}^d$ (formalized in the display after this list).
  • Universal vs. Characteristic: It is established that any $c_0$-universal or c-universal kernel is characteristic, meaning such kernels represent probability measures uniquely. The converse does not hold in general, and the distinction is worked out for specific kernel classes.
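The translation-invariant criterion can be stated through Bochner's theorem; the display below is our paraphrase of the finding summarized above. A bounded continuous translation-invariant kernel on $\mathbb{R}^d$ has the representation

$$k(x, y) = \psi(x - y) = \int_{\mathbb{R}^d} e^{-i\langle x - y,\, \omega\rangle}\, d\Lambda(\omega)$$

for a finite nonnegative Borel measure $\Lambda$ (the spectral measure), and the criterion asks that $\mathrm{supp}(\Lambda) = \mathbb{R}^d$. The Gaussian and Laplacian kernels satisfy this, since their spectral measures have densities (Gaussian and Cauchy, respectively) that are positive everywhere.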

Practical Implications

  • Kernel Methods in Machine Learning: Understanding when a kernel is universal or characteristic guides the choice of kernel for machine learning tasks such as support vector machines or kernel PCA, underpinning the consistency and reliability of these algorithms (illustrated in the sketch after this list).
  • Metrization of Weak Topology: The work shows that for a $c_0$-universal kernel, the metric induced by the RKHS embedding, the maximum mean discrepancy (MMD), metrizes the weak topology on probability measures. This connection is crucial for analyzing convergence in statistical and learning contexts.
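To make the kernel-selection point concrete, here is a small sketch (ours, not an experiment from the paper) contrasting the characteristic Gaussian kernel with the linear kernel $k(x, y) = \langle x, y \rangle$, which is not characteristic: its embedding retains only the mean of a distribution, so two distributions with equal means but different variances collapse to the same point in the RKHS.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(2000, 1))  # P = N(0, 1)
Y = rng.normal(0.0, 3.0, size=(2000, 1))  # Q = N(0, 9): same mean as P

def mmd_sq(kernel, X, Y):
    """Biased estimate of MMD^2 = ||mu_P - mu_Q||_H^2 from Gram matrices."""
    return (kernel(X, X).mean() + kernel(Y, Y).mean()
            - 2 * kernel(X, Y).mean())

def linear(A, B):
    """Linear kernel: embeds a distribution as its mean (not characteristic)."""
    return A @ B.T

def gauss(A, B, sigma=1.0):
    """Gaussian kernel: characteristic (indeed c_0-universal) on R^d."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

print(mmd_sq(linear, X, Y))  # ~0: equal means are indistinguishable
print(mmd_sq(gauss, X, Y))   # clearly positive (roughly 0.2 here)
```

An algorithm relying on the linear embedding would treat P and Q as identical; a characteristic kernel guarantees no such collision for any pair of distinct probability measures.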

Theoretical and Future Directions

  • Generalization to Non-Compact Spaces: The findings extend to non-compact spaces, thereby broadening the understanding of kernel applications in more general settings.
  • Potential Areas for Exploration: Future research could delve into $L_p$-universality and its implications across different statistical learning scenarios, deepening the understanding of kernel efficacy.

This paper's rigorous theoretical foundation offers significant advancements in kernel methods, both for practical applications and theoretical explorations in machine learning.